Once the utility functions in #4 are in place (along with follow-up issues, not yet filed, for functions not yet figured out), we need to write a vignette that lays out the motivation and demonstrates the workflow on a small set of genes/SNPs.
motifbreakR takes a motif-based approach to transcription factor (TF) binding site prediction, using position weight matrices (PWMs).
The main goal in adding the motifbreakR workflow was to be able to say something about non-coding variants, even if the interpretation of their consequences is still hand-wavy.
The analysis does not even begin to touch tissue-specific regulation; that is a longer-term goal of Leah's efforts toward gene regulatory networks (GRNs).
Basically, there will be far smarter ways of addressing regulatory DNA.
But for now, we need some stats to support what comes out of motifbreakR.
Greg did not mention it, but he also added a feature to the Looker tables that checks for loss-of-function mutations in the TFs themselves.
Incorporate enrichment and statistical analysis into the TFBS motif analysis (motifbreakR) workflow on Form Bio.
Address all of Sven's concerns about the TFBS analysis:
Tested regions (all +5kb of annotated genes) vs random ranges
Candidate genes (varying N) vs random genes
Specific transcription factors represented in MotifDb
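One way the "candidate genes vs random genes" comparison could be framed is as an empirical permutation test. The sketch below is only an illustration, not the planned implementation: it is written in Python for brevity even though the workflow itself is in R, and the function name `empirical_enrichment`, the gene names, and the per-gene hit counts are all hypothetical. It assumes that motifbreakR output has already been summarized into a count of motif-disrupting SNPs per gene, and asks how often a random gene set of the same size carries at least as many hits as the candidate set.

```python
import random


def empirical_enrichment(hits_per_gene, candidates, n_perm=1000, seed=0):
    """Return (observed, p) for a one-sided permutation test.

    hits_per_gene: dict mapping gene -> count of motif-disrupting SNPs
                   (hypothetical summary of motifbreakR output).
    candidates:    list of candidate genes, all present in hits_per_gene.
    """
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    background = sorted(hits_per_gene)  # sampling universe: all tested genes
    n = len(candidates)

    # Observed statistic: total hits across the candidate gene set.
    observed = sum(hits_per_gene[g] for g in candidates)

    # Null distribution: total hits across random gene sets of the same size.
    at_least_as_extreme = 0
    for _ in range(n_perm):
        sample = rng.sample(background, n)
        if sum(hits_per_gene[g] for g in sample) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (made up): 50 genes with no hits, 5 genes with 10 hits each.
toy_hits = {f"bg{i}": 0 for i in range(50)}
toy_hits.update({f"cand{i}": 10 for i in range(5)})
toy_candidates = [f"cand{i}" for i in range(5)]

observed, p_value = empirical_enrichment(toy_hits, toy_candidates)
print(f"observed hits = {observed}, empirical p = {p_value:.4f}")
```

The same framing would carry over to the "tested regions vs random ranges" comparison by sampling random genomic ranges instead of random genes. In either case the null sets would probably need to be matched on properties such as region length or SNP density, which this sketch does not attempt.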
Notes from @tabbzi via SC-568: