Open mistrm82 opened 6 years ago
Hi @mistrm82, this is a fine place to post.
Re 1 -- no, there is only the option to extract scores with positions:
motif_pos <- matchMotifs(example_motifs, peaks, genome = "hg19",
                         out = "positions")
Re 2 -- This is a port of the MOODS C++ package (https://github.com/jhkorhonen/MOODS), so the documentation and/or papers for that package might be useful (e.g. https://ieeexplore.ieee.org/document/4803829/?reload=true, https://academic.oup.com/bioinformatics/article/25/23/3181/215705, https://www.cs.helsinki.fi/group/pssmfind/).
In terms of the p-value vs. score, the package finds the score threshold that corresponds to a given p-value (that is, the probability of a random sequence having a score at least that high). It does not then compute a p-value for each potential motif site.
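To make the threshold idea concrete, here is a minimal base-R sketch. It is not the MOODS or motifmatchr implementation (MOODS is understood to use a more efficient algorithm); it simply enumerates every sequence of a short motif's length. The log-odds matrix `pwm`, the uniform background `bg`, and the helper `score_threshold()` are all hypothetical names made up for illustration.

```r
# Hypothetical toy log-odds matrix: rows are A, C, G, T; columns are motif positions.
pwm <- matrix(
  c( 1.2, -0.8, -1.5,  0.9,
    -0.7,  1.1, -0.9, -1.2,
    -1.0, -0.6,  1.3, -0.4,
    -0.5, -1.3, -0.8,  0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Assumed background base frequencies, in the same order as the rows of pwm.
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)

# Illustrative helper (not part of any package): lowest score t such that
# P(score of a random sequence >= t) <= p_cutoff under the background model.
score_threshold <- function(pwm, bg, p_cutoff) {
  n_pos <- ncol(pwm)
  # One row per possible sequence; entries are base indices 1..4.
  seqs <- as.matrix(expand.grid(rep(list(1:4), n_pos)))
  n_seq <- nrow(seqs)
  pos <- matrix(seq_len(n_pos), nrow = n_seq, ncol = n_pos, byrow = TRUE)
  # Score of a sequence = sum of the matrix entries it selects.
  scores <- rowSums(matrix(pwm[cbind(as.vector(seqs), as.vector(pos))],
                           nrow = n_seq))
  # Probability of a sequence = product of its background base frequencies.
  probs <- apply(matrix(bg[as.vector(seqs)], nrow = n_seq), 1, prod)
  candidates <- sort(unique(scores), decreasing = TRUE)
  tail_prob <- vapply(candidates, function(s) sum(probs[scores >= s]),
                      numeric(1))
  ok <- tail_prob <= p_cutoff
  if (!any(ok)) return(Inf)  # even the best-scoring sequence is too likely
  min(candidates[ok])
}

threshold <- score_threshold(pwm, bg, p_cutoff = 0.05)
```

Any site scoring at or above `threshold` would be reported as a match. The score comes back for each site, but the p-value only ever enters through this single cutoff. Full enumeration grows as 4^width, so this approach is only practical for very short motifs.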
Thanks @AliciaSchep. So what you are saying is that p-values are not derived for each individual site? In that case, I wouldn't need the p-values, since I was treating each site as an independent test and planning to perform multiple-testing correction.
I'll take a look at those links to get a better feel for the score values.
For anyone else looking for this information take a look at the following links:
https://github.com/jhkorhonen/MOODS/issues/12#issuecomment-405912018
https://github.com/jhkorhonen/MOODS/wiki/Brief-theoretical-introduction
It would be very helpful to include direct links to some of these pages in the documentation, or a simple description in the package help pages themselves, as it is kind of difficult to find a clear explanation. Reading the papers wasn't sufficient: I couldn't figure out which number the software reports as the score until I dug through these GitHub issues.
Apologies, wasn't sure if this was the best place to post, but: