
GreenleafLab / motifmatchr

Fast motif matching in R

https://greenleaflab.github.io/motifmatchr/

GNU General Public License v3.0


scores and p-values #3

Open mistrm82 opened 6 years ago

mistrm82 commented 6 years ago

Apologies, I wasn't sure whether this was the best place to post, but:

1. Is there a way to extract a column of p-values along with the positions?
2. Is there documentation on what the score represents and what an acceptable threshold is?

AliciaSchep commented 6 years ago

Hi @mistrm82, this is a fine place to post.

Re 1 -- No, the only option is to extract scores along with the positions:

library(motifmatchr)
data(example_motifs, package = "motifmatchr")
motif_pos <- matchMotifs(example_motifs, peaks, genome = "hg19",
                         out = "positions")

Re 2 -- This is a port of the MOODS C++ package (https://github.com/jhkorhonen/MOODS), so the documentation and papers for that package may be useful (e.g. https://ieeexplore.ieee.org/document/4803829/?reload=true, https://academic.oup.com/bioinformatics/article/25/23/3181/215705, https://www.cs.helsinki.fi/group/pssmfind/).

As for p-value vs. score: the package finds the score threshold that corresponds to a given p-value, i.e. the probability that a random background sequence scores at least that high. It does not then compute a p-value for each individual motif site.
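The threshold-from-p-value idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the actual motifmatchr or MOODS code: it uses a made-up 4-position count matrix, a uniform i.i.d. background, a log2-odds PWM with a pseudocount of 0.8, and scores rounded to a 0.01 grid. It computes the exact score distribution of random sequences by dynamic programming, then picks the lowest attainable score whose upper-tail probability is at most p. The same distribution also yields a per-site p-value, the quantity matchMotifs does not report. All function names (`to_log_odds`, `score_distribution`, `threshold_from_p`, `p_from_score`, `score_site`) are hypothetical helpers.

```r
# Hypothetical count matrix (rows = bases, columns = motif positions); made up for illustration.
pfm <- matrix(c(8, 1, 1, 0,
                0, 9, 1, 0,
                1, 0, 8, 1,
                0, 1, 0, 9),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)  # assumed uniform background

# Log2-odds PWM with a pseudocount; the conversion motifmatchr uses may differ.
to_log_odds <- function(pfm, bg, pseudo = 0.8) {
  probs <- sweep(pfm + pseudo * bg, 2, colSums(pfm) + pseudo, "/")
  log2(probs / bg)
}

# Exact distribution of the (discretized) score of a random background sequence.
score_distribution <- function(pwm, bg, step = 0.01) {
  iw <- round(pwm / step)  # integer weights on a grid of width `step`
  prob <- 1                # empty prefix: total score 0 with probability 1
  offset <- 0              # prob[k] = P(total == offset + k - 1)
  for (j in seq_len(ncol(iw))) {
    col_min <- min(iw[, j])
    new <- numeric(length(prob) + max(iw[, j]) - col_min)
    for (b in rownames(iw)) {
      # Appending base b at position j shifts every total by its weight.
      idx <- seq_along(prob) + (iw[b, j] - col_min)
      new[idx] <- new[idx] + bg[[b]] * prob
    }
    prob <- new
    offset <- offset + col_min
  }
  list(scores = (offset + seq_along(prob) - 1) * step, prob = prob)
}

# Lowest attainable score s with P(S >= s) <= p (NA if no score is rare enough).
threshold_from_p <- function(sdist, p) {
  upper <- rev(cumsum(rev(sdist$prob)))  # upper[k] = P(S >= scores[k])
  ok <- sdist$prob > 0 & upper <= p
  sdist$scores[which(ok)[1]]
}

# Per-site p-value: probability a random sequence scores at least `score`.
p_from_score <- function(sdist, score, step = 0.01) {
  sum(sdist$prob[sdist$scores >= score - step / 2])
}

# Discretized score of one site: the sum of its PWM entries, one per position.
score_site <- function(pwm, site, step = 0.01) {
  bases <- strsplit(site, "")[[1]]
  iw <- round(pwm / step)
  sum(iw[cbind(match(bases, rownames(iw)), seq_along(bases))]) * step
}

pwm <- to_log_odds(pfm, bg)
sdist <- score_distribution(pwm, bg)
thr <- threshold_from_p(sdist, p = 0.01)  # score cutoff for p = 0.01
site_score <- score_site(pwm, "ACGT")     # consensus site, the highest possible score
site_p <- p_from_score(sdist, site_score) # per-site p-value
```

With a uniform background, the consensus 4-mer is 1 of 256 sequences, so its p-value is 1/256. Real motifs are longer, which is why a cutoff such as p = 5e-05 becomes meaningful.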

mistrm82 commented 6 years ago

Thanks @AliciaSchep. So you're saying that p-values are not derived for each individual site? In that case, I wouldn't need the p-values, since I was treating each site as an independent test and planning to apply multiple-testing correction.

I'll take a look at those links to get a better feel for the score values.

snystrom commented 5 years ago

For anyone else looking for this information, take a look at the following links:

https://github.com/jhkorhonen/MOODS/issues/12#issuecomment-405912018

https://github.com/jhkorhonen/MOODS/wiki/Brief-theoretical-introduction

It would be very helpful to include direct links to some of these pages in the documentation, or a short description in the package help pages themselves, because a clear explanation is difficult to find. Reading the papers wasn't enough: I couldn't figure out which number the software reports as the score until I dug through these GitHub issues.