Closed paul-shannon closed 2 years ago
@scoetzee
Many months later than predicted, I am now using motifbreakR at scale, preparing a talk for this Friday: Exploring for tissue-specific effects of non-coding variants at cryptic AD GWAS loci
It would be really great if calculatePvalue ran in parallel. Is there any chance this could happen soon? I'd be most grateful.
So I ran some tests, and it should be possible. One caveat is that some p-value calculations can take a very long time and a huge amount of memory, because of the dynamic programming method used to calculate them. This is why I have been hesitant to implement it: most of the time and memory is spent on one or two SNPs. However, I believe it could be made more deterministic if I use a rounded matrix with a fixed granularity that the user can set. Would that be useful for your purposes?
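For readers following along, here is a minimal sketch of the idea described above: a dynamic-programming p-value computed on a score matrix rounded to a fixed granularity. It is written in Python for illustration only and is not motifbreakR's implementation; the function name `pwm_pvalue`, its parameters, and the uniform background are all assumptions made for this example.

```python
from collections import defaultdict


def pwm_pvalue(pwm, threshold, background=(0.25, 0.25, 0.25, 0.25),
               granularity=1e-2):
    """Return P(score >= threshold) for a random sequence under `background`.

    `pwm` is a list of positions, each a list of four scores (A, C, G, T).
    All names here are hypothetical and chosen for this sketch.
    """
    # Round every score to an integer multiple of `granularity`, so the
    # number of distinct partial scores stays bounded.
    int_pwm = [[round(s / granularity) for s in col] for col in pwm]
    int_threshold = round(threshold / granularity)

    # dist maps an integer partial score to its probability.
    dist = {0: 1.0}
    for col in int_pwm:
        nxt = defaultdict(float)
        for partial, prob in dist.items():
            for s, b in zip(col, background):
                nxt[partial + s] += prob * b
        dist = nxt

    # Sum the probability mass at or above the threshold.
    return sum(prob for total, prob in dist.items() if total >= int_threshold)


# Toy two-position matrix: only "AC" reaches the maximum score of 2.
toy_pwm = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
p_top = pwm_pvalue(toy_pwm, threshold=2.0)
```

In this sketch a coarser `granularity` merges nearby scores into the same integer bin, which caps the size of `dist` at each step. That is the sense in which time and memory become more predictable, at the cost of a less exact p-value.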
Yes, Simon, that would be very useful. All my calculations come with caveats and probabilities, as I assemble lots of sometimes reinforcing tentative evidence.
So adding even imperfect pvalues is a boon.
It looks like the p-value stabilizes somewhere around a granularity of 1e-4 for this particular SNP with the JASPAR database.
To be clear, the p-value is the p-value for binding of the reference or the alternate allele. I have also added an alleleEffectSize, which is something like the proportion of the change caused by ref vs. alt over the total possible PWM score.
The current version on here, 2.8.99, has these features.
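One possible reading of the alleleEffectSize description above, sketched in Python for illustration. This is not motifbreakR's code. The function names are hypothetical, and both the sign convention (alt minus ref) and the denominator (the span between the lowest and highest score the matrix can produce) are assumptions about what "total possible PWM score" means.

```python
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def pwm_score(pwm, seq):
    """Sum the matrix score of each base in `seq` (one column per position)."""
    return sum(col[BASE_INDEX[base]] for col, base in zip(pwm, seq))


def allele_effect_size(pwm, ref_seq, alt_seq):
    """Score change from ref to alt, scaled by the achievable score range.

    Assumption: the denominator is (max possible score - min possible score).
    """
    max_score = sum(max(col) for col in pwm)
    min_score = sum(min(col) for col in pwm)
    delta = pwm_score(pwm, alt_seq) - pwm_score(pwm, ref_seq)
    return delta / (max_score - min_score)


# Toy two-position matrix: "AC" scores 2, "AG" scores 1.
toy_pwm = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
effect = allele_effect_size(toy_pwm, ref_seq="AC", alt_seq="AG")
```

Under these assumptions, a negative value means the alternate allele scores lower against the motif than the reference, and the magnitude is the fraction of the matrix's score range that the change covers.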
Thanks, Simon. In my so far limited first use, calculatePvalue runs fast, and provides useful information.
I'm grateful.
@scoetzee,
Now that the parallel execution of motifbreakR works so well, I wonder if this could be made available for calculatePvalue also? Any suggestions?