chr1swallace / coloc

Repo for the R package coloc
144 stars 44 forks source link

Update sensitivity.R #80

Open fromOmelas opened 2 years ago

fromOmelas commented 2 years ago

In the sensitivity function it was written min(p1,p1). I substituted it with min(p1,p2). Additionally, with small numbers x is not always equivalent to 10^log10(x), because of this my analysis stopped when testp12 was the argument for prior.snp2hyp(). I think the best solution is changing the 'if' in this function so: (any(p12 < 10^log10(p1*p2)) || any(p12 > 10^log10(p1)) || any(p12 > 10^log10(p2)))

fromOmelas commented 2 years ago

I'll try to explain! It's just a quick fix so maybe you'll have a better idea. The problem is in testp12 <- 10^seq(log10(p1*p2),log10(min(p1,p2)),length.out=npoints) the first and last elements of testp12 should be mathematically equivalent to p1*p2 and min(p1, p2) but for R this is not always true. So sometimes it happens that any(p12<p1*p2) || any(p12 > p1) || any(p12 > p2) returns T even with testp12 (that I think shouldn't happen) since p12 is not smaller than p1*p2 or bigger than p1 or p2, because for R sometimes x differs from 10^log10(x)

mengyuankan commented 1 year ago

I just installed coloc, but looks like the typo "min(p1,p1)" is still not fixed? testp12 <- 10^seq(log10(p1*p2),log10(min(p1,p1)),length.out=npoints) (Line 133 in sensitivity function: https://github.com/chr1swallace/coloc/blob/main/R/sensitivity.R)

yatest commented 6 months ago

I know this is a relatively old pull request, but I just wanted to try and explain why the error occurs for certain p1 and p2. As fromOmelas said, prior.snp2hyp can return NULL when the argument for the if statement evaluates as true. When prior.adjust then calls pr0 <- matrix(f(p12),nrow=nrow(pr1),ncol=ncol(pr1),byrow=TRUE), nrow(pr1) returns an error as NULL has no extent.

An example of this is if we define p1 = p2 = 1/(10*37846). Evaluating testp12 <- 10^seq(log10(p1*p2),log10(min(p1,p1)),length.out=100), we can then compare testp12[1] and p1*p2:

> formatC(testp12[1], format = "e", digits = 20)
[1] "6.98168145581819235986e-12"
> formatC(p1*p2, format = "e", digits = 20)
[1] "6.98168145581820609236e-12"
> testp12[1] == p1*p2
[1] FALSE

which shows that it is a floating-point precision error causing the problem.

The same can be true for very large numbers (not particularly applicable here, but informative nonetheless):

> formatC(1e16/(3378460), format = "e", digits = 20)
[1] "2.95992848812772703171e+09"
> formatC(10^log10(1e16/(3378460)), format = "e", digits = 20)
[1] "2.95992848812772989273e+09"
> 1e16/(3378460) == 10^log10(1e16/(3378460))
[1] FALSE

Using a value that is better represented by a floating-point number avoids this issue:

> formatC(2e-9, format = "e", digits = 20)
[1] "2.00000000000000012456e-09"
> formatC(10^log10(2e-9), format = "e", digits = 20)
[1] "2.00000000000000012456e-09"
> 2e-9 == 10^log10(2e-9)
[1] TRUE

To avoid using logs in the argument for the if statement, it may be possible to use all.equal instead to make the comparisons more lenient.