Closed lcolladotor closed 6 years ago
Leonardo,
Having any reproducible example which I could use to investigate would be great.
Andrey
Hi Andrey,
Here's a quick update. I tried reproducing the above error using the data for:
and it worked without problems in all these scenarios. I'm re-running the full jobs using all the data (they've been running for the past 36 hours) to see if I can reproduce the error in that case. However, I now have a suspicion to what might have been the real issue. While I was running the jobs that lead to the reported error, I updated my MatrixEQTL
installation (to check this other issue https://github.com/andreyshabalin/MatrixEQTL/issues/3). So my guess right now is that re-installing MatrixEQTL
lead to this situation.
In any case, out of the current 4 jobs that I'm running one of them reports that it's 60% done, so hopefully we'll know the answer in 2 days or less.
Best, Leonardo
Hi again,
Sadly, the job ended with the same result despite using the latest version from GitHub. This time I did include a call to warnings()
after running Matrix_eQTL_main()
.
99.50% done, 9,455,047 cis-eQTLs
100.00% done, 9,455,081 cis-eQTLs
Task finished in 262481.375 seconds
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In (sn.l[x]:(sn.r[x] - 1)) * nrow(statistic) :
NAs produced by integer overflow
2: In (sn.l[x]:(sn.r[x] - 1)) * nrow(statistic) + x :
NAs produced by integer overflow
## Plus more of the same
[1] "neqtls and ntests"
[1] 9455081
[1] 14851692
plus another check I added:
[1] "Number of meQTLs without row ids"
FALSE TRUE
9431253 23828
from running
print('neqtls and ntests')
me$cis$neqtls
me$cis$ntests
print('Number of meQTLs without row ids')
table(!is.na(me$cis$eqtls$snps))
So it seems like this issue can only be reproduced with the full data and that has to do with integer overflows.
I can report that if I subset all chrs to certain genes (about 26% of the original 60,252) and certain positions of the genome (about 37% of the original 18,664,892), it all works well even despite using a higher pvOutputThreshold.cis
(0.01 instead of 5e-4):
[1] "neqtls and ntests"
[1] 237525
[1] 7842494
[1] "Number of meQTLs without row ids"
TRUE
237525
So maybe there's a limit on the number of neqtls or ntests.
Best, Leonardo
Main part:
MatrixEQTL * 2.2 2018-01-26 Github (andreyshabalin/MatrixEQTL@454aacd)
Full list:
> library(MatrixEQTL)
> library(devtools)
> options(width = 120)
> session_info()
Session info ----------------------------------------------------------------------------------------------------------
setting value
version R version 3.4.3 Patched (2018-01-20 r74142)
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
tz US/Eastern
date 2018-02-02
Packages --------------------------------------------------------------------------------------------------------------
package * version date source
base * 3.4.3 2018-01-20 local
colorout * 1.1-2 2017-08-10 Github (jalvesaq/colorout@020a14d)
compiler 3.4.3 2018-01-20 local
datasets * 3.4.3 2018-01-20 local
devtools * 1.13.4 2017-11-09 CRAN (R 3.4.2)
digest 0.6.14 2018-01-14 CRAN (R 3.4.2)
graphics * 3.4.3 2018-01-20 local
grDevices * 3.4.3 2018-01-20 local
MatrixEQTL * 2.2 2018-01-26 Github (andreyshabalin/MatrixEQTL@454aacd)
memoise 1.1.0 2017-04-21 CRAN (R 3.4.1)
methods * 3.4.3 2018-01-20 local
stats * 3.4.3 2018-01-20 local
tools 3.4.3 2018-01-20 local
utils * 3.4.3 2018-01-20 local
withr 2.1.1 2017-12-19 CRAN (R 3.4.2)
What is the slicesize you are using loading 'gene' and 'snps'?
I've committed a small update fixing this integer overflow. I hope it fixed the whole problem.
Hi Andrey,
The file slice size was 300 for both the 'gene' and the 'snps'. Also, thanks for https://github.com/andreyshabalin/MatrixEQTL/commit/b9a9f0197bce3ee39bbfd6ebb59b47f19f6a4509. I re-submitted my 4 jobs and will report back once at least 1 of them is done.
Best, Leonardo
Oh, 300 is way too small. I'd suggest at least 3000. It should not affect the memory consumption too much.
In any case, I'll be waiting for your updates.
With the narrow local-band (cisDist = 1000) you use, I'd say slice size of 300 is reasonable.
Hi Andrey,
https://github.com/andreyshabalin/MatrixEQTL/commit/b9a9f0197bce3ee39bbfd6ebb59b47f19f6a4509 fixed this issue, thanks!
Best, Leonardo
Thank you again @andreyshabalin!
Hi,
Using
MatrixEQTL
version 2.2 I'm running into a weird result that might be resolved by lowering the p-value threshold (as in https://github.com/andreyshabalin/MatrixEQTL/issues/1). Still, maybe you've seen this before and have some idea on what could be done.The issue is that many of the results in
me$cis$eqtls
(I saved the output ofMatrix_eQTL_main()
to the objectme
) areNA
, including the snp and gene ids. So I can't check if either the snp or the gene id has some weird property.Weird results
Call used:
File checks
The code works as expected with one methylation data set but not in a second one: both use the same gene information and have the same number of samples. The second one has more non-zero entries in the
meth
object and we had to use a lower file slice size (meth$fileSliceSize <- 300
instead of 2000) to keep the memory from blowing up. That's why I'm guessing that lowering the p-value threshold could help. If needed, I can see if this happens with the data from a small chromosome and send you the files via email (like we did for https://github.com/andreyshabalin/MatrixEQTL/issues/3).Best, Leonardo