andreyshabalin / MatrixEQTL

Matrix eQTL: Ultra fast eQTL analysis via large matrix operations
53 stars 16 forks source link

`findInterval` error in certain workflows #3

Closed andrewejaffe closed 6 years ago

andrewejaffe commented 6 years ago

We've only seen this error when running MatrixEQTL on the exon-exon splice junction matrix. All of the junctions are sorted


2018-01-25 14:16:09 running MatrixEQTL
Matching data files and location files
4360453of4360453 genes matched
58109566of58109566 SNPs matched

Task finished in 36.62 seconds
Reordering SNPs

Task finished in 286.049 seconds
Reordering genes

Task finished in 33.687 seconds
Processing covariates
Task finished in 0.00199999999995271 seconds
Processing gene expression data (imputation, residualization)
Task finished in 7.25300000000004 seconds
Creating output file(s)
Error in findInterval(sn.l, ge.r + cisDist + 1) : 
  'vec' must be sorted non-decreasingly and not contain NAs
Calls: Matrix_eQTL_main -> findInterval
Execution halted

We've seen this a few times for different datasets, and usually removing the more lowly expressed junctions will allow the code to run but we'd like to better troubleshoot this issue.

andreyshabalin commented 6 years ago

I'd like to investigate. Can you share the gene/SNP location files and gene expression and genotype data sets (with data zeroed out, to avoid breaking any data sharing rules)?

andrewejaffe commented 6 years ago

Here is an example datasets of chr21 that fails with the same error message

> message(paste(Sys.time(), 'running MatrixEQTL'))
2018-01-26 14:03:59 running MatrixEQTL
> me <- Matrix_eQTL_main(snps = meth, gene = exprinfo,
+     output_file_name.cis = paste0('.', cpg, '_', opt$feature,
+         '.txt'), # invis file, temporary
+     pvOutputThreshold = 0, pvOutputThreshold.cis = 5e-4,
+ useModel = modelLINEAR,
+ snpspos = methpos, genepos = exprpos, cisDist = 1000)
Matching data files and location files
43436of43436 genes matched
271233of271233 SNPs matched

Task finished in 0.355999999999767 seconds
Reordering genes

Task finished in 9.52599999999984 seconds
Processing covariates
Task finished in 0.00300000000015643 seconds
Processing gene expression data (imputation, residualization)
Task finished in 0.219000000000051 seconds
Creating output file(s)
Error in findInterval(sn.l, ge.r + cisDist + 1) :
  'vec' must be sorted non-decreasingly and not contain NAs
In addition: Warning message:
In .Internal(gc(verbose, reset)) :
  closing unused connection 3 (.CpG_jx.txt)
> save(meth, exprinfo, methpos, exprpos, cpg, opt, file =
'rda/debug_cpg_jx_chr21.Rdata')

> options(width = 120)
> session_info()
Session info
----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.4.3 Patched (2018-01-20 r74142)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2018-01-26

Packages
--------------------------------------------------------------------------------------------------------------
 package              * version   date       source
 acepack                1.4.1     2016-10-29 CRAN (R 3.4.1)
 AnnotationDbi          1.40.0    2017-11-29 Bioconductor
 assertthat             0.2.0     2017-04-11 CRAN (R 3.4.1)
 backports              1.1.2     2017-12-13 CRAN (R 3.4.2)
 base                 * 3.4.3     2018-01-20 local
 base64enc              0.1-3     2015-07-28 CRAN (R 3.4.1)
 bindr                  0.1       2016-11-13 CRAN (R 3.4.1)
 bindrcpp               0.2       2017-06-17 CRAN (R 3.4.1)
 Biobase              * 2.38.0    2017-11-07 Bioconductor
 BiocGenerics         * 0.24.0    2017-11-29 Bioconductor
 BiocParallel           1.12.0    2017-11-29 Bioconductor
 biomaRt                2.34.1    2018-01-09 Bioconductor
 Biostrings           * 2.46.0    2017-11-29 Bioconductor
 bit                    1.1-12    2014-04-09 CRAN (R 3.4.1)
 bit64                  0.9-7     2017-05-08 CRAN (R 3.4.1)
 bitops                 1.0-6     2013-08-17 CRAN (R 3.4.1)
 blob                   1.1.0     2017-06-17 CRAN (R 3.4.1)
 BSgenome               1.46.0    2017-11-29 Bioconductor
 bsseq                * 1.14.0    2017-11-07 Bioconductor
 bumphunter             1.20.0    2017-11-29 Bioconductor
 checkmate              1.8.5     2017-10-24 CRAN (R 3.4.2)
 cluster                2.0.6     2017-03-10 CRAN (R 3.4.3)
 codetools              0.2-15    2016-10-05 CRAN (R 3.4.3)
 colorout             * 1.1-2     2017-08-10 Github
(jalvesaq/colorout@020a14d)
 colorspace             1.3-2     2016-12-14 CRAN (R 3.4.1)
 compiler               3.4.3     2018-01-20 local
 data.table             1.10.4-3  2017-10-27 CRAN (R 3.4.2)
 datasets             * 3.4.3     2018-01-20 local
 DBI                    0.7       2017-06-18 CRAN (R 3.4.1)
 DelayedArray         * 0.4.1     2017-11-07 Bioconductor
 derfinder              1.12.0    2017-11-29 Bioconductor
 derfinderHelper        1.12.0    2017-11-29 Bioconductor
 devtools             * 1.13.4    2017-11-09 CRAN (R 3.4.2)
 digest                 0.6.14    2018-01-14 CRAN (R 3.4.2)
 doRNG                  1.6.6     2017-04-10 CRAN (R 3.4.1)
 downloader             0.4       2015-07-09 CRAN (R 3.4.1)
 dplyr                  0.7.4     2017-09-28 CRAN (R 3.4.1)
 foreach                1.4.4     2017-12-12 CRAN (R 3.4.2)
 foreign                0.8-69    2017-06-22 CRAN (R 3.4.3)
 Formula                1.2-2     2017-07-10 CRAN (R 3.4.1)
 GenomeInfoDb         * 1.14.0    2017-11-29 Bioconductor
 GenomeInfoDbData       1.0.0     2018-01-09 Bioconductor
 GenomicAlignments      1.14.1    2017-11-29 Bioconductor
 GenomicFeatures        1.30.0    2017-11-29 Bioconductor
 GenomicFiles           1.14.0    2017-11-29 Bioconductor
 GenomicRanges        * 1.30.1    2018-01-09 Bioconductor
 GEOquery               2.46.13   2018-01-09 Bioconductor
 getopt               * 1.20.1    2017-11-29 CRAN (R 3.4.2)
 ggplot2                2.2.1     2016-12-30 CRAN (R 3.4.1)
 glue                   1.2.0     2017-10-29 CRAN (R 3.4.2)
 graphics             * 3.4.3     2018-01-20 local
 grDevices            * 3.4.3     2018-01-20 local
 grid                   3.4.3     2018-01-20 local
 gridExtra              2.3       2017-09-09 CRAN (R 3.4.1)
 gtable                 0.2.0     2016-02-26 CRAN (R 3.4.1)
 gtools                 3.5.0     2015-05-29 CRAN (R 3.4.1)
 Hmisc                  4.1-1     2018-01-03 CRAN (R 3.4.2)
 hms                    0.4.0     2017-11-23 CRAN (R 3.4.2)
 htmlTable              1.11.1    2017-12-27 CRAN (R 3.4.2)
 htmltools              0.3.6     2017-04-28 CRAN (R 3.4.1)
 htmlwidgets            0.9       2017-07-10 CRAN (R 3.4.1)
 httr                   1.3.1     2017-08-20 CRAN (R 3.4.1)
 igraph                 1.1.2     2017-07-21 CRAN (R 3.4.1)
 IRanges              * 2.12.0    2017-11-29 Bioconductor
 iterators              1.0.9     2017-12-12 CRAN (R 3.4.2)
 jaffelab             * 0.99.15   2017-10-27 Github
(LieberInstitute/jaffelab@94307b0)
 jsonlite               1.5       2017-06-01 CRAN (R 3.4.1)
 knitr                  1.18      2017-12-27 CRAN (R 3.4.2)
 lattice                0.20-35   2017-03-25 CRAN (R 3.4.3)
 latticeExtra           0.6-28    2016-02-09 CRAN (R 3.4.1)
 lazyeval               0.2.1     2017-10-29 CRAN (R 3.4.2)
 limma                  3.34.5    2018-01-16 Bioconductor
 locfit                 1.5-9.1   2013-04-20 CRAN (R 3.4.1)
 magrittr               1.5       2014-11-22 CRAN (R 3.4.1)
 Matrix                 1.2-12    2017-11-30 CRAN (R 3.4.3)
 MatrixEQTL           * 2.2       2018-01-13 CRAN (R 3.4.2)
 matrixStats          * 0.52.2    2017-04-14 CRAN (R 3.4.1)
 memoise                1.1.0     2017-04-21 CRAN (R 3.4.1)
 methods              * 3.4.3     2018-01-20 local
 munsell                0.4.3     2016-02-13 CRAN (R 3.4.1)
 nnet                   7.3-12    2016-02-02 CRAN (R 3.4.3)
 parallel             * 3.4.3     2018-01-20 local
 permute                0.9-4     2016-09-09 CRAN (R 3.4.1)
 pillar                 1.1.0     2018-01-14 CRAN (R 3.4.2)
 pkgconfig              2.0.1     2017-03-21 CRAN (R 3.4.1)
 pkgmaker               0.22      2014-05-14 CRAN (R 3.4.1)
 plyr                   1.8.4     2016-06-08 CRAN (R 3.4.1)
 prettyunits            1.0.2     2015-07-13 CRAN (R 3.4.1)
 progress               1.1.2     2016-12-14 CRAN (R 3.4.1)
 purrr                  0.2.4     2017-10-18 CRAN (R 3.4.2)
 qvalue                 2.10.0    2017-11-29 Bioconductor
 R.methodsS3            1.7.1     2016-02-16 CRAN (R 3.4.1)
 R.oo                   1.21.0    2016-11-01 CRAN (R 3.4.1)
 R.utils                2.6.0     2017-11-05 CRAN (R 3.4.2)
 R6                     2.2.2     2017-06-17 CRAN (R 3.4.1)
 rafalib              * 1.0.0     2015-08-09 CRAN (R 3.4.1)
 RColorBrewer           1.1-2     2014-12-07 CRAN (R 3.4.1)
 Rcpp                   0.12.14   2017-11-23 CRAN (R 3.4.2)
 RCurl                  1.95-4.10 2018-01-04 CRAN (R 3.4.2)
 readr                  1.1.1     2017-05-16 CRAN (R 3.4.1)
 recount              * 1.4.3     2018-01-09 Bioconductor
 registry               0.5       2017-12-03 CRAN (R 3.4.2)
 rentrez                1.1.0     2017-06-01 CRAN (R 3.4.1)
 reshape2               1.4.3     2017-12-11 CRAN (R 3.4.2)
 rlang                  0.1.6     2017-12-21 CRAN (R 3.4.2)
 RMySQL                 0.10.13   2017-08-14 CRAN (R 3.4.1)
 rngtools               1.2.4     2014-03-06 CRAN (R 3.4.1)
 rpart                  4.1-12    2018-01-12 CRAN (R 3.4.3)
 Rsamtools            * 1.30.0    2017-11-29 Bioconductor
 RSQLite                2.0       2017-06-19 CRAN (R 3.4.1)
 rstudioapi             0.7       2017-09-07 CRAN (R 3.4.1)
 rtracklayer            1.38.2    2018-01-09 Bioconductor
 RUnit                  0.4.31    2015-11-06 CRAN (R 3.4.1)
 S4Vectors            * 0.16.0    2017-11-29 Bioconductor
 scales                 0.5.0     2017-08-24 CRAN (R 3.4.1)
 segmented              0.5-3.0   2017-11-30 CRAN (R 3.4.2)
 SGSeq                * 1.12.0    2018-01-16 Bioconductor
 splines                3.4.3     2018-01-20 local
 stats                * 3.4.3     2018-01-20 local
 stats4               * 3.4.3     2018-01-20 local
 stringi                1.1.6     2017-11-17 CRAN (R 3.4.2)
 stringr                1.2.0     2017-02-18 CRAN (R 3.4.1)
 SummarizedExperiment * 1.8.1     2018-01-09 Bioconductor
 survival               2.41-3    2017-04-04 CRAN (R 3.4.3)
 tibble                 1.4.1     2017-12-25 CRAN (R 3.4.2)
 tidyr                  0.7.2     2017-10-16 CRAN (R 3.4.2)
 tools                  3.4.3     2018-01-20 local
 utils                * 3.4.3     2018-01-20 local
 VariantAnnotation      1.24.5    2018-01-16 Bioconductor
 withr                  2.1.1     2017-12-19 CRAN (R 3.4.2)
 XML                    3.98-1.9  2017-06-19 CRAN (R 3.4.1)
 xml2                   1.1.1     2017-01-24 CRAN (R 3.4.1)
 xtable                 1.8-2     2016-02-05 CRAN (R 3.4.1)
 XVector              * 0.18.0    2017-11-29 Bioconductor
 zlibbioc               1.24.0    2017-11-07 Bioconductor

On Thu, Jan 25, 2018 at 12:03 PM, Andrey Shabalin notifications@github.com wrote:

I'd like to investigate. Can you share the gene/SNP location files and gene expression and genotype data sets (with data zeroed out, to avoid breaking any data sharing rules)?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/andreyshabalin/MatrixEQTL/issues/3#issuecomment-360531543, or mute the thread https://github.com/notifications/unsubscribe-auth/AAmMaAABuPQ4fPJhLKcNnGSrjlLFtROCks5tOLPMgaJpZM4RtC1C .

andreyshabalin commented 6 years ago

I see the error message, not the data. Can you send me the data? Maybe send it directly andrey.shabalin@gmail.com.

andreyshabalin commented 6 years ago

I believe I've fixed the problem (commit). Please try it now. Thank you for your help.

lcolladotor commented 6 years ago

Hi Andrey,

I can report that your latest commit indeed fixed the problem.

Best, Leonardo

(PS I work with Andrew)