Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Error: vector memory exhausted (limit reached?) #146

Closed: andgan closed this issue 1 year ago

andgan commented 1 year ago

1. Bug description

I'm using the latest version of MungeSumstats (1.6.0) and have started to see Error: vector memory exhausted (limit reached?) for many summary statistics files. I didn't have this problem with an older version of MungeSumstats running on R 4.1.1. The error typically occurs after the "Loading SNPlocs data." message is printed.

Console output

Checking for empty columns.
Standardising column headers.
First line of summary statistics file: 
MarkerName  CHR POS A1  A2  EAF_A1  Beta    SE  Pval    
Summary statistics report:
   - 11,514,381 rows
   - 11,417,256 unique variants
   - 1,686 genome-wide significant variants (P<5e-8)
   - 22 chromosomes
Checking for multi-GWAS.
Checking for multiple RSIDs on one row.
Inferring genome build.
Loading SNPlocs data.
Loading reference genome data.
Preprocessing RSIDs.
Validating RSIDs of 10,000 SNPs using BSgenome::snpsById...
BSgenome::snpsById done in 56 seconds.
Loading SNPlocs data.
Loading reference genome data.
Preprocessing RSIDs.
Validating RSIDs of 10,000 SNPs using BSgenome::snpsById...
BSgenome::snpsById done in 64 seconds.
Inferred genome build: GRCH37
Checking SNP RSIDs.
97,126 SNP IDs are not correctly formatted. These will be corrected from the reference genome.
Loading SNPlocs data.
Error: vector memory exhausted (limit reached?)

3. Session info

```
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin21.6.0 (64-bit)
Running under: macOS Monterey 12.6.3

Matrix products: default
LAPACK: /opt/homebrew/Cellar/r/4.2.2_1/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomeInfoDb_1.34.9 IRanges_2.32.0      S4Vectors_0.36.2
[4] BiocGenerics_0.44.0 MungeSumstats_1.6.0

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.10.0
 [2] Biobase_2.58.0
 [3] httr_1.4.5
 [4] BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
 [5] bit64_4.0.5
 [6] jsonlite_1.8.4
 [7] R.utils_2.12.2
 [8] assertthat_0.2.1
 [9] BiocFileCache_2.6.1
[10] blob_1.2.3
[11] BSgenome_1.66.3
[12] GenomeInfoDbData_1.2.9
[13] Rsamtools_2.14.0
[14] yaml_2.3.7
[15] progress_1.2.2
[16] pillar_1.8.1
[17] RSQLite_2.3.0
[18] lattice_0.20-45
[19] glue_1.6.2
[20] digest_0.6.31
[21] GenomicRanges_1.50.2
[22] XVector_0.38.0
[23] googleAuthR_2.0.0
[24] Matrix_1.5-3
[25] R.oo_1.25.0
[26] XML_3.99-0.13
[27] pkgconfig_2.0.3
[28] biomaRt_2.54.0
[29] BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000
[30] zlibbioc_1.44.0
[31] BiocParallel_1.32.5
[32] tibble_3.1.8
[33] KEGGREST_1.38.0
[34] generics_0.1.3
[35] ellipsis_0.3.2
[36] cachem_1.0.7
[37] SummarizedExperiment_1.28.0
[38] GenomicFeatures_1.50.4
[39] cli_3.6.0
[40] magrittr_2.0.3
[41] crayon_1.5.2
[42] memoise_2.0.1
[43] R.methodsS3_1.8.2
[44] fs_1.6.1
[45] fansi_1.0.4
[46] xml2_1.3.3
[47] tools_4.2.2
[48] data.table_1.14.8
[49] prettyunits_1.1.1
[50] hms_1.1.2
[51] BiocIO_1.8.0
[52] gargle_1.3.0
[53] lifecycle_1.0.3
[54] matrixStats_0.63.0
[55] stringr_1.5.0
[56] DelayedArray_0.24.0
[57] AnnotationDbi_1.60.0
[58] Biostrings_2.66.0
[59] compiler_4.2.2
[60] rlang_1.0.6
[61] grid_4.2.2
[62] RCurl_1.98-1.10
[63] rstudioapi_0.14
[64] VariantAnnotation_1.44.1
[65] rjson_0.2.21
[66] rappdirs_0.3.3
[67] bitops_1.0-7
[68] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.23
[69] SNPlocs.Hsapiens.dbSNP155.GRCh38_0.99.23
[70] restfulr_0.0.15
[71] codetools_0.2-19
[72] DBI_1.1.3
[73] curl_5.0.0
[74] R6_2.5.1
[75] GenomicAlignments_1.34.0
[76] dplyr_1.1.0
[77] rtracklayer_1.58.0
[78] fastmap_1.1.1
[79] bit_4.0.5
[80] utf8_1.2.3
[81] filelock_1.0.2
[82] stringi_1.7.12
[83] parallel_4.2.2
[84] vctrs_0.5.2
[85] png_0.1-8
[86] dbplyr_2.3.1
[87] tidyselect_1.2.0
```

Al-Murphy commented 1 year ago

Hey! The issue here is most likely due to the change in the default dbSNP version: MSS 1.6.0 uses dbSNP 155 (the latest version), which contains far more SNPs than 144, the version used previously. This results in a larger memory requirement. You could avoid this by switching back to version 144 using the dbSNP = 144 parameter. However, this is not recommended, and I would instead advise running the code on a machine with more RAM. What are you currently running it on? Also make sure nothing else is using RAM in the background on the machine.

Hope this helps, Alan.
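
For readers who want to try the dbSNP 144 fallback mentioned above, a minimal sketch might look like the following. The file name is hypothetical, and setting ref_genome = "GRCh37" is an assumption based on the build inferred in the console output; supplying it should also skip the build-inference step, which loads both SNPlocs builds. The call is guarded so that nothing runs unless MungeSumstats is installed and the file exists. Using dbSNP 144 would also require the SNPlocs.Hsapiens.dbSNP144.GRCh37 package.

```r
# Sketch only: the path below is a hypothetical placeholder, not from the issue.
munge_args <- list(
  path       = "my_sumstats.tsv.gz",  # hypothetical input file
  ref_genome = "GRCh37",              # assumed from "Inferred genome build: GRCH37"
  dbSNP      = 144                    # older, smaller dbSNP release (not recommended above)
)

reformatted <- NULL
if (requireNamespace("MungeSumstats", quietly = TRUE) &&
    file.exists(munge_args$path)) {
  # format_sumstats() returns the path of the reformatted file by default
  reformatted <- do.call(MungeSumstats::format_sumstats, munge_args)
}
```

As noted in the comment above, this trades completeness of the reference for a lower memory footprint, so it is a workaround, not the preferred fix.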

andgan commented 1 year ago

Thanks! I have now increased the memory limit in RStudio using this suggestion: https://stackoverflow.com/questions/51295402/r-on-macos-error-vector-memory-exhausted-limit-reached and it worked!
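
The approach usually recommended for this error on macOS is to raise R's vector heap limit by setting the R_MAX_VSIZE environment variable in ~/.Renviron and then restarting R. The thread does not say which value was used, so the sketch below is only an illustration: the helper name set_max_vsize and the 32Gb value are assumptions, and the limit should be chosen to suit the machine's RAM and swap.

```r
# Hypothetical helper: write (or replace) an R_MAX_VSIZE entry in an .Renviron file.
# The default size is illustrative only; pick a value appropriate for your machine.
set_max_vsize <- function(size = "32Gb",
                          path = file.path(Sys.getenv("HOME"), ".Renviron")) {
  # Keep any existing entries, dropping a previous R_MAX_VSIZE line if present
  lines <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  lines <- lines[!grepl("^\\s*R_MAX_VSIZE\\s*=", lines)]
  writeLines(c(lines, paste0("R_MAX_VSIZE=", size)), path)
  invisible(path)
}

# Usage (uncomment to modify your own ~/.Renviron, then restart R or RStudio):
# set_max_vsize("32Gb")
# After the restart, Sys.getenv("R_MAX_VSIZE") should show the new value.
```

The setting only takes effect in a new R session, because .Renviron is read at startup.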