Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation

Memory Error for readGeno but not for readVcf #75

Closed grasshoffm closed 1 year ago

grasshoffm commented 1 year ago

I have a curious error.

I am reading a gzipped tabix-indexed VCF file with 7 variants and 23,025 cells.

When I use readGeno, I receive the following error:

Error: scanVcf: (internal) _vcftype_grow 'sz' < 0; cannot allocate memory?

But when I use readVcf, I do not receive this error, even when I read the entire VCF file.

Both approaches should be equivalent and return the same output; I confirmed this with a different file. So it is surprising that readGeno would run into a memory error.

I wanted to use readGeno to avoid a possible memory issue. Am I missing something?
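One possible way to bound memory for a file with many samples is to read the genotype field in batches of samples. The sketch below uses the small ex2.vcf file shipped with VariantAnnotation, not the file from this report. The batch size of 2 and the variable names are assumptions for illustration, and whether batching avoids the `_vcftype_grow` error has not been verified.

```r
library(VariantAnnotation)

# Example file shipped with the package (an assumption: not the reporter's data).
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

# Sample names come from the header, so no genotype data is loaded yet.
all_samples <- samples(scanVcfHeader(fl))

# Split the sample names into batches; a batch size of 2 is arbitrary.
batch_size <- 2
batches <- split(all_samples, ceiling(seq_along(all_samples) / batch_size))

# Read only the DP field for each batch of samples.
dp_list <- lapply(batches, function(s) {
  vcf <- readVcf(fl, param = ScanVcfParam(geno = "DP", samples = s))
  geno(vcf)$DP
})

# Combine the per-batch matrices column-wise (variants x samples).
dp <- do.call(cbind, unname(dp_list))
dim(dp)
```

For a real file, the batch size would be chosen so that one batch fits comfortably in memory.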

Here is the code I am using.

This leads to the memory error:

    depth_loaded <- VariantAnnotation::readGeno(file = path_to_vcf_file, "DP")

This works:

    depth_to_add <- VariantAnnotation::readVcf(file = path_to_vcf_file, param = ScanVcfParam(geno = "DP"))
    depth_to_add <- VariantAnnotation::geno(depth_to_add)$DP

This works too and loads the entire file:

    depth_to_add <- VariantAnnotation::readVcf(file = path_to_vcf_file)
    depth_to_add <- VariantAnnotation::geno(depth_to_add)$DP
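Because the original file cannot be shared, the claimed equivalence of the two approaches can be checked on a public file. The following is a minimal sketch that uses the ex2.vcf example shipped with VariantAnnotation (an assumption: it stands in for the reporter's file and is far too small to trigger the error). The variable names are illustrative only.

```r
library(VariantAnnotation)

# Example file shipped with the package; it has a DP genotype field.
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

# Approach 1: readGeno returns the genotype matrix directly.
dp_geno <- readGeno(fl, "DP")

# Approach 2: readVcf restricted to the DP field, then extract the matrix.
vcf <- readVcf(fl, param = ScanVcfParam(geno = "DP"))
dp_vcf <- geno(vcf)$DP

# Compare shape and content of the two results.
dim(dp_geno)
dim(dp_vcf)
all.equal(dp_geno, dp_vcf)
```

If the two results agree on a public file but readGeno still fails on the real one, a reduced or anonymized file that reproduces the failure would be the most useful thing to attach.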

hpages commented 1 year ago

Please provide your sessionInfo() + the VCF file you are using (or a subset of it) so that we can reproduce the issue. Thanks!

grasshoffm commented 1 year ago

Sorry, I forgot the session info. I cannot provide any of the VCF files unfortunately.

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.8 (Green Obsidian)

Matrix products: default
BLAS: /cm/shared/apps/R4.2.3/lib64/R/lib/libRblas.so
LAPACK: /cm/shared/apps/R4.2.3/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.utf-8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf-8 LC_COLLATE=en_US.utf-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.utf-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] ggpubr_0.6.0 ggplot2_3.4.3
[3] Seurat_5.0.0 SeuratObject_5.0.0
[5] sp_2.1-1 sigurd_0.2.33
[7] SummarizedExperiment_1.28.0 Biobase_2.58.0
[9] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[11] IRanges_2.32.0 S4Vectors_0.36.2
[13] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
[15] matrixStats_1.0.0

loaded via a namespace (and not attached):
[1] utf8_1.2.3 spatstat.explore_3.2-3 reticulate_1.32.0
[4] tidyselect_1.2.0 RSQLite_2.3.1 AnnotationDbi_1.60.2
[7] htmlwidgets_1.6.2 grid_4.2.3 BiocParallel_1.32.6
[10] Rtsne_0.16 munsell_0.5.0 codetools_0.2-19
[13] ica_1.0-3 future_1.33.0 miniUI_0.1.1.1
[16] withr_2.5.1 spatstat.random_3.1-6 colorspace_2.1-0
[19] progressr_0.14.0 filelock_1.0.2 ROCR_1.0-11
[22] ggsignif_0.6.4 tensor_1.5 listenv_0.9.0
[25] GenomeInfoDbData_1.2.9 polyclip_1.10-6 bit64_4.0.5
[28] parallelly_1.36.0 vctrs_0.6.3 generics_0.1.3
[31] BiocFileCache_2.6.1 R6_2.5.1 doParallel_1.0.17
[34] clue_0.3-64 bitops_1.0-7 spatstat.utils_3.0-3
[37] cachem_1.0.8 DelayedArray_0.24.0 BiocIO_1.8.0
[40] promises_1.2.1 scales_1.2.1 gtable_0.3.4
[43] globals_0.16.2 goftest_1.2-3 spam_2.10-0
[46] rlang_1.1.1 GlobalOptions_0.1.2 splines_4.2.3
[49] rstatix_0.7.2 rtracklayer_1.58.0 lazyeval_0.2.2
[52] broom_1.0.5 spatstat.geom_3.2-5 yaml_2.3.7
[55] reshape2_1.4.4 abind_1.4-5 backports_1.4.1
[58] GenomicFeatures_1.50.4 httpuv_1.6.12 tools_4.2.3
[61] ellipsis_0.3.2 RColorBrewer_1.1-3 ggridges_0.5.4
[64] Rcpp_1.0.11 plyr_1.8.9 progress_1.2.2
[67] zlibbioc_1.44.0 purrr_1.0.2 RCurl_1.98-1.12
[70] prettyunits_1.2.0 deldir_1.0-9 pbapply_1.7-2
[73] GetoptLong_1.0.5 cowplot_1.1.1 zoo_1.8-12
[76] ggrepel_0.9.4 cluster_2.1.4 magrittr_2.0.3
[79] data.table_1.14.8 RSpectra_0.16-1 scattermore_1.2
[82] circlize_0.4.15 lmtest_0.9-40 RANN_2.6.1
[85] fitdistrplus_1.1-11 hms_1.1.3 patchwork_1.1.3
[88] mime_0.12 archive_1.1.6 xtable_1.8-4
[91] XML_3.99-0.14 fastDummies_1.7.3 gridExtra_2.3
[94] shape_1.4.6 compiler_4.2.3 biomaRt_2.54.1
[97] tibble_3.2.1 KernSmooth_2.23-20 crayon_1.5.2
[100] htmltools_0.5.6 later_1.3.1 tidyr_1.3.0
[103] DBI_1.1.3 dbplyr_2.4.0 ComplexHeatmap_2.14.0
[106] MASS_7.3-60 rappdirs_0.3.3 car_3.1-2
[109] Matrix_1.6-3 cli_3.6.1 parallel_4.2.3
[112] dotCall64_1.0-2 igraph_1.5.1 pkgconfig_2.0.3
[115] GenomicAlignments_1.34.1 plotly_4.10.2 spatstat.sparse_3.0-2
[118] xml2_1.3.5 foreach_1.5.2 XVector_0.38.0
[121] stringr_1.5.0 VariantAnnotation_1.44.1 digest_0.6.33
[124] sctransform_0.4.1 RcppAnnoy_0.0.21 spatstat.data_3.0-1
[127] Biostrings_2.66.0 leiden_0.4.3 uwot_0.1.16
[130] restfulr_0.0.15 curl_5.0.2 shiny_1.7.5.1
[133] Rsamtools_2.14.0 rjson_0.2.21 lifecycle_1.0.3
[136] nlme_3.1-162 jsonlite_1.8.7 carData_3.0-5
[139] BSgenome_1.66.3 viridisLite_0.4.2 fansi_1.0.4
[142] pillar_1.9.0 ggsci_3.0.0 lattice_0.21-8
[145] KEGGREST_1.38.0 fastmap_1.1.1 httr_1.4.7
[148] survival_3.5-5 glue_1.6.2 png_0.1-8
[151] iterators_1.0.14 bit_4.0.5 stringi_1.7.12
[154] blob_1.2.4 RcppHNSW_0.4.1 memoise_2.0.1
[157] dplyr_1.1.3 irlba_2.3.5.1 future.apply_1.11.0

hpages commented 1 year ago

Thanks for the sessionInfo().

> I cannot provide any of the VCF files unfortunately.

That means we can't help then, sorry.

hpages commented 1 year ago

I would still recommend that you upgrade to the latest Bioconductor version, which is 3.18. This might help. You're using 3.16 which is old and no longer supported.

Note that BioC 3.18 requires R 4.3.
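For reference, an upgrade along these lines is usually done through BiocManager. This is a minimal sketch, assuming R 4.3 is already installed and that the library is writable; on a shared cluster installation such as the one in the sessionInfo() above, a personal library or an administrator may be needed.

```r
# Install BiocManager from CRAN if it is not available yet.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Switch the installation to Bioconductor 3.18 (requires R 4.3).
BiocManager::install(version = "3.18")

# Make sure VariantAnnotation itself is installed from that release.
BiocManager::install("VariantAnnotation")

# Report packages that are out of date or too new for this release.
BiocManager::valid()

# Confirm the versions now in use.
BiocManager::version()
packageVersion("VariantAnnotation")
```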

Good luck.

grasshoffm commented 1 year ago

Okay, thanks for the advice.