Closed: shwong-tw closed this issue 1 year ago
Thanks for the report. Please supply the sessionInfo() result after the error, and ensure BiocManager::valid() is TRUE.
It may take some time to fix.
In this environment, VariantAnnotation::info() gives the error:
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] VariantAnnotation_1.42.1 Rsamtools_2.12.0
[3] Biostrings_2.64.1 XVector_0.36.0
[5] SummarizedExperiment_1.26.1 Biobase_2.56.0
[7] GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
[9] IRanges_2.30.1 S4Vectors_0.34.0
[11] MatrixGenerics_1.8.1 matrixStats_0.62.0
[13] BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45
[3] prettyunits_1.1.1 png_0.1-7
[5] assertthat_0.2.1 digest_0.6.29
[7] utf8_1.2.2 BiocFileCache_2.4.0
[9] R6_2.5.1 RSQLite_2.2.20
[11] httr_1.4.3 pillar_1.7.0
[13] zlibbioc_1.42.0 rlang_1.0.6
[15] GenomicFeatures_1.48.4 progress_1.2.2
[17] curl_4.3.2 rstudioapi_0.13
[19] blob_1.2.3 Matrix_1.5-1
[21] BiocParallel_1.30.3 stringr_1.4.0
[23] RCurl_1.98-1.8 bit_4.0.4
[25] biomaRt_2.52.0 DelayedArray_0.22.0
[27] rtracklayer_1.56.1 compiler_4.2.0
[29] pkgconfig_2.0.3 tidyselect_1.1.2
[31] KEGGREST_1.36.3 tibble_3.1.7
[33] GenomeInfoDbData_1.2.8 codetools_0.2-18
[35] XML_3.99-0.13 fansi_1.0.3
[37] crayon_1.5.1 dplyr_1.0.10
[39] dbplyr_2.2.1 GenomicAlignments_1.32.1
[41] bitops_1.0-7 rappdirs_0.3.3
[43] grid_4.2.0 lifecycle_1.0.3
[45] DBI_1.1.3 magrittr_2.0.3
[47] cli_3.4.1 stringi_1.7.6
[49] cachem_1.0.6 xml2_1.3.3
[51] ellipsis_0.3.2 filelock_1.0.2
[53] vctrs_0.5.1 generics_0.1.3
[55] rjson_0.2.21 restfulr_0.0.15
[57] tools_4.2.0 bit64_4.0.5
[59] BSgenome_1.64.0 glue_1.6.2
[61] purrr_0.3.4 hms_1.1.2
[63] yaml_2.3.5 parallel_4.2.0
[65] fastmap_1.1.0 AnnotationDbi_1.58.0
[67] BiocManager_1.30.18 memoise_2.0.1
[69] BiocIO_1.6.0
Bioconductor version '3.15'
In this environment, VariantAnnotation::info() worked well:
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] VariantAnnotation_1.34.0 Rsamtools_2.4.0
[3] Biostrings_2.56.0 XVector_0.30.0
[5] SummarizedExperiment_1.20.0 Biobase_2.50.0
[7] MatrixGenerics_1.2.1 matrixStats_0.61.0
[9] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
[11] IRanges_2.24.1 S4Vectors_0.28.1
[13] BiocGenerics_0.36.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8 lattice_0.20-41
[3] prettyunits_1.1.1 assertthat_0.2.1
[5] utf8_1.2.2 BiocFileCache_1.12.1
[7] R6_2.5.1 RSQLite_2.2.7
[9] httr_1.4.2 pillar_1.6.4
[11] zlibbioc_1.36.0 rlang_0.4.12
[13] GenomicFeatures_1.40.1 progress_1.2.2
[15] curl_4.3.2 rstudioapi_0.13
[17] blob_1.2.2 Matrix_1.3-4
[19] BiocParallel_1.24.1 stringr_1.4.0
[21] RCurl_1.98-1.5 bit_4.0.4
[23] biomaRt_2.44.4 DelayedArray_0.16.3
[25] rtracklayer_1.48.0 compiler_4.0.0
[27] pkgconfig_2.0.3 askpass_1.1
[29] openssl_1.4.6 tidyselect_1.1.1
[31] tibble_3.1.6 GenomeInfoDbData_1.2.4
[33] XML_3.99-0.6 fansi_1.0.2
[35] crayon_1.4.2 dplyr_1.0.7
[37] dbplyr_2.1.1 GenomicAlignments_1.24.0
[39] bitops_1.0-7 rappdirs_0.3.3
[41] grid_4.0.0 lifecycle_1.0.1
[43] DBI_1.1.1 magrittr_2.0.1
[45] stringi_1.7.6 cachem_1.0.6
[47] xml2_1.3.3 ellipsis_0.3.2
[49] vctrs_0.3.8 generics_0.1.1
[51] tools_4.0.0 bit64_4.0.5
[53] BSgenome_1.56.0 glue_1.6.0
[55] purrr_0.3.4 hms_1.1.1
[57] fastmap_1.1.0 AnnotationDbi_1.50.3
[59] BiocManager_1.30.10 memoise_2.0.0
Bioconductor version '3.11'
Have you tried 1) loading the saved object without evaluating it, then 2) running newobj = updateObject([oldobj]) and checking whether info works on newobj?
If this updateObject approach does not work, it would be helpful if you could make available the old VCF, or an example that fails with the new R.
I always load the RData file in which the object was stored, which hopefully addresses the first suggestion. Renewing the object with updateObject() did not make it work.
Interestingly, I just found that info(test) gives the same error as before; however, info(test[1:nrow(test),]) works well again.
Unfortunately I cannot provide the old VCF as it contains sensitive data, and if I subset it with test2 = test[1:5,], then info(test2) works well.
In any case, this already led to a workaround for the issue I encountered.
Anyone who encounters this issue can renew the object with test = test[1:nrow(test),] to avoid the error.
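The workaround above can be sketched as a small helper. This is only an illustration: `rebuild_by_subsetting` is a hypothetical name, not a VariantAnnotation function, and it uses `seq_len(nrow(x))` instead of `1:nrow(x)` so that an object with zero rows is handled as well.

```r
# Hypothetical helper (not part of VariantAnnotation): rebuild an object by
# selecting all of its rows, as described in the workaround above.
rebuild_by_subsetting <- function(x) {
  x[seq_len(nrow(x)), ]
}

# Intended use on the old CollapsedVCF (called `test` in this thread):
# library(VariantAnnotation)
# test <- rebuild_by_subsetting(test)
# info(test)
```

The thread reports that subsetting was enough to make info() work again, but it does not explain why, so treat this as a stopgap rather than a proper repair of the stored object.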
I'll probably close the issue here and thank you for the prompt feedback again :)
Have a nice weekend!
@shwong-tw Have you tried using the updateObject package to update your old instance? It may provide some benefits in that respect. Perhaps Hervé @hpages can comment further. Best, Marcel
Glad to hear about the workaround. I will fire up bioc 3.11 and see if I can export a VCF that will demonstrate the problem in current bioc. Then we can be more concrete about a repair.
Hi Marcel, When I tried the BiocGenerics::updateObject function, the error stayed. I just installed the updateObject package as you suggested; however, I don't see a relevant function for updating a VCF object.
Thank you!
Sorry, it wasn't entirely clear to me whether you have an .Rds or .Rda file or an actual object. For the first two options, you can use update_rds_file or update_rda_file in the package, respectively.
Hi Marcel,
I used the save() function to store the intermediate data, so I suppose it is an .Rda file.
I just ran updateObject::update_rda_file on this intermediate file and it seems to be doing something:
File test.Rda: load().. ok [10 object(s)]; updateObject(logical, check=FALSE).. no-op; updateObject(factor, check=FALSE).. no-op; updateObject(list, check=FALSE).. object updated; updateObject(list, check=FALSE).. object updated; updateObject(list, check=FALSE).. object updated; updateObject(list, check=FALSE).. object updated; updateObject(list, check=FALSE).. object updated; updateObject(numeric, check=FALSE).. no-op; updateObject(list, check=FALSE).. no-op; updateObject(list, check=FALSE).. object updated; saving file.. OK ==> 1
However, after loading the updated object, info() still gives the same error.
Other than that, I would suggest warning users that updateObject::update_rda_file overwrites the original file by default.
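Because the file is overwritten in place, one cautious approach is to keep a copy before running the update. The sketch below uses only base R; `backup_file` and the `.bak` suffix are hypothetical choices for illustration, not part of the updateObject package.

```r
# Hypothetical helper: copy a file to "<path>.bak" before it is modified
# in place. Refuses to overwrite an existing backup.
backup_file <- function(path) {
  stopifnot(file.exists(path))
  backup <- paste0(path, ".bak")
  ok <- file.copy(path, backup, overwrite = FALSE)
  if (!ok) stop("could not create backup: ", backup)
  backup
}

# Intended use, with the file name from this thread (requires the
# updateObject package):
# backup_file("test.Rda")
# updateObject::update_rda_file("test.Rda")
```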
Thank you and have a nice weekend!
Hi @shwong-tw
Thanks for testing updateObject::update_rda_file.
The point you make may be helpful for the documentation in the package (cc: @hpages).
We will get back to you when we have a reproducible example.
Best regards,
Marcel
@shwong-tw At the root of the problem is that BiocGenerics::updateObject() doesn't seem to be able to fix your old CollapsedVCF instance. Using the updateObject package won't change that, because all this package does is provide some convenience wrappers around BiocGenerics::updateObject().
So we need to understand why BiocGenerics::updateObject() fails to fix your old CollapsedVCF instance. However, this is very hard without having access to it. Maybe you can run the following:
library(VariantAnnotation)
load(...)  # or data(...)
vcf # try to display the object
class(vcf@info)
vcf <- BiocGenerics::updateObject(vcf, verbose=TRUE)
vcf # try to display the object again
class(vcf@info)
vcf_info <- info(vcf)
class(vcf_info)
vcf_info
and share the output here? Note that you want to do this with the most recent version of Bioconductor that you have access to. That seems to be BioC 3.15 for you, but the most current version is BioC 3.16, so I suggest that you update your installation ASAP.
I have a feeling that the problem is that vcf@info is an old DataFrame instance that needs to be replaced with a DFrame instance, but BiocGenerics::updateObject(vcf) doesn't do that (it ignores the slot).
Thanks!
@shwong-tw
So here's one way to reproduce this (with BioC 3.16):
library(VariantAnnotation)
fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
class(vcf@info)
# [1] "DFrame"
# attr(,"package")
# [1] "S4Vectors"
class(vcf@info) <- "DataFrame"
info(vcf)
# Error: C stack usage 7969924 is too close to the limit
See my sessionInfo() below.
FWIW I just added an updateObject() method for VCF objects to VariantAnnotation 1.44.1 (BioC 3.16) and 1.45.1 (BioC 3.17). This method should be able to fix the info and fixed slots of your old CollapsedVCF instances.
These new VariantAnnotation versions should propagate and become available via BiocManager::install() in the next 24-48 hours or so. However, you will first need to update your installation to BioC 3.16 to get access to VariantAnnotation 1.44.1.
After you've tried updateObject() on your old CollapsedVCF instance and made sure that everything works as expected with the updated instance, you should save() it again to disk so you don't have to call updateObject() again on it next time you load it.
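The update-then-save step could look roughly like the sketch below. `resave_updated` is a hypothetical helper, not part of any Bioconductor package. It writes to a separate output file so the original stays untouched until the result has been checked. The default `update` argument assumes BiocGenerics is installed, and fixing old VCF objects additionally assumes VariantAnnotation >= 1.44.1 is loaded so that its updateObject() method is available.

```r
# Hypothetical helper: load every object stored in `infile`, pass each one
# through `update`, and save the results under the same names to `outfile`.
# The default for `update` is only evaluated when no `update` is supplied.
resave_updated <- function(infile, outfile,
                           update = BiocGenerics::updateObject) {
  env <- new.env()
  obj_names <- load(infile, envir = env)   # load() returns the object names
  for (nm in obj_names) {
    assign(nm, update(get(nm, envir = env)), envir = env)
  }
  save(list = obj_names, envir = env, file = outfile)
  invisible(obj_names)
}

# Intended use, with the file name from this thread; "test_updated.Rda" is
# an arbitrary output name:
# library(VariantAnnotation)
# resave_updated("test.Rda", "test_updated.Rda")
```

After loading the new file, it is worth confirming that info() works on the updated object before replacing the original.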
If you have other old serialized S4 instances on your disk, say in the path/to/saved/objects/ folder, you should be able to run updateSerializedObjects("path/to/saved/objects", recursive=TRUE) to update them all. Yes, updateSerializedObjects() is going to perform in-place replacement of the .rda and/or .rds files found in path/to/saved/objects/, but only if the serialized objects stored in those files actually needed to be updated.
Best, H.
sessionInfo():
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] VariantAnnotation_1.44.0 Rsamtools_2.14.0
[3] Biostrings_2.66.0 XVector_0.38.0
[5] SummarizedExperiment_1.28.0 Biobase_2.58.0
[7] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[9] IRanges_2.32.0 S4Vectors_0.36.1
[11] MatrixGenerics_1.10.0 matrixStats_0.63.0
[13] BiocGenerics_0.44.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 lattice_0.20-45 prettyunits_1.1.1
[4] png_0.1-8 assertthat_0.2.1 digest_0.6.31
[7] utf8_1.2.3 BiocFileCache_2.6.0 R6_2.5.1
[10] RSQLite_2.2.20 httr_1.4.4 pillar_1.8.1
[13] zlibbioc_1.44.0 rlang_1.0.6 GenomicFeatures_1.50.4
[16] progress_1.2.2 curl_5.0.0 blob_1.2.3
[19] Matrix_1.5-3 BiocParallel_1.32.5 stringr_1.5.0
[22] RCurl_1.98-1.10 bit_4.0.5 biomaRt_2.54.0
[25] DelayedArray_0.24.0 rtracklayer_1.58.0 compiler_4.2.2
[28] pkgconfig_2.0.3 tidyselect_1.2.0 KEGGREST_1.38.0
[31] tibble_3.1.8 GenomeInfoDbData_1.2.9 codetools_0.2-19
[34] XML_3.99-0.13 fansi_1.0.4 crayon_1.5.2
[37] dplyr_1.1.0 dbplyr_2.3.0 GenomicAlignments_1.34.0
[40] bitops_1.0-7 rappdirs_0.3.3 grid_4.2.2
[43] lifecycle_1.0.3 DBI_1.1.3 magrittr_2.0.3
[46] cli_3.6.0 stringi_1.7.12 cachem_1.0.6
[49] xml2_1.3.3 ellipsis_0.3.2 filelock_1.0.2
[52] vctrs_0.5.2 generics_0.1.3 rjson_0.2.21
[55] restfulr_0.0.15 tools_4.2.2 bit64_4.0.5
[58] BSgenome_1.66.2 glue_1.6.2 hms_1.1.2
[61] yaml_2.3.7 parallel_4.2.2 fastmap_1.1.0
[64] AnnotationDbi_1.60.0 memoise_2.0.1 BiocIO_1.8.0
Dear developer,
I would like to apply the info() function to a CollapsedVCF object that was previously stored.
Using info() from VariantAnnotation version < 1.34.0 this works well; however, using this function from version 1.42.1 gives the error below:
"C stack usage 7971012 is too close to the limit"
I didn't try versions in between 1.34.0 and 1.42.1.
Would you kindly provide me with some insight on solving this issue?
Thank you very much!