Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

Error in chromInfo file download when running seqlevelsStyle() #102

Closed Enterprise-D closed 9 months ago

Enterprise-D commented 9 months ago

Hi, I have been hitting this error since this afternoon. It originates from GenomeInfoDb:::fetch_table_from_url() (which I tried to override). It would be good to cache the file instead of downloading it each time. I don't know if it's the server's fault, but the same error popped up on both the HPC and my local machine. I could download chromInfo.txt.gz by manually navigating the FTP site, but sometimes I got a "forbidden" error.

library(Seurat)
library(Signac)
library(EnsDb.Mmusculus.v79)
library(BSgenome.Mmusculus.UCSC.mm10)

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(annotations) <- "UCSC"

Error in download.file(url, destfile, quiet = TRUE) (mm10.R#62): cannot open URL 'https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz'
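
One way to tell a server-side outage from a local problem is to probe the failing URL and an alternative host from the same machine. The following is a minimal sketch in base R, not part of GenomeInfoDb: `can_open()` and `probe_urls()` are hypothetical helper names, and the second URL is an assumption built from the `hgdownload.soe.ucsc.edu` host mentioned later in this thread plus the same file path.

```r
# Hypothetical helper: TRUE if at least one byte can be read from the URL.
can_open <- function(u) {
  con <- url(u, open = "rb")
  on.exit(close(con))
  length(readBin(con, what = "raw", n = 1L)) == 1L
}

# Hypothetical helper: probe each URL and return a named logical vector.
# `fetch` is injectable so the logic can be exercised without network access.
probe_urls <- function(urls, fetch = can_open) {
  vapply(urls, function(u) {
    tryCatch(suppressWarnings(isTRUE(fetch(u))), error = function(e) FALSE)
  }, logical(1))
}

candidates <- c(
  "https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz",
  # Assumed alternative location (host taken from the reply further down).
  "https://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz"
)

# Uncomment to run the real check (needs network access):
# probe_urls(candidates)
```

If the same host fails from two unrelated networks while another host responds, the problem is most likely on the server side.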

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm10_1.4.3 BSgenome_1.70.1 rtracklayer_1.62.0
[4] BiocIO_1.12.0 Biostrings_2.70.1 XVector_0.42.0
[7] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.26.0 AnnotationFilter_1.26.0
[10] GenomicFeatures_1.54.1 AnnotationDbi_1.64.1 Biobase_2.62.0
[13] GenomicRanges_1.54.1 GenomeInfoDb_1.38.0 IRanges_2.36.0
[16] S4Vectors_0.40.1 BiocGenerics_0.48.1 Signac_1.12.0
[19] Seurat_5.0.1 SeuratObject_5.0.1 sp_2.1-2

loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.21 splines_4.3.2 later_1.3.2
[4] bitops_1.0-7 filelock_1.0.3 tibble_3.2.1
[7] polyclip_1.10-6 rpart_4.1.23 XML_3.99-0.16
[10] fastDummies_1.7.3 lifecycle_1.0.4 globals_0.16.2
[13] lattice_0.22-5 MASS_7.3-60.0.1 backports_1.4.1
[16] magrittr_2.0.3 rmarkdown_2.25 Hmisc_5.1-1
[19] plotly_4.10.4 yaml_2.3.8 httpuv_1.6.13
[22] sctransform_0.4.1 spam_2.10-0 spatstat.sparse_3.0-3
[25] reticulate_1.34.0 cowplot_1.1.2 pbapply_1.7-2
[28] DBI_1.2.1 RColorBrewer_1.1-3 abind_1.4-5
[31] zlibbioc_1.48.0 Rtsne_0.17 purrr_1.0.2
[34] biovizBase_1.48.0 RCurl_1.98-1.14 nnet_7.3-19
[37] VariantAnnotation_1.46.0 rappdirs_0.3.3 GenomeInfoDbData_1.2.11
[40] ggrepel_0.9.5 irlba_2.3.5.1 listenv_0.9.0
[43] spatstat.utils_3.0-4 goftest_1.2-3 RSpectra_0.16-1
[46] spatstat.random_3.2-2 fitdistrplus_1.1-11 parallelly_1.36.0
[49] DelayedArray_0.28.0 leiden_0.4.3.1 codetools_0.2-19
[52] RcppRoll_0.3.0 xml2_1.3.6 tidyselect_1.2.0
[55] base64enc_0.1-3 matrixStats_1.2.0 BiocFileCache_2.10.1
[58] spatstat.explore_3.2-5 GenomicAlignments_1.38.0 jsonlite_1.8.8
[61] Formula_1.2-5 ellipsis_0.3.2 progressr_0.14.0
[64] ggridges_0.5.5 survival_3.5-7 tools_4.3.2
[67] progress_1.2.3 ica_1.0-3 Rcpp_1.0.12
[70] glue_1.7.0 SparseArray_1.2.2 gridExtra_2.3
[73] xfun_0.41 MatrixGenerics_1.14.0 dplyr_1.1.4
[76] fastmap_1.1.1 fansi_1.0.6 digest_0.6.34
[79] R6_2.5.1 mime_0.12 colorspace_2.1-0
[82] scattermore_1.2 tensor_1.5 dichromat_2.0-0.1
[85] spatstat.data_3.0-4 biomaRt_2.58.0 RSQLite_2.3.4
[88] utf8_1.2.4 tidyr_1.3.0 generics_0.1.3
[91] data.table_1.14.10 S4Arrays_1.2.0 prettyunits_1.2.0
[94] httr_1.4.7 htmlwidgets_1.6.4 uwot_0.1.16
[97] pkgconfig_2.0.3 gtable_0.3.4 blob_1.2.4
[100] lmtest_0.9-40 htmltools_0.5.7 dotCall64_1.1-1
[103] ProtGenerics_1.34.0 scales_1.3.0 png_0.1-8
[106] rstudioapi_0.15.0 knitr_1.45 reshape2_1.4.4
[109] rjson_0.2.21 checkmate_2.3.1 nlme_3.1-164
[112] curl_5.2.0 zoo_1.8-12 cachem_1.0.8
[115] stringr_1.5.1 KernSmooth_2.23-22 parallel_4.3.2
[118] miniUI_0.1.1.1 foreign_0.8-86 restfulr_0.0.15
[121] pillar_1.9.0 grid_4.3.2 vctrs_0.6.5
[124] RANN_2.6.1 promises_1.2.1 dbplyr_2.4.0
[127] xtable_1.8-4 cluster_2.1.6 htmlTable_2.4.2
[130] evaluate_0.23 cli_3.6.2 compiler_4.3.2
[133] Rsamtools_2.18.0 rlang_1.1.3 crayon_1.5.2
[136] future.apply_1.11.1 plyr_1.8.9 stringi_1.8.3
[139] viridisLite_0.4.2 deldir_2.0-2 BiocParallel_1.36.0
[142] munsell_0.5.0 lazyeval_0.2.2 spatstat.geom_3.2-7
[145] Matrix_1.6-5 RcppHNSW_0.5.0 hms_1.1.3
[148] patchwork_1.2.0 bit64_4.0.5 future_1.33.1
[151] ggplot2_3.4.4 KEGGREST_1.42.0 shiny_1.8.0
[154] SummarizedExperiment_1.32.0 ROCR_1.0-11 igraph_1.6.0
[157] memoise_2.0.1 fastmatch_1.1-4 bit_4.0.5

hpages commented 9 months ago

See https://groups.google.com/a/soe.ucsc.edu/g/genome/c/zxS5jah4eZo/m/Jxuprb9BAQAJ

> It would be good to cache the file instead of downloading each time.

Caching happens, but not when you think it does. seqlevelsStyle(gr) <- "UCSC" uses getChromInfoFromUCSC() internally, which downloads the file only once and caches the result of parsing it (a data.frame).

Note that the cache lasts only for the current session. Caching permanently would not be a good idea because, believe it or not, the content of these chromInfo.txt.gz files sometimes changes (a rare event, but it has happened a few times in the past).
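
To illustrate what "download once, cache the parsed result for the session" means, here is a minimal base R sketch. It is not GenomeInfoDb's actual implementation: `.session_cache`, `get_cached()`, its `recache` flag, and `fake_fetch()` are all hypothetical names, and the fake fetcher returns a two-row table (sizes copied from the mm10 output in this thread) instead of touching the network.

```r
# Session-scoped cache: an in-memory environment that disappears when R exits.
.session_cache <- new.env(parent = emptyenv())

# Hypothetical helper: call `fetch_and_parse(key)` only if `key` is not cached
# yet (or if `recache = TRUE`), then return the cached value.
get_cached <- function(key, fetch_and_parse, recache = FALSE) {
  if (recache || !exists(key, envir = .session_cache, inherits = FALSE)) {
    assign(key, fetch_and_parse(key), envir = .session_cache)
  }
  get(key, envir = .session_cache, inherits = FALSE)
}

# Stand-in for "download + parse" that counts how often it is called.
n_calls <- 0L
fake_fetch <- function(genome) {
  n_calls <<- n_calls + 1L
  data.frame(chrom = c("chr1", "chrM"), size = c(195471971L, 16299L))
}

first  <- get_cached("mm10", fake_fetch)  # triggers the (fake) download
second <- get_cached("mm10", fake_fetch)  # served from the session cache
```

Because nothing is written to disk, a new R session always starts with an empty cache and picks up any upstream change to the file.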

hpages commented 9 months ago

Oh, I forgot about this feature, but in the meantime you should be able to set the global option UCSC.goldenPath.url to "https://hgdownload.soe.ucsc.edu/goldenPath" with:

options(UCSC.goldenPath.url="https://hgdownload.soe.ucsc.edu/goldenPath")

Then:

> seqlevelsStyle(annotations) <- "UCSC"  # works!

> seqinfo(annotations)
Seqinfo object with 22 sequences (1 circular) from mm10 genome:
  seqnames seqlengths isCircular genome
  chr3      160039680      FALSE   mm10
  chrX      171031299      FALSE   mm10
  chr16      98207768      FALSE   mm10
  chr7      145441459      FALSE   mm10
  chr11     122082543      FALSE   mm10
  ...             ...        ...    ...
  chr18      90702639      FALSE   mm10
  chr1      195471971      FALSE   mm10
  chr12     120129022      FALSE   mm10
  chr19      61431566      FALSE   mm10
  chrM          16299       TRUE   mm10
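
If the override should only apply temporarily, the option can be set, checked, and then restored. This is a small sketch using base R's `options()` and `getOption()` only; the variable names `old` and `current_url` are illustrative, and the `seqlevelsStyle()` call is left as a comment because it needs the packages and network access from the original report.

```r
# Point UCSC.goldenPath.url at the alternative host, remembering the old value.
old <- options(UCSC.goldenPath.url = "https://hgdownload.soe.ucsc.edu/goldenPath")

# Confirm which base URL is now in effect for this session.
current_url <- getOption("UCSC.goldenPath.url")

# ... run the failing step here, e.g.:
# seqlevelsStyle(annotations) <- "UCSC"

# Restore whatever value (possibly none) was set before.
options(old)
```

To apply the override in every session instead, the same `options()` call can be placed in `~/.Rprofile`.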

Enterprise-D commented 9 months ago

It works, thank you very much!