Closed npokorzynski closed 2 years ago
It seems to me this specific event is network related. I tried it and
> makeOrgPackageFromNCBI(version = "0.1", author = "Nick D. Pokorzynski <nick.pokorzynski@yale.edu>", maintainer = "Nick D. Pokorzynski <nick.pokorzynski@yale.ed>", outputDir = ".", tax_id = "588858", genus = "Salmonella", species = "enterica sv. Typhimurium 14028s", rebuildCache = TRUE)
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
But it is taking a long time ...
@vjcitn Yes, it ended up being a network issue. I actually got it to compile the OrgDB object, but I was never able to use it. If I recall correctly, the OrgDB object ended up being populated with very strange identifiers: it did not have any gene names or symbols, but it did have GIDs. However, there were only about two hundred GIDs for a species with 5000+ genes, and none of the numbers seemed to refer to anything associated with the specific genome (at least that I could find). Nothing that I found online shed light on why that might be the case. I'd love to get this working, though.
@mrjc42 might you have a look?
@npokorzynski Would you have a look at the NCBI records directly? Sometimes they are just inadequate or broken. If it looks like our infrastructure is to blame, please reopen this issue.
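One way to look at the NCBI records directly is to count how many GeneIDs NCBI lists for the taxon in gene_info, one of the files the function downloads. The following is a minimal base R sketch, assuming the file's first two tab-separated columns are the taxonomy ID and the GeneID (as in NCBI's gene_info layout). `count_gene_ids` is a hypothetical helper, not part of AnnotationForge, and the toy file stands in for the real download.

```r
# Hypothetical helper (not an AnnotationForge function): count the distinct
# GeneIDs listed for one taxon in a gene_info-style tab-separated file.
# Assumption: column 1 is the taxonomy ID and column 2 is the GeneID.
count_gene_ids <- function(path, tax_id) {
  info <- read.delim(path, header = TRUE, quote = "", comment.char = "",
                     check.names = FALSE, colClasses = "character")
  hits <- info[info[[1]] == as.character(tax_id), , drop = FALSE]
  length(unique(hits[[2]]))
}

# Toy stand-in for a real gene_info file; the gene rows are made up.
toy <- tempfile(fileext = ".tsv")
writeLines(c("#tax_id\tGeneID\tSymbol",
             "588858\t1001\tgeneA",
             "588858\t1002\tgeneB",
             "9606\t1\tA1BG"), toy)

# Two of the three toy rows belong to taxon 588858.
print(count_gene_ids(toy, "588858"))
```

If a locally downloaded gene_info file also shows only a couple of hundred GeneIDs for tax_id 588858, that would suggest the sparse OrgDB reflects the NCBI records rather than the build itself. The full file is large, so reading it whole may need a lot of memory.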
Hi,
I'm trying to build an OrgDB object following the vignette:
makeOrgPackageFromNCBI(version = "0.1", author = "Nick D. Pokorzynski <nick.pokorzynski@yale.edu>", maintainer = "Nick D. Pokorzynski <nick.pokorzynski@yale.ed>", outputDir = ".", tax_id = "588858", genus = "Salmonella", species = "enterica sv. Typhimurium 14028s", rebuildCache = TRUE)
Yet every time I run this code, after checking for validity of package, etc., I get the following error:
I know that a similar issue has been reported in the past e.g. #11, but there doesn't seem to be any actual solution to this problem other than waiting for the connection to improve? If there is, please advise. Any help is appreciated.
Thanks in advance!
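Since the failure looks network related, one workaround short of simply waiting is to retry the call a few times. The sketch below is a generic base R retry wrapper; `with_retries` is a hypothetical helper, not part of AnnotationForge. The commented usage assumes the downloads go through `download.file()`, which honours `options(timeout = ...)`; that is an assumption about the package internals, not something confirmed in this thread.

```r
# Hypothetical helper (not an AnnotationForge function): call fun() up to
# `attempts` times, pausing `wait_seconds` between failed attempts.
with_retries <- function(fun, attempts = 3, wait_seconds = 5) {
  stopifnot(attempts >= 1)
  for (i in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait_seconds)
  }
  stop(result)  # re-signal the last error once all attempts are used up
}

# Possible usage (commented out because it needs AnnotationForge and a
# network connection). Raising the timeout is an assumption: it only helps
# if the package downloads with download.file().
# options(timeout = 3600)
# with_retries(function() {
#   makeOrgPackageFromNCBI(version = "0.1",
#                          author = "Nick D. Pokorzynski <nick.pokorzynski@yale.edu>",
#                          maintainer = "Nick D. Pokorzynski <nick.pokorzynski@yale.edu>",
#                          outputDir = ".", tax_id = "588858",
#                          genus = "Salmonella",
#                          species = "enterica sv. Typhimurium 14028s")
# })
```

Note that retrying with `rebuildCache = TRUE` would restart the downloads on every attempt, so it may be better left at its default when retrying.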
For reference:
traceback()
7: stop(paste(strwrap(msg, exdent = 2), collapse = "\n"))
6: .tryDL(url, tmp)
5: .downloadData(files[i], tax_id, NCBIFilesDir = NCBIFilesDir, rebuildCache = rebuildCache, verbose = verbose)
4: .makeBaseDBFromDLs(files, tax_id, NCBIcon, NCBIFilesDir, rebuildCache, verbose)
3: prepareDataFromNCBI(tax_id, NCBIFilesDir, outputDir, rebuildCache, verbose)
2: NEW_makeOrgPackageFromNCBI(version, maintainer, author, outputDir, tax_id, genus, species, NCBIFilesDir, databaseOnly, rebuildCache = rebuildCache, verbose = verbose)
1: makeOrgPackageFromNCBI(version = "0.1", author = "Nick D. Pokorzynski <nick.pokorzynski@yale.edu>", maintainer = "Nick D. Pokorzynski <nick.pokorzynski@yale.ed>", outputDir = ".", tax_id = "588858", genus = "Salmonella", species = "enterica sv. Typhimurium 14028s", rebuildCache = TRUE)
`sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] AnnotationHub_3.0.1 BiocFileCache_2.0.0 dbplyr_2.1.1
[4] AnnotationForge_1.34.0 pathview_1.32.0 GOSemSim_2.18.1
[7] enrichplot_1.12.2 org.Hs.eg.db_3.13.0 AnnotationDbi_1.54.1
[10] clusterProfiler_4.0.3 DOSE_3.18.1 ggridges_0.5.3
[13] ggnewscale_0.4.5 EnhancedVolcano_1.10.0 ggrepel_0.9.1
[16] ggfortify_0.4.12 ggplot2_3.3.5 dplyr_1.0.7
[19] plyr_1.8.6 tibble_3.1.3 DESeq2_1.32.0
[22] SummarizedExperiment_1.22.0 Biobase_2.52.0 MatrixGenerics_1.4.2
[25] matrixStats_0.60.0 GenomicRanges_1.44.0 GenomeInfoDb_1.28.1
[28] IRanges_2.26.0 S4Vectors_0.30.0 BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] shadowtext_0.0.8 fastmatch_1.1-3 igraph_1.2.6
[4] lazyeval_0.2.2 splines_4.1.1 BiocParallel_1.26.1
[7] digest_0.6.27 htmltools_0.5.1.1 viridis_0.6.1
[10] GO.db_3.13.0 fansi_0.5.0 magrittr_2.0.1
[13] memoise_2.0.0 Biostrings_2.60.2 annotate_1.70.0
[16] graphlayouts_0.7.1 extrafont_0.17 extrafontdb_1.0
[19] colorspace_2.0-2 rappdirs_0.3.3 blob_1.2.2
[22] crayon_1.4.1 RCurl_1.98-1.3 jsonlite_1.7.2
[25] graph_1.70.0 scatterpie_0.1.6 genefilter_1.74.0
[28] survival_3.2-12 ape_5.5 glue_1.4.2
[31] polyclip_1.10-0 gtable_0.3.0 zlibbioc_1.38.0
[34] XVector_0.32.0 DelayedArray_0.18.0 proj4_1.0-10.1
[37] Rgraphviz_2.36.0 Rttf2pt1_1.3.9 maps_3.3.0
[40] scales_1.1.1 DBI_1.1.1 Rcpp_1.0.7
[43] viridisLite_0.4.0 xtable_1.8-4 tidytree_0.3.4
[46] bit_4.0.4 httr_1.4.2 fgsea_1.18.0
[49] RColorBrewer_1.1-2 ellipsis_0.3.2 pkgconfig_2.0.3
[52] XML_3.99-0.6 farver_2.1.0 locfit_1.5-9.4
[55] utf8_1.2.2 later_1.2.0 tidyselect_1.1.1
[58] labeling_0.4.2 rlang_0.4.11 reshape2_1.4.4
[61] BiocVersion_3.13.1 munsell_0.5.0 tools_4.1.1
[64] cachem_1.0.5 downloader_0.4 cli_3.0.1
[67] generics_0.1.0 RSQLite_2.2.7 stringr_1.4.0
[70] fastmap_1.1.0 yaml_2.2.1 ggtree_3.0.3
[73] bit64_4.0.5 tidygraph_1.2.0 purrr_0.3.4
[76] KEGGREST_1.32.0 ggraph_2.0.5 nlme_3.1-152
[79] mime_0.11 ash_1.0-15 ggrastr_0.2.3
[82] KEGGgraph_1.52.0 aplot_0.0.6 DO.db_2.9
[85] compiler_4.1.1 rstudioapi_0.13 interactiveDisplayBase_1.30.0
[88] filelock_1.0.2 curl_4.3.2 beeswarm_0.4.0
[91] png_0.1-7 treeio_1.16.1 tweenr_1.0.2
[94] geneplotter_1.70.0 stringi_1.7.3 ggalt_0.4.0
[97] lattice_0.20-44 Matrix_1.3-4 vctrs_0.3.8
[100] pillar_1.6.2 lifecycle_1.0.0 BiocManager_1.30.16
[103] data.table_1.14.0 cowplot_1.1.1 bitops_1.0-7
[106] httpuv_1.6.1 patchwork_1.1.1 qvalue_2.24.0
[109] R6_2.5.0 promises_1.2.0.1 KernSmooth_2.23-20
[112] gridExtra_2.3 vipor_0.4.5 MASS_7.3-54
[115] assertthat_0.2.1 withr_2.4.2 GenomeInfoDbData_1.2.6
[118] grid_4.1.1 tidyr_1.1.3 rvcheck_0.1.8
[121] ggforce_0.3.3 shiny_1.6.0 ggbeeswarm_0.6.0`
`BiocManager::valid("AnnotationForge")
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cran.rstudio.com/
[1] TRUE`
`BiocManager::install("AnnotationForge")
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cran.rstudio.com/
Bioconductor version 3.13 (BiocManager 1.30.16), R 4.1.1 (2021-08-10)
Warning message:
package(s) not installed when version(s) same as current; use force = TRUE to re-install: 'AnnotationForge'`