Bioconductor / AnnotationForge

Tools for building SQLite-based annotation data packages
https://bioconductor.org/packages/AnnotationForge

problem with gene2unigene #13

Closed ldr89 closed 4 years ago

ldr89 commented 4 years ago

Hi,

I'm trying to build my own organism annotation package for Macaca fascicularis (tax_id = "9541") using makeOrgPackageFromNCBI() as described at https://support.bioconductor.org/p/118443/.

Unfortunately, I keep receiving an error when it comes to getting the data from the NCBI ftp for gene2unigene (see code + error & sessionInfo below). When checking the URL for gene2unigene, indeed it seems like it doesn't exist. After some searching, I found it at the data archive in the NCBI ftp (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ARCHIVE/gene2unigene). Is there any way to circumvent this problem?

Thanks so much!

Laura

makeOrgPackageFromNCBI(version = "0.0.1",
                       author = "Some One <so@someplace.org>",
                       maintainer = "Some One <so@someplace.org>",
                       outputDir = outputDir,
                       tax_id = "9541",
                       genus = "Macaca",
                       species = "fascicularis")

If this is the 1st time you have run this function, it may take, a long time (over an hour) to download needed files and assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI . . .
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene2unigene
[5] gene_info.gz
[6] gene2go.gz
getting data for gene2pubmed.gz
rebuilding the cache
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
extracting data for our organism from : gene2accession.gz
getting data for gene2refseq.gz
rebuilding the cache
extracting data for our organism from : gene2refseq
getting data for gene2unigene
rebuilding the cache
Error in .tryDL(url, tmp) :
  url access failed after 4 attempts; url: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2unigene
In addition: Warning message:
In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().

R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] biomaRt_2.42.1 AnnotationForge_1.28.0 AnnotationDbi_1.48.0
[4] IRanges_2.20.2 S4Vectors_0.24.4 Biobase_2.46.0
[7] BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] progress_1.2.2 tidyselect_1.0.0 purrr_0.3.4 colorspace_1.4-1
[5] vctrs_0.2.4 BiocFileCache_1.10.2 htmltools_0.4.0 viridisLite_0.3.0
[9] yaml_2.2.1 blob_1.2.1 XML_3.99-0.3 plotly_4.9.2.1
[13] rlang_0.4.5 pillar_1.4.3 glue_1.4.0 withr_2.2.0
[17] DBI_1.1.0 rappdirs_0.3.1 dbplyr_1.4.3 bit64_0.9-7
[21] sessioninfo_1.1.1 lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0
[25] gtable_0.3.0 htmlwidgets_1.5.1 memoise_1.1.0 curl_4.3
[29] fansi_0.4.1 Rcpp_1.0.4.6 openssl_1.4.1 scales_1.1.0
[33] jsonlite_1.6.1 bit_1.1-15.2 askpass_1.1 ggplot2_3.3.0
[37] hms_0.5.3 digest_0.6.25 stringi_1.4.6 dplyr_0.8.5
[41] grid_3.6.3 cli_2.0.2 tools_3.6.3 bitops_1.0-6
[45] magrittr_1.5 lazyeval_0.2.2 RCurl_1.98-1.2 tibble_3.0.1
[49] RSQLite_2.2.0 crayon_1.3.4 tidyr_1.0.2 pkgconfig_2.0.3
[53] ellipsis_0.3.0 data.table_1.12.8 prettyunits_1.1.1 assertthat_0.2.1
[57] httr_1.4.1 rstudioapi_0.11 R6_2.4.1 compiler_3.6.3

lshep commented 4 years ago

Thank you. It looks like NCBI is no longer maintaining the gene2unigene file. We will look into what the appropriate adjustment to the code is. Thank you for bringing it to our attention.

kamalmdmostafa commented 4 years ago

I'm facing the same problem here. NCBI has moved gene2unigene to the archive at ftp://ftp.ncbi.nih.gov/gene/DATA/ARCHIVE/gene2unigene. Will it work if I download the file manually?

Looking forward to a fix for this problem.

lshep commented 4 years ago

I'm temporarily changing the code to access that file from the ARCHIVE directory. It should be pushed up later today and available by tomorrow afternoon. I've also started a discussion about potentially removing this table if the information is no longer going to be updated. If anyone here wants to chime in on the importance or unimportance of this information, the developers mailing list thread is at https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016723.html
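For anyone who wants to fetch the file by hand in the meantime, here is a minimal sketch of the fallback idea described above: try the original location first, then the ARCHIVE directory. It is not the actual AnnotationForge change. The helper names `ncbi_candidate_urls()` and `download_first_available()` are hypothetical, and the base URL is the one from the error message in this thread.

```r
# Hypothetical helpers for illustration; they are not part of AnnotationForge.

# Build the two locations to try: the original path and the ARCHIVE path.
ncbi_candidate_urls <- function(file_name,
                                base = "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA") {
  c(paste(base, file_name, sep = "/"),
    paste(base, "ARCHIVE", file_name, sep = "/"))
}

# Try each URL in order and return the first one that downloads successfully.
# `fetch` defaults to utils::download.file, which returns 0 on success.
download_first_available <- function(urls, destfile,
                                     fetch = utils::download.file) {
  for (u in urls) {
    ok <- tryCatch({
      status <- fetch(u, destfile, quiet = TRUE)
      identical(as.integer(status), 0L)
    },
    error = function(e) FALSE,
    warning = function(w) FALSE)
    if (ok) return(u)
  }
  stop("none of the candidate URLs could be downloaded")
}

# Example call (needs network access, so it is left commented out):
# urls <- ncbi_candidate_urls("gene2unigene")
# used <- download_first_available(urls, file.path(tempdir(), "gene2unigene"))
```

The function returns the URL that worked, so you can see whether the file came from the original directory or from ARCHIVE. Whether makeOrgPackageFromNCBI() will pick up a manually downloaded copy is a separate question that this sketch does not address.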

lshep commented 4 years ago

The fix has been pushed up to both release (1.30.1) and devel (1.31.1). It should build tonight and propagate in tomorrow's build report. Please let me know if there are any further issues.