Bioconductor / AnnotationForge

Tools for building SQLite-based annotation data packages
https://bioconductor.org/packages/AnnotationForge

makeOrgPackageFromNCBI() error message #21

Closed mbxds5 closed 2 years ago

mbxds5 commented 2 years ago

Hi,

This is my first time trying to use this package, but every time I try to run the example I get the following error message:

makeOrgPackageFromNCBI(version = "0.1", author = "Some One <so@someplace.org>", maintainer = "Some One <so@someplace.org>", outputDir = ".", tax_id = "59729", genus = "Taeniopygia", species = "guttata")
If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
Error: no such table: gene2accession_date
In addition: Warning message:
In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().

Any help would be greatly appreciated! Thanks

sessioninfo::session_info()
─ Session info ──────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.3 (2020-10-10)
 os       macOS Mojave 10.14.6
 system   x86_64, darwin17.0
 ui       AQUA
 language (EN)
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/London
 date     2021-10-04

─ Packages ──────────────────────────────────────────────────────────────────
 package         version  date       lib source
 AnnotationDbi   1.52.0   2020-10-27 [1] Bioconductor
 AnnotationForge 1.32.0   2020-10-27 [1] Bioconductor
 Biobase         2.50.0   2020-10-27 [1] Bioconductor
 BiocGenerics    0.36.1   2021-04-16 [1] Bioconductor
 BiocManager     1.30.16  2021-06-15 [1] CRAN (R 4.0.2)
 bit             4.0.4    2020-08-04 [1] CRAN (R 4.0.2)
 bit64           4.0.5    2020-08-30 [1] CRAN (R 4.0.2)
 bitops          1.0-7    2021-04-24 [1] CRAN (R 4.0.2)
 blob            1.2.2    2021-07-23 [1] CRAN (R 4.0.2)
 cachem          1.0.6    2021-08-19 [1] CRAN (R 4.0.2)
 cli             3.0.1    2021-07-17 [1] CRAN (R 4.0.2)
 DBI             1.1.1    2021-01-15 [1] CRAN (R 4.0.2)
 fastmap         1.1.0    2021-01-25 [1] CRAN (R 4.0.2)
 IRanges         2.24.1   2020-12-12 [1] Bioconductor
 memoise         2.0.0    2021-01-26 [1] CRAN (R 4.0.2)
 pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 4.0.2)
 Rcpp            1.0.7    2021-07-07 [1] CRAN (R 4.0.2)
 RCurl           1.98-1.5 2021-09-17 [1] CRAN (R 4.0.2)
 rlang           0.4.11   2021-04-30 [1] CRAN (R 4.0.2)
 RSQLite         2.2.8    2021-08-21 [1] CRAN (R 4.0.2)
 S4Vectors       0.28.1   2020-12-09 [1] Bioconductor
 sessioninfo     1.1.1    2018-11-05 [1] CRAN (R 4.0.2)
 vctrs           0.3.8    2021-04-29 [1] CRAN (R 4.0.2)
 withr           2.4.2    2021-04-18 [1] CRAN (R 4.0.2)
 XML             3.99-0.8 2021-09-17 [1] CRAN (R 4.0.2)

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

hpages commented 2 years ago

Jim, @jmacdon, do you think you can take a look at this? Thanks!

jmacdon commented 2 years ago

@hpages I'll take a look

jmacdon commented 2 years ago

@mbxds5 I don't see a similar problem, and I can't even find where anything called 'gene2accession_date' is used in any Bioconductor package. So first off I would recommend upgrading to R-4.1.2 and the current Bioconductor release.

I do have an issue getting the Ensembl data, due to intermittent outages at the BioMart server, but that isn't a problem with AnnotationForge.

I should also point out that generating your own OrgDb package is usually only necessary if you have a really exotic species. There are lots of OrgDb type objects that you can get from the AnnotationHub (see the vignette for that package), and there might be one for whatever species you are interested in.

If you upgrade and are still having problems, please also include the results from running traceback() immediately after you get the error.

mbxds5 commented 2 years ago

Hi @jmacdon, really appreciate you taking the time to look into the issue. I've updated R and Bioconductor but I'm still getting the same error message... any ideas?

If files are not cached locally this may take awhile to assemble a 12 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.
preparing data from NCBI ...
starting download for  
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
Error: no such table: gene2accession_date
traceback()
18: stop(structure(list(message = "no such table: gene2accession_date", 
        call = NULL, cppstack = NULL), class = c("Rcpp::exception", 
    "C++Error", "error", "condition")))
17: result_create(conn@ptr, statement)
16: initialize(value, ...)
15: initialize(value, ...)
14: new("SQLiteResult", sql = statement, ptr = result_create(conn@ptr, 
        statement), conn = conn, bigint = conn@bigint)
13: .local(conn, statement, ...)
12: dbSendQuery(conn, statement, ...)
11: dbSendQuery(conn, statement, ...)
10: .local(conn, statement, ...)
9: dbGetQuery(NCBIcon, paste0("SELECT date FROM ", tblNm))
8: dbGetQuery(NCBIcon, paste0("SELECT date FROM ", tblNm))
7: .getNCBIDateStamp(NCBIcon, tableName)
6: .isNCBICurrentWith(NCBIcon, tableName)
5: .downloadData(files[i], tax_id, NCBIFilesDir = NCBIFilesDir, 
       rebuildCache = rebuildCache, verbose = verbose)
4: .makeBaseDBFromDLs(files, tax_id, NCBIcon, NCBIFilesDir, rebuildCache, 
       verbose)
3: prepareDataFromNCBI(tax_id, NCBIFilesDir, outputDir, rebuildCache, 
       verbose)
2: NEW_makeOrgPackageFromNCBI(version, maintainer, author, outputDir, 
       tax_id, genus, species, NCBIFilesDir, databaseOnly, rebuildCache = rebuildCache, 
       verbose = verbose)
1: makeOrgPackageFromNCBI(version = "0.1", author = "Some One <so@someplace.org>", 
       maintainer = "Some One <so@someplace.org>", outputDir = ".", 
       tax_id = "59729", genus = "Taeniopygia", species = "guttata")
sessionInfo()  
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
[1] AnnotationForge_1.34.0 AnnotationDbi_1.54.1   IRanges_2.26.0        
[4] S4Vectors_0.30.2       Biobase_2.52.0         BiocGenerics_0.38.0
loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7             XVector_0.32.0         zlibbioc_1.38.0       
 [4] bit_4.0.4              R6_2.5.1               rlang_0.4.11          
 [7] fastmap_1.1.0          blob_1.2.2             httr_1.4.2            
[10] GenomeInfoDb_1.28.4    tools_4.1.1            png_0.1-7             
[13] DBI_1.1.1              bit64_4.0.5            crayon_1.4.1          
[16] GenomeInfoDbData_1.2.6 BiocManager_1.30.16    bitops_1.0-7          
[19] vctrs_0.3.8            KEGGREST_1.32.0        RCurl_1.98-1.5        
[22] memoise_2.0.0          cachem_1.0.6           RSQLite_2.2.8         
[25] compiler_4.1.1         Biostrings_2.60.2      XML_3.99-0.8          
[28] pkgconfig_2.0.3

jmacdon commented 2 years ago

@mbxds5 I assume there is an NCBI.sqlite file in the directory that you are using to build the package? The error you are getting comes from a test to see if the data are current enough to use. That test queries the NCBI.sqlite DB to see when you last downloaded the gene2accession.gz file, by reading the date in the gene2accession_date table. Apparently your NCBI.sqlite DB doesn't have that table, which is causing the error.
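To make the failure mode concrete, here is a minimal sketch in R using DBI and RSQLite. It builds a throwaway SQLite file that, like the reporter's NCBI.sqlite, has a date-stamp table for gene2pubmed but none for gene2accession, then runs the same kind of query visible in the traceback (`SELECT date FROM <table>_date`). The helper `hasDateStamp()` is hypothetical and not part of AnnotationForge; it only shows how `DBI::dbExistsTable()` can detect the missing table before the query is issued.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper (not part of AnnotationForge): does the cache contain
# the "<file>_date" table that the freshness check reads?
hasDateStamp <- function(con, tableName) {
  dbExistsTable(con, paste0(tableName, "_date"))
}

# Throwaway cache with a date stamp for gene2pubmed but none for gene2accession.
dbFile <- tempfile(fileext = ".sqlite")
con <- dbConnect(SQLite(), dbFile)
dbWriteTable(con, "gene2pubmed_date", data.frame(date = as.character(Sys.Date())))

stampPubmed    <- hasDateStamp(con, "gene2pubmed")     # TRUE
stampAccession <- hasDateStamp(con, "gene2accession")  # FALSE

# An unguarded query against the missing table errors out, as in the report;
# capture the message instead of stopping.
errMsg <- tryCatch(
  {
    dbGetQuery(con, "SELECT date FROM gene2accession_date")
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(errMsg)

dbDisconnect(con)
unlink(dbFile)
```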

Since you will be downloading all the data anyway, just delete the NCBI.sqlite file and then try again. If it doesn't work, let me know.
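A minimal sketch of that fix in R. The helper `removeNCBICache()` is hypothetical, not an AnnotationForge function. It assumes the cache is the NCBI.sqlite file in the directory used as `NCBIFilesDir`, which is taken here to be the working directory because the original call ran with `outputDir = "."`. The demonstration uses a temporary directory with a dummy file, so nothing is downloaded.

```r
# Hypothetical helper (not part of AnnotationForge): delete a stale NCBI.sqlite
# so the next makeOrgPackageFromNCBI() call starts from an empty cache.
# Assumes the cache sits in the directory used as NCBIFilesDir.
removeNCBICache <- function(NCBIFilesDir = getwd()) {
  cacheFile <- file.path(NCBIFilesDir, "NCBI.sqlite")
  if (!file.exists(cacheFile)) {
    return(FALSE)  # nothing to delete
  }
  file.remove(cacheFile)  # TRUE if the file was removed
}

# Demonstration on a temporary directory holding a dummy cache file.
demoDir <- tempfile("ncbi_cache_")
dir.create(demoDir)
file.create(file.path(demoDir, "NCBI.sqlite"))

firstCall  <- removeNCBICache(demoDir)  # TRUE: file existed and was removed
secondCall <- removeNCBICache(demoDir)  # FALSE: nothing left to remove

# In a real session, delete the cache and rerun the original call
# (the package warns the rebuilt cache is about 12 GB):
# removeNCBICache(".")
# makeOrgPackageFromNCBI(version = "0.1", author = "Some One <so@someplace.org>",
#                        maintainer = "Some One <so@someplace.org>", outputDir = ".",
#                        tax_id = "59729", genus = "Taeniopygia", species = "guttata")
```

Deleting the whole file rather than patching in the missing table keeps things simple: the next call re-downloads everything and writes a consistent cache.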