Bioconductor / AnnotationForge

Tools for building SQLite-based annotation data packages
https://bioconductor.org/packages/AnnotationForge

Error in compiling package #25

Closed BaylorSci closed 2 years ago

BaylorSci commented 2 years ago

I am creating a new annotation package for my species of interest using the following code:

makeOrgPackageFromNCBI(version="0.1",
                       maintainer=" It  <>",
                       author="it <>", 
                       outputDir = ".",
                       tax_id="90988",
                       genus="Pimephales",
                       species="promelas",
                       rebuildCache = FALSE)

I manually downloaded the required files and placed them in the working directory, since I initially had connection issues. The error I am getting occurs in the step where the IDs are annotated with Ensembl IDs.

Please be patient while we work out which organisms can be annotated with ensembl IDs.
Ensembl site unresponsive, trying uswest mirror
Ensembl site unresponsive, trying useast mirror
Ensembl site unresponsive, trying asia mirror
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'table' in selecting a method for function '%in%': invalid 'text' argument

Searching for the problem, the "Ensembl site unresponsive" message seems to have been reported frequently on the biomaRt GitHub forum, and was fixed there by directly specifying which mart to use (and its HTTPS address).
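
To check whether biomaRt can reach Ensembl at all from a given session, independently of AnnotationForge, something like the following could be tried. It is a minimal sketch: connect_ensembl is a hypothetical helper (not part of biomaRt or AnnotationForge), the mirror names are the ones biomaRt::useEnsembl accepts in the version shown in this thread, and the `connect` argument exists only so the connection function can be swapped out. It does not change what makeOrgPackageFromNCBI does internally.

```r
# Hypothetical helper: try Ensembl mirrors in order and return the first
# Mart object that can be created, or NULL if every mirror fails.
connect_ensembl <- function(dataset,
                            mirrors = c("www", "useast", "uswest", "asia"),
                            connect = biomaRt::useEnsembl) {
  for (m in mirrors) {
    mart <- tryCatch(
      connect(biomart = "genes", dataset = dataset, mirror = m),
      error = function(e) {
        # Report the failure and move on to the next mirror
        message("Mirror '", m, "' failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(mart)) return(mart)
  }
  NULL
}

# Example call (needs network access; dataset name for Taeniopygia guttata):
# mart <- connect_ensembl("tguttata_gene_ensembl")

# Alternative with an explicit HTTPS host instead of a mirror:
# mart <- biomaRt::useMart("ensembl", dataset = "tguttata_gene_ensembl",
#                          host = "https://www.ensembl.org")
```

If this returns NULL for every mirror, the problem is likely with the connection from this machine rather than with the species being annotated.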

It is unlikely that Ensembl would have many IDs for my particular species, so I reran this with another species (Taeniopygia guttata) to check whether the problem was species-specific, and got the exact same error message.

I wonder if you can advise on how to correct this? Is there a way for this to just return an error, but still compile the annotation package?

BaylorSci commented 2 years ago

traceback()
18: h(simpleError(msg, call))
17: .handleSimpleError(function (cond) 
    .Internal(C_tryCatchHelper(addr, 1L, cond)), "invalid 'text' argument", 
        base::quote(textConnection(text, encoding = "UTF-8")))
16: textConnection(text, encoding = "UTF-8")
15: read.table(text = attrfilt, sep = "\t", header = FALSE, quote = "", 
        comment.char = "", as.is = TRUE)
14: .getAttrFilt(mart = mart, verbose = verbose, type = "attributes")
13: .getAttributes(mart, verbose = verbose)
12: useDataset(mart = mart, dataset = dataset, verbose = verbose)
11: .useMart(biomart = biomart, dataset = dataset, host = host, verbose = verbose, 
        port = port, ensemblRedirect = ensemblRedirect, httr_config = httr_config)
10: biomaRt::useEnsembl("ensembl", datSet)
9: FUN(X[[i]], ...)
8: lapply(names(datSets), .ensemblMapsToEntrezId, datSets = datSets)
7: lapply(names(datSets), .ensemblMapsToEntrezId, datSets = datSets)
6: unlist(lapply(names(datSets), .ensemblMapsToEntrezId, datSets = datSets))
5: available.ensembl.datasets()
4: tax_id %in% names(available.ensembl.datasets())
3: prepareDataFromNCBI(tax_id, NCBIFilesDir, outputDir, rebuildCache, 
       verbose)
2: NEW_makeOrgPackageFromNCBI(version, maintainer, author, outputDir, 
       tax_id, genus, species, NCBIFilesDir, databaseOnly, rebuildCache = rebuildCache, 
       verbose = verbose)
1: makeOrgPackageFromNCBI(version = "0.1", author = "Some One <so@someplace.org>", 
       maintainer = "Some One <so@someplace.org>", outputDir = ".", 
       tax_id = "59729", genus = "Taeniopygia", species = "guttata", 
       rebuildCache = FALSE)

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GO.db_3.14.0           biomaRt_2.50.1         AnnotationForge_1.36.0 AnnotationDbi_1.56.2   IRanges_2.28.0        
[6] S4Vectors_0.32.3       Biobase_2.54.0         BiocGenerics_0.40.0    BiocManager_1.30.16   

loaded via a namespace (and not attached):
 [1] KEGGREST_1.34.0        progress_1.2.2         tidyselect_1.1.1       purrr_0.3.4            vctrs_0.3.8           
 [6] generics_0.1.1         BiocFileCache_2.2.0    utf8_1.2.2             blob_1.2.2             XML_3.99-0.8          
[11] rlang_0.4.12           pillar_1.6.4           withr_2.4.3            glue_1.6.0             DBI_1.1.2             
[16] rappdirs_0.3.3         bit64_4.0.5            dbplyr_2.1.1           GenomeInfoDbData_1.2.7 lifecycle_1.0.1       
[21] stringr_1.4.0          zlibbioc_1.40.0        Biostrings_2.62.0      memoise_2.0.1          fastmap_1.1.0         
[26] GenomeInfoDb_1.30.0    curl_4.3.2             fansi_1.0.0            Rcpp_1.0.7             filelock_1.0.2        
[31] cachem_1.0.6           XVector_0.34.0         bit_4.0.4              hms_1.1.1              png_0.1-7             
[36] digest_0.6.29          stringi_1.7.6          dplyr_1.0.7            tools_4.1.2            bitops_1.0-7          
[41] magrittr_2.0.1         RCurl_1.98-1.5         RSQLite_2.2.9          tibble_3.1.6           crayon_1.4.2          
[46] pkgconfig_2.0.3        ellipsis_0.3.2         xml2_1.3.3             prettyunits_1.1.1      assertthat_0.2.1      
[51] httr_1.4.2             R6_2.5.1               compiler_4.1.2

vjcitn commented 2 years ago

Sorry for the delay in responding to this. Are you aware that this organism's annotation can be acquired via AnnotationHub? After running ah = AnnotationHub::AnnotationHub():

> xx = ah[["AH96170"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: 'Biobase'

The following object is masked from 'package:AnnotationHub':

    cache

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    expand.grid, I, unname

> xx
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Pimephales promelas
| SPECIES: Pimephales promelas
| CENTRALID: GID
| Taxonomy ID: 90988
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information
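
For anyone who does not know the hub ID in advance, the record can also be located by searching the hub. The following is a rough sketch, not a definitive recipe: fetch_orgdb_preview is a hypothetical helper, taking the last hit as the one to use is an assumption, and the SYMBOL and GENENAME columns are only requested if the retrieved OrgDb actually provides them. It needs network access and the AnnotationHub and AnnotationDbi packages.

```r
# Hypothetical helper: find an OrgDb for a species in AnnotationHub and
# return a small preview table for the first few gene IDs.
fetch_orgdb_preview <- function(species,
                                wanted = c("SYMBOL", "GENENAME"),
                                n = 5) {
  ah <- AnnotationHub::AnnotationHub()

  # Search hub metadata by resource type and organism name
  hits <- AnnotationHub::query(ah, c("OrgDb", species))
  if (length(hits) == 0) stop("No OrgDb record found for ", species)

  # Assumption: use the last matching record; names(hits) are the hub IDs
  id <- names(hits)[length(hits)]
  orgdb <- ah[[id]]

  # Only request columns this OrgDb really has
  cols <- intersect(wanted, AnnotationDbi::columns(orgdb))
  if (length(cols) == 0) stop("None of the requested columns are available")

  # GID is the central ID type reported for this kind of OrgDb
  ids <- head(AnnotationDbi::keys(orgdb, keytype = "GID"), n)
  AnnotationDbi::select(orgdb, keys = ids, columns = cols, keytype = "GID")
}

# Example call (downloads the resource on first use):
# fetch_orgdb_preview("Pimephales promelas")
```

Hub IDs such as AH96170 can differ between hub snapshots, so searching by organism is usually more robust than hard-coding the ID.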

BaylorSci commented 2 years ago

I am so embarrassed. I checked AnnotationHub about two years ago and this was not a supported species, which is why I was using your package. That's fantastic and will make all my ongoing projects much easier.

I will say that I continued to have issues creating annotation packages on Windows, irrespective of the species I used, but did not have any issues on my Mac, so there may be a conflict in one of the packages.

Thanks for letting me know about this!!!