Bioconductor / AnnotationForge

Tools for building SQLite-based annotation data packages
https://bioconductor.org/packages/AnnotationForge

makeOrgPackage - ERROR: Two fields in the source DB have the same name. #14

Closed javier-pardodiaz closed 3 years ago

javier-pardodiaz commented 4 years ago

Hello,

I am getting an error when using the Bioconductor packages AnnotationDbi and AnnotationForge. I am generating an organism package using the makeOrgPackage function in the AnnotationForge package. To do so, I am using two data frames: gids, with two columns ("GID" and "SYMBOL"), and go_info, with three columns ("GID", "GO" and "EVIDENCE"). The latter data frame is the goTable.

After generating and installing the package, I try to use the godata function from the GOSemSim package, but I get an error saying that two fields in the source DB have the same name. The same error is raised when I try to get the keys for "GO" and "EVIDENCE".

Could you please help me to solve this issue? Thank you very much!

I copy the code and error messages below.

> head(go_info)
     GID         GO EVIDENCE
1 RL4439 GO:0005975      IEA
2 RL4439 GO:0019752      IEA
3 RL4439 GO:0055114      IEA
4 RL4439 GO:0003824      IEA
5 RL4439 GO:0016491      IEA
6 RL4439 GO:0016616      IEA
> head(gids)
        GID    SYMBOL
1    RL4439    RL4439
2 pRL120793 pRL120793
3 pRL120792 pRL120792
4 pRL120791 pRL120791
5 pRL120790 pRL120790
6 pRL120789 pRL120789
> makeOrgPackage(gene_info=gids,gocodes=go_info,
+                version="0.1",
+                maintainer="Javier Pardo-Diaz <jdiaz@stats.ox.ac.uk>",
+                author="Javier Pardo-Diaz <jdiaz@stats.ox.ac.uk>",
+                outputDir = ".",
+                tax_id="216596",
+                genus="Rhizobium",
+                species="leguminosarum.bv.viciae.2.3841",
+                goTable = "gocodes")
Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating gocodes table:
gocodes table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Creating package in ./org.Rleguminosarum.bv.viciae.2.3841.eg.db 
Now deleting temporary database file
[1] "./org.Rleguminosarum.bv.viciae.2.3841.eg.db"
There were 50 or more warnings (use warnings() to see the first 50)
> 
> install.packages("./org.Rleguminosarum.bv.viciae.2.3841.eg.db", repos=NULL)
Installing package into ‘/home/javier/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package ‘org.Rleguminosarum.bv.viciae.2.3841.eg.db’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (org.Rleguminosarum.bv.viciae.2.3841.eg.db)
> 
> library(org.Rleguminosarum.bv.viciae3.3841.eg.db)
> library(GOSemSim)
> hsGO <- godata("org.Rleguminosarum.bv.viciae.2.3841.eg.db" , keytype="GID", ont="MF")
Loading required package: org.Rleguminosarum.bv.viciae.2.3841.eg.db

preparing gene to GO mapping data...
Error in FUN(X[[i]], ...) : 
  Two fields in the source DB have the same name.
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Rleguminosarum.bv.viciae.2.3841.eg.db_0.1 org.Rleguminosarum.bv.viciae.1.3841.eg.db_0.1 org.Rleguminosarum.bv.viciae3.3841.eg.db_0.1 
 [4] org.Rleguminosarum.bv.viciae2.3841.eg.db_0.1  org.Rleguminosarum.bv.viciae.3841.eg.db_0.1   stringr_1.4.0                                
 [7] org.Rleguminosarum3841.eg.db_0.2              org.Hs.eg.db_3.10.0                           org.Rleguminosarumbvvc3841.eg.db_0.2         
[10] AnnotationForge_1.28.0                        org.Rleguminosarumbvviciae3841.eg.db_0.1      AnnotationDbi_1.48.0                         
[13] IRanges_2.20.2                                S4Vectors_0.24.4                              Biobase_2.46.0                               
[16] BiocGenerics_0.32.0                           GOSemSim_2.12.1                              

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6    magrittr_1.5    bit_1.1-15.2    rlang_0.4.6     blob_1.2.1      tools_3.6.3     DBI_1.1.0       bit64_0.9-7     digest_0.6.25   vctrs_0.2.4    
[11] bitops_1.0-6    RCurl_1.98-1.2  memoise_1.1.0   RSQLite_2.2.0   stringi_1.4.6   compiler_3.6.3  GO.db_3.10.0    XML_3.99-0.3    pkgconfig_2.0.3
> keytypes(org.Rleguminosarum.bv.viciae.2.3841.eg.db)
[1] "EVIDENCE"    "EVIDENCEALL" "GID"         "GO"          "GOALL"       "ONTOLOGY"    "ONTOLOGYALL" "SYMBOL"     
> keys(org.Rleguminosarum.bv.viciae.2.3841.eg.db,"GO")
Error in .deriveTableNameFromField(field = keytype, x) : 
  Two fields in the source DB have the same name.
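
The error message suggests that some column name is present in more than one table of the generated database. As a rough diagnostic sketch (not the check AnnotationDbi itself performs), the base R snippet below reports column names shared between tables. The helper `find_shared_fields`, the toy values, and the shape of the derived `go` table are all assumptions made for illustration.

```r
# Hypothetical helper: report column names that occur in more than one table.
# 'tables' is a named list of data frames; 'key' is the shared ID column,
# which is expected to repeat and is therefore ignored.
find_shared_fields <- function(tables, key = "GID") {
  fields <- lapply(tables, function(df) setdiff(names(df), key))
  owners <- split(rep(names(fields), lengths(fields)), unlist(fields))
  owners[lengths(owners) > 1]
}

# Toy tables shaped like the ones in this report (values are illustrative).
gids <- data.frame(GID = c("RL4439", "pRL120793"),
                   SYMBOL = c("RL4439", "pRL120793"))
gocodes <- data.frame(GID = "RL4439", GO = "GO:0005975", EVIDENCE = "IEA")
# Assumed shape of a derived table that repeats the GO and EVIDENCE columns.
go <- data.frame(GID = "RL4439", GO = "GO:0005975",
                 EVIDENCE = "IEA", ONTOLOGY = "BP")

shared <- find_shared_fields(list(gene_info = gids, gocodes = gocodes, go = go))
print(shared)
```

For an installed organism package, the same comparison could be made on the real schema by collecting the field lists with DBI::dbListTables() and DBI::dbListFields() on the connection returned by AnnotationDbi::dbconn().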

huangziyan11111 commented 3 years ago

I have the same problem. After running the example exactly as given in the 'Making use of makeOrgPackage' section of the vignette (https://bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewOrganismPackages.html), I ran the following script to retrieve some information, but I got the same error. Please check the example code and help us solve it. Thank you!

library("org.Tguttata.eg.db")
kk <- keys(org.Tguttata.eg.db, keytype = "GID")
a <- select(org.Tguttata.eg.db, keys = kk, keytype = "GID", columns = c("GO", "ONTOLOGY"))
Error in FUN(X[[i]], ...) : 
  Two fields in the source DB have the same name.

mtmorgan commented 3 years ago

This https://support.bioconductor.org/p/130924/#132633 implies that the problem is fixed in the most recent version. Do you have

BiocManager::version()

return 3.12 and

BiocManager::valid()

return TRUE?

huangziyan11111 commented 3 years ago

@mtmorgan BiocManager::version() returns 3.10, and packageVersion("AnnotationDbi") returns 1.48.0. How can I solve this? Should I install R 4 and the latest version of AnnotationDbi?

mtmorgan commented 3 years ago

Yes, update to R version 4.0.3 / Bioconductor 3.12. Make sure that BiocManager::valid() returns TRUE so that all of your packages are updated. Post here if successful, so the issue can be closed.
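
A minimal sketch of that check-and-update sequence, assuming the standard BiocManager workflow. The helper `is_at_least` and the variables `required_r` and `required_bioc` are hypothetical names introduced here; the version numbers come from the advice above.

```r
# Minimum versions taken from the advice above.
required_r    <- "4.0.3"
required_bioc <- "3.12"

# Hypothetical helper: TRUE when 'current' is at least 'required'.
is_at_least <- function(current, required) {
  package_version(current) >= package_version(required)
}

r_ok <- is_at_least(as.character(getRversion()), required_r)
message("R is recent enough: ", r_ok)

# The Bioconductor steps need the BiocManager package and network access,
# so they only run in an interactive session on a recent enough R.
if (interactive() && r_ok) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Switch to Bioconductor 3.12 if the installed release is older.
  if (!is_at_least(as.character(BiocManager::version()), required_bioc)) {
    BiocManager::install(version = required_bioc)
  }
  # valid() reports TRUE when all installed packages match the release.
  print(BiocManager::valid())
  print(packageVersion("AnnotationDbi"))
}
```

If R itself is older than 4.0.3, it has to be upgraded first, since each Bioconductor release is tied to a specific R version.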

huangziyan11111 commented 3 years ago

Yes, it works. 🙂