Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

org.Mxanthus.eg.db #1292

Closed eduardoillueca closed 4 years ago

eduardoillueca commented 4 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Hi @eduardoillueca

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: org.Mxanthus.eg.db
Title: Genome wide annotation for Myxococcus xanthus DK 1622
Description: Genome wide annotation for Myxococcus xanthus DK 1622, primarily based on mapping using Entrez Gene identifiers.
Version: 1.0.0
Author: Eduardo Illueca Fernandez
Maintainer: Eduardo Illueca Fernandez <eduardo.illueca@um.es>
Depends: R (>= 2.7.0), methods, AnnotationDbi (>= 1.46.0)
Suggests: DBI, annotate, RUnit
Imports: methods, AnnotationDbi
License: Artistic-2.0
organism: Myxococcus xanthus DK 1622
species: Myxococcus xanthus DK 1622
biocViews: AnnotationHub, OrgDb, AnnotationData

Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.

bioc-issue-bot commented 4 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

mtmorgan commented 4 years ago

I think @lshep will handle this package, but for technical reasons @hpages will remain as assignee for until another package is added to the review queue.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 4 years ago

The package needs to get fixed so that it can install/build correctly.
Most notably the .onLoad function is not correct. There is no extdata sqlite file anymore. This seems like all the old code that has not been updated.

You need to access the sqlite data from the hub. I would also recommend this not be done during an .onLoad method but rather have a actually R function that the user will run to download and have access to the orgDb object. I would recommend having this seperate R file and actually removing zzz.r completely.

in other words something like

getData <- function(){
    ah <- AnnotationHub()
    query(ah, c("Myxococcus xanthus DK 1622", "org.Mxanthus.eg.db"))[[1]
}

The download in this case returns directly the OrgDb.

Please also note the new code in org.Mxanthus.egBASE.Rd isn't in an example section and should be.

eduardoillueca commented 4 years ago

Thanks @lshep for the quick answer. I have updated org.Mxanthus.egBASE.Rd and I have included the code in an example section. Also, I have created a getData function in a zzz.R file anda I have removed the old org.Mxanthus.eg_dbconn.Rd file and I have created getData.Rd tha includes the documentation of this new function.

lshep commented 4 years ago

Can you set up your webhook and do a version bump to make sure it can build/install on our system?

lshep commented 4 years ago

The library calls shouldn't be in the R script. Please remove and use the NAMESPACE and DESCRIPTION to depend/import the necessary packages.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

79b00c0 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

db17338 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

339a2aa Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 4 years ago

DESCRIPTION

man

R code

BiocCheck

eduardoillueca commented 4 years ago

Thanks @lshep. I have updated the description file so I have put AnnotationHub::AnnotationHub() in depends and I import all the trhee packages in the description file (I have removed their from NAMESPACE). Also, I have signed up in the mailing list and support site.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

59e52c0 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 4 years ago

Ah I mean AnnotationHub in Depends and/or do AnnotationHub::AnnotationHub() in the R code of the zzz.R file

lshep commented 4 years ago

Sorry for the confustion on that

eduardoillueca commented 4 years ago

No problem. I will try this one

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

32d7852 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 4 years ago

Please add import(AnnotationHub) to the NAMESPACE - getting closer

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

9a8b51a Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 4 years ago

I think you original had AnnotationDbi and methods in the NAMESPACE as imports - was there a reason you removed them in this commit? I think adding import("AnnotationDbi") and import("methods") in the NAMESPACE should fix the last two major NAMESPACE warnings in the check. The package still shows ERROR but for the version number that I think is okay to ignore since this is an annotation package.

lshep commented 4 years ago

I think after you add the two into the NAMESPACE it will be ready to accept

eduardoillueca commented 4 years ago

Yes, the original had AnnotationDbi and methods in the NAMESPACE as imports. I will return to put this imports in NAMESPACE.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

ce6abd7 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

lshep commented 4 years ago

@mtmorgan reminder this is an annotation package. @dvantwisk this will probably have to be cloned and added manually to the builder.

eduardoillueca commented 4 years ago

Thanks very much for all your support, @lshep. I am at your disposal for any problems that may come up. Thanks very much again.

hpages commented 4 years ago

Thanks @eduardoillueca for your submission.

@lshep @mtmorgan I don't think we should accept this so quickly, without at least taking the time to consider a few important aspects of AnnotationHub-based data packages.

For example, in the present case, org.Mxanthus.eg.db looks like it belongs to the family of OrgDb packages (like org.Hs.eg.db). However OrgDb packages are expected to present a certain API so they can be used in places where an OrgDb package is expected. Right now, AFAIK, all OrgDb packages support the "old" map-based API:

library(org.Hs.eg.db)
org.Hs.egSYMBOL[["26074"]]
# [1] "CFAP61"
revmap(org.Hs.egALIAS2EG)[["26074"]]
# [1] "C20orf26"   "CaM-IP3"    "dJ1002M8.3" "dJ1178H5.4" "CFAP61"

and the "new" select API:

select(org.Hs.eg.db, keys="26074", columns=c("SYMBOL", "ALIAS"))
#   ENTREZID SYMBOL      ALIAS
# 1    26074 CFAP61   C20orf26
# 2    26074 CFAP61    CaM-IP3
# 3    26074 CFAP61 dJ1002M8.3
# 4    26074 CFAP61 dJ1178H5.4
# 5    26074 CFAP61     CFAP61

I would strongly suggest that org.Mxanthus.eg.db supports at least the select API. That is, one should be able to do something like:

library(org.Mxanthus.eg.db)
columns(org.Mxanthus.eg.db)
select(org.Mxanthus.eg.db, keys="7179", columns="SYMBOL")

out-of-the-box i.e. without having to do org.Mxanthus.eg.db <- org.Mxanthus.eg.db::getData() first.

It seems that this could be achieved via the use of an active binding here. See ?makeActiveBinding.

By doing this, you'll no longer need to export getData(). Plus the fact that the data will be fetched on-the-fly from AnnotationHub the first time org.Mxanthus.eg.db is accessed should be transparent to the end user.

This goes beyond the org.Mxanthus.eg.db case. It's probably worth taking the time to carefully think about what we want our AnnotationHub-based data packages to look like. With a strong focus on preserving backward compatibility. Ideally we should try to come up with some guidelines. Will benefit future migrations e.g. other OrgDb packages but also ChipDb, TxDb, BSgenome, etc...

I see a couple of other problems that are specific to org.Mxanthus.eg.db:

Best, H.

eduardoillueca commented 4 years ago

Thanks @hpages for your comments. I have found it really interesting so I have study carefully all the points. I have thought about possible solutions to the problems that you have noted and I would like to share them with you before updating the package.

`

columns(org.Mxanthus.eg.db) [1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GID" "GO" "GOALL"
[7] "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"

select(org.Mxanthus.eg.db, keys="7179", columns="SYMBOL") 'select()' returned 1:1 mapping between keys and columns GID SYMBOL 1 7179 MXAN_RS04930

` And also more complex querys

`

select(org.Mxanthus.eg.db, keys="2000", columns=c("SYMBOL", "GO")) 'select()' returned 1:many mapping between keys and columns GID SYMBOL GO 1 2000 MXAN_RS14035 GO:0000166 2 2000 MXAN_RS14035 GO:0004222 3 2000 MXAN_RS14035 GO:0005524 4 2000 MXAN_RS14035 GO:0006508 5 2000 MXAN_RS14035 GO:0008233 6 2000 MXAN_RS14035 GO:0008237

`

Later, when I deleted the .sqlite data from the package I had to change the R script, delete the .onLoad method and create the getData function, that recovers the database from AnnotationHub. I have taken a look to the orgHs.eg.db package and I have seen that it exist a org.Hs.eg.sqlite file with the data and it It is much busier than org.Mxanthus.eg.sqlite. So, I think there are two potential solutions: the first is to return to the original architecture of the package (with the data) and the metadata.csv, make-metadata.R and make-data.R. The other solution is to modify the zzz.R file of the new script and use makeActiveBinding to solve this problem and support this querys without do org.Mxanthus.eg.db <- org.Mxanthus.eg.db::getData() previusly. I will adopt the solution that you advise me.

DBSCHEMA: MXANTHUS_DB GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/ UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197

What do you think about this? Thanks very much again for your reviews and suggestions. Of course, Of course, I am willing to work hard these days and try to publish this package in version 3.10.

Yours sincerely,

Eduardo Illueca

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

81f8c5e Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

eduardoillueca commented 4 years ago

Hi @lshep @hpages. I think I have fixed the first problem that @hpages told me. The error that I obtain is the error of the version. Now, when I install my package I could do this without do org.Mxanthus.eg.db <- org.Mxanthus.eg.db::getData():

columns(org.Mxanthus.eg.db)
    [1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GID" "GO" "GOALL"
    [7] "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"

    select(org.Mxanthus.eg.db, keys="7179", columns="SYMBOL")
    'select()' returned 1:1 mapping between keys and columns
    GID SYMBOL
    1 7179 MXAN_RS04930

It is also compatible with others packages like clusterProfiler . To solve this I have updated this code in the zzz.R.


org.Mxanthus.egORGANISM <- "Myxococcus xanthus DK 1622"

getData <- function(){

  ah <- AnnotationHub::AnnotationHub()
  return(query(ah, c("Myxococcus xanthus DK 1622", "org.Mxanthus.eg.db"))[[1]])

}

.onLoad <- function(libname, pkgname){

  makeActiveBinding("org.Mxanthus.eg.db", getData , .GlobalEnv)

}

So, the next task is to add the following:

DBSCHEMA: MXANTHUS_DB
GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants
EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/
UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197 

I think that it is necessary to upload again the data to S3 to change it. In adittion, the column name is still GID for the reason explain before. If I could change it, it will be necessary to upload again the data to S3. In adition, I have thought to write a little manual to explain how to use this package with clusterProfiler and add some example.

hpages commented 4 years ago

Thanks Eduardo for getting back to us and for your work on org.Mxanthus.eg.db.

I'm glad you figured out how to use makeActiveBinding to delay the creation of the org.Mxanthus.eg.db object. You are pioneering a little bit here (I don't think we have any AnnotationHub-based OrgDb package yet). FYI we expect to see more AnnotationHub-based packages in the future.

I'm surprised AnnotationForge::makeOrgPackage() wants you to call the central column GID. My understanding is that AnnotationForge::makeOrgPackage() is used to produce most (if not all) of the OrgDb packages that we distribute. AFAIK all the Entrez-based ones (org.*.eg.db) use ENTREZID, not GID. I think this is a case where bringing the issue on GitHub (https://github.com/Bioconductor/AnnotationForge/issues) or on the bioc-devel mailing list would have made a lot of sense.

Thanks again for your great work on this!

H.

eduardoillueca commented 4 years ago

Thanks @hpages for your responses.

I think the wisest option is to open a new issue in [https://github.com/Bioconductor/AnnotationForge/issues](). I hope that they could be solve in these days to upload the package with the ENTREZID terminology. At the moment, I will stay with the GID column. On the other hand, I don't know if I need the support of @lshep to change the data that I have stored in S3.

Yours sincerely,

Eduardo Illueca

eduardoillueca commented 4 years ago

This is the link to the issue:

https://github.com/Bioconductor/AnnotationForge/issues/8

lshep commented 4 years ago

If you have a new version of the data you can use the same credentiald and upload the same as before. Just email or message me here that it's there and I can update it

eduardoillueca commented 4 years ago

Hi @hpages @lshep,

I am trying to change the metadata of the data to put this and to change GID to ENTREZID:

DBSCHEMA: MXANTHUS_DB
GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants
EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/
UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197 

When I changed the metadata, the columns of the database change and the OrgDb object is not compatible with clusterProfiler . This are the metadata and the columns of the actual data:

Now, I show the columns after changing the metadata:

OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: MXANTHUS_DB
| ORGANISM: Myxococcus xanthus DK 1622
| SPECIES: Myxococcus xanthus DK 1622
| CENTRALID: GID
| Taxonomy ID: 246197
| Db type: OrgDb
| Supporting package: AnnotationDbi
| GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants
| EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/
| UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197 
> columns(org.Mxanthus.eg.db)
 [1] "ACCNUM"   "ALIAS"    "ENTREZID" "EVIDENCE" "GENENAME" "GO"       "ONTOLOGY" "PMID"    
 [9] "REFSEQ"   "SYMBOL"   "UNIGENE" 

And now I show the columns of org.Hs.eg.db

> columns(org.Hs.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"    
 [7] "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"     "GO"           "GOALL"       
[13] "IPI"          "MAP"          "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"        
[19] "PFAM"         "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
[25] "UNIGENE"      "UNIPROT" 

The problem is that the GOALLand ONTOLOGYALL are mandatory and when I change the metadata they disappear (I don't know why). Other columns such as CHROMOSOME disappear and they could also be important. Thanks very much,

Eduardo Illueca

eduardoillueca commented 4 years ago

Sorry, I don't show the actual data and metadata:

OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Myxococcus xanthus
| SPECIES: Myxococcus xanthus
| CENTRALID: GID
| Taxonomy ID: 246197
| Db type: OrgDb
| Supporting package: AnnotationDbi
> columns(query(ah, c("Myxococcus xanthus DK 1622", "org.Mxanthus.eg.db"))[[1]])
downloading 0 resources
loading from cache
[1] "CHROMOSOME"  "EVIDENCE"    "EVIDENCEALL" "GID"         "GO"          "GOALL"      
[7] "ONTOLOGY"    "ONTOLOGYALL" "SYMBOL" 
bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

8dc7331 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

eduardoillueca commented 4 years ago

This is the last version of the packages. It has new metadata and new columns that allows to map with other IDs like Uniprot ID

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

7d54f62 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

eduardoillueca commented 4 years ago

I think I would change the name of the package to org.Mxanthus.gid.db. Do you think that this name is more aproppiate? Should I change the name in the git repository, the description file and also in the S3 data? Thanks very much,

Eduardo Illueca