Closed eduardoillueca closed 4 years ago
Hi @eduardoillueca
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: org.Mxanthus.eg.db
Title: Genome wide annotation for Myxococcus xanthus DK 1622
Description: Genome wide annotation for Myxococcus xanthus DK 1622, primarily based on mapping using Entrez Gene identifiers.
Version: 1.0.0
Author: Eduardo Illueca Fernandez
Maintainer: Eduardo Illueca Fernandez <eduardo.illueca@um.es>
Depends: R (>= 2.7.0), methods, AnnotationDbi (>= 1.46.0)
Suggests: DBI, annotate, RUnit
Imports: methods, AnnotationDbi
License: Artistic-2.0
organism: Myxococcus xanthus DK 1622
species: Myxococcus xanthus DK 1622
biocViews: AnnotationHub, OrgDb, AnnotationData
Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
I think @lshep will handle this package, but for technical reasons @hpages will remain as assignee for until another package is added to the review queue.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The package needs to get fixed so that it can install/build correctly.
Most notably the .onLoad function is not correct. There is no extdata
sqlite file anymore. This seems like all the old code that has not been updated.
You need to access the sqlite data from the hub. I would also recommend this not be done during an .onLoad
method but rather have a actually R function that the user will run to download and have access to the orgDb object. I would recommend having this seperate R file and actually removing zzz.r
completely.
in other words something like
getData <- function(){
ah <- AnnotationHub()
query(ah, c("Myxococcus xanthus DK 1622", "org.Mxanthus.eg.db"))[[1]
}
The download in this case returns directly the OrgDb.
Please also note the new code in org.Mxanthus.egBASE.Rd
isn't in an example section and should be.
Thanks @lshep for the quick answer. I have updated org.Mxanthus.egBASE.Rd and I have included the code in an example section. Also, I have created a getData function in a zzz.R file anda I have removed the old org.Mxanthus.eg_dbconn.Rd file and I have created getData.Rd tha includes the documentation of this new function.
Can you set up your webhook and do a version bump to make sure it can build/install on our system?
The library calls shouldn't be in the R script. Please remove and use the NAMESPACE and DESCRIPTION to depend/import the necessary packages.
Received a valid push; starting a build. Commits are:
79b00c0 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
db17338 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
339a2aa Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
DESCRIPTION
man
R code
AnnotationHub::AnnotationHub()
BiocCheck
Thanks @lshep. I have updated the description file so I have put AnnotationHub::AnnotationHub() in depends and I import all the trhee packages in the description file (I have removed their from NAMESPACE). Also, I have signed up in the mailing list and support site.
Received a valid push; starting a build. Commits are:
59e52c0 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Ah I mean AnnotationHub in Depends and/or do AnnotationHub::AnnotationHub() in the R code of the zzz.R file
Sorry for the confustion on that
No problem. I will try this one
Received a valid push; starting a build. Commits are:
32d7852 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Please add import(AnnotationHub)
to the NAMESPACE - getting closer
Received a valid push; starting a build. Commits are:
9a8b51a Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
I think you original had AnnotationDbi and methods in the NAMESPACE as imports - was there a reason you removed them in this commit? I think adding import("AnnotationDbi")
and import("methods")
in the NAMESPACE should fix the last two major NAMESPACE warnings in the check. The package still shows ERROR but for the version number that I think is okay to ignore since this is an annotation package.
I think after you add the two into the NAMESPACE it will be ready to accept
Yes, the original had AnnotationDbi and methods in the NAMESPACE as imports. I will return to put this imports in NAMESPACE.
Received a valid push; starting a build. Commits are:
ce6abd7 Update DESCRIPTION
Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.
Thank you for contributing to Bioconductor!
@mtmorgan reminder this is an annotation package. @dvantwisk this will probably have to be cloned and added manually to the builder.
Thanks very much for all your support, @lshep. I am at your disposal for any problems that may come up. Thanks very much again.
Thanks @eduardoillueca for your submission.
@lshep @mtmorgan I don't think we should accept this so quickly, without at least taking the time to consider a few important aspects of AnnotationHub-based data packages.
For example, in the present case, org.Mxanthus.eg.db looks like it belongs to the family of OrgDb packages (like org.Hs.eg.db). However OrgDb packages are expected to present a certain API so they can be used in places where an OrgDb package is expected. Right now, AFAIK, all OrgDb packages support the "old" map-based API:
library(org.Hs.eg.db)
org.Hs.egSYMBOL[["26074"]]
# [1] "CFAP61"
revmap(org.Hs.egALIAS2EG)[["26074"]]
# [1] "C20orf26" "CaM-IP3" "dJ1002M8.3" "dJ1178H5.4" "CFAP61"
and the "new" select API:
select(org.Hs.eg.db, keys="26074", columns=c("SYMBOL", "ALIAS"))
# ENTREZID SYMBOL ALIAS
# 1 26074 CFAP61 C20orf26
# 2 26074 CFAP61 CaM-IP3
# 3 26074 CFAP61 dJ1002M8.3
# 4 26074 CFAP61 dJ1178H5.4
# 5 26074 CFAP61 CFAP61
I would strongly suggest that org.Mxanthus.eg.db supports at least the select API. That is, one should be able to do something like:
library(org.Mxanthus.eg.db)
columns(org.Mxanthus.eg.db)
select(org.Mxanthus.eg.db, keys="7179", columns="SYMBOL")
out-of-the-box i.e. without having to do org.Mxanthus.eg.db <- org.Mxanthus.eg.db::getData()
first.
It seems that this could be achieved via the use of an active binding here. See ?makeActiveBinding
.
By doing this, you'll no longer need to export getData()
. Plus the fact that the data will be fetched on-the-fly from AnnotationHub the first time org.Mxanthus.eg.db
is accessed should be transparent to the end user.
This goes beyond the org.Mxanthus.eg.db
case. It's probably worth taking the time to carefully think about what we want our AnnotationHub-based data packages to look like. With a strong focus on preserving backward compatibility. Ideally we should try to come up with some guidelines. Will benefit future migrations e.g. other OrgDb packages but also ChipDb, TxDb, BSgenome, etc...
I see a couple of other problems that are specific to org.Mxanthus.eg.db:
The OrgDb object returned by getData()
doesn't seem to adhere to the established conventions of org.*.eg.db
packages where the central column is expected to be ENTREZID
:
org.Mxanthus.eg.db <- org.Mxanthus.eg.db::getData()
columns(org.Mxanthus.eg.db)
# [1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GID" "GO"
# [6] "GOALL" "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"
No ENTREZID
column!
The show()
method returns very little information:
> org.Mxanthus.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Myxococcus xanthus
| SPECIES: Myxococcus xanthus
| CENTRALID: GID
| Taxonomy ID: 246197
| Db type: OrgDb
| Supporting package: AnnotationDbi
Please see: help('select') for usage information
Compare with show(org.Hs.eg.db)
. It seems that maybe the EGSOURCE*
and GOSOURCE*
fields would be relevant for org.Mxanthus.eg.db
?
Please choose a better name for the DBSCHEMA
.
Best, H.
Thanks @hpages for your comments. I have found it really interesting so I have study carefully all the points. I have thought about possible solutions to the problems that you have noted and I would like to share them with you before updating the package.
`
columns(org.Mxanthus.eg.db) [1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GID" "GO" "GOALL"
[7] "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"select(org.Mxanthus.eg.db, keys="7179", columns="SYMBOL") 'select()' returned 1:1 mapping between keys and columns GID SYMBOL 1 7179 MXAN_RS04930
` And also more complex querys
`
select(org.Mxanthus.eg.db, keys="2000", columns=c("SYMBOL", "GO")) 'select()' returned 1:many mapping between keys and columns GID SYMBOL GO 1 2000 MXAN_RS14035 GO:0000166 2 2000 MXAN_RS14035 GO:0004222 3 2000 MXAN_RS14035 GO:0005524 4 2000 MXAN_RS14035 GO:0006508 5 2000 MXAN_RS14035 GO:0008233 6 2000 MXAN_RS14035 GO:0008237
`
Later, when I deleted the .sqlite data from the package I had to change the R script, delete the .onLoad method and create the getData function, that recovers the database from AnnotationHub. I have taken a look to the orgHs.eg.db package and I have seen that it exist a org.Hs.eg.sqlite file with the data and it It is much busier than org.Mxanthus.eg.sqlite. So, I think there are two potential solutions: the first is to return to the original architecture of the package (with the data) and the metadata.csv, make-metadata.R and make-data.R. The other solution is to modify the zzz.R file of the new script and use makeActiveBinding
to solve this problem and support this querys without do org.Mxanthus.eg.db <- org.Mxanthus.eg.db::getData()
previusly. I will adopt the solution that you advise me.
ENTREZID: I know that the most important column in a OrgDb package is ENTREZID and it exist in org.Mxanthus.eg.db. Is the GID column. Is exactly the same as ENTREZID and has the same i function in my package. You could check this in the querys that I show before. There is a reason I have called GID this column. To create the package, I have used the AnnotationForge::makeOrgPackage
function. If you read the manual provided by AnnotationForge::makeOrgPackage
, you will check that it is mandatory to call this column as GID. In fact, there is an error if you try it with ENTREZID I do not know if is so important the name of the column. In this case, I do not know how to change this in the package. I have to study it.
SHOW() METHOD: I think I could solve this adding and changing the following:
DBSCHEMA: MXANTHUS_DB GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/ UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197
What do you think about this? Thanks very much again for your reviews and suggestions. Of course, Of course, I am willing to work hard these days and try to publish this package in version 3.10.
Yours sincerely,
Eduardo Illueca
Received a valid push; starting a build. Commits are:
81f8c5e Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi @lshep @hpages. I think I have fixed the first problem that @hpages told me. The error that I obtain is the error of the version. Now, when I install my package I could do this without do org.Mxanthus.eg.db <- org.Mxanthus.eg.db::getData()
:
columns(org.Mxanthus.eg.db)
[1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GID" "GO" "GOALL"
[7] "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"
select(org.Mxanthus.eg.db, keys="7179", columns="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
GID SYMBOL
1 7179 MXAN_RS04930
It is also compatible with others packages like clusterProfiler
. To solve this I have updated this code in the zzz.R.
org.Mxanthus.egORGANISM <- "Myxococcus xanthus DK 1622"
getData <- function(){
ah <- AnnotationHub::AnnotationHub()
return(query(ah, c("Myxococcus xanthus DK 1622", "org.Mxanthus.eg.db"))[[1]])
}
.onLoad <- function(libname, pkgname){
makeActiveBinding("org.Mxanthus.eg.db", getData , .GlobalEnv)
}
So, the next task is to add the following:
DBSCHEMA: MXANTHUS_DB
GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants
EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/
UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197
I think that it is necessary to upload again the data to S3 to change it. In adittion, the column name is still GID for the reason explain before. If I could change it, it will be necessary to upload again the data to S3. In adition, I have thought to write a little manual to explain how to use this package with clusterProfiler
and add some example.
Thanks Eduardo for getting back to us and for your work on org.Mxanthus.eg.db.
I'm glad you figured out how to use makeActiveBinding
to delay the creation of the org.Mxanthus.eg.db
object. You are pioneering a little bit here (I don't think we have any AnnotationHub-based OrgDb package yet). FYI we expect to see more AnnotationHub-based packages in the future.
I'm surprised AnnotationForge::makeOrgPackage()
wants you to call the central column GID
. My understanding is that AnnotationForge::makeOrgPackage()
is used to produce most (if not all) of the OrgDb packages that we distribute. AFAIK all the Entrez-based ones (org.*.eg.db
) use ENTREZID
, not GID
. I think this is a case where bringing the issue on GitHub (https://github.com/Bioconductor/AnnotationForge/issues) or on the bioc-devel mailing list would have made a lot of sense.
Thanks again for your great work on this!
H.
Thanks @hpages for your responses.
I think the wisest option is to open a new issue in [https://github.com/Bioconductor/AnnotationForge/issues](). I hope that they could be solve in these days to upload the package with the ENTREZID terminology. At the moment, I will stay with the GID column. On the other hand, I don't know if I need the support of @lshep to change the data that I have stored in S3.
Yours sincerely,
Eduardo Illueca
This is the link to the issue:
If you have a new version of the data you can use the same credentiald and upload the same as before. Just email or message me here that it's there and I can update it
Hi @hpages @lshep,
I am trying to change the metadata of the data to put this and to change GID to ENTREZID:
DBSCHEMA: MXANTHUS_DB
GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants
EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/
UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197
When I changed the metadata, the columns of the database change and the OrgDb object is not compatible with clusterProfiler
. This are the metadata and the columns of the actual data:
Now, I show the columns after changing the metadata:
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: MXANTHUS_DB
| ORGANISM: Myxococcus xanthus DK 1622
| SPECIES: Myxococcus xanthus DK 1622
| CENTRALID: GID
| Taxonomy ID: 246197
| Db type: OrgDb
| Supporting package: AnnotationDbi
| GOSOURCE: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=246197&taxonUsage=descendants
| EGSOURCE: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Myxococcus_xanthus/latest_assembly_versions/
| UPSOURCE: https://www.uniprot.org/uniprot/?query=taxonomy:246197
> columns(org.Mxanthus.eg.db)
[1] "ACCNUM" "ALIAS" "ENTREZID" "EVIDENCE" "GENENAME" "GO" "ONTOLOGY" "PMID"
[9] "REFSEQ" "SYMBOL" "UNIGENE"
And now I show the columns of org.Hs.eg.db
> columns(org.Hs.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID"
[7] "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GO" "GOALL"
[13] "IPI" "MAP" "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH"
[19] "PFAM" "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[25] "UNIGENE" "UNIPROT"
The problem is that the GOALL
and ONTOLOGYALL
are mandatory and when I change the metadata they disappear (I don't know why). Other columns such as CHROMOSOME
disappear and they could also be important. Thanks very much,
Eduardo Illueca
Sorry, I don't show the actual data and metadata:
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Myxococcus xanthus
| SPECIES: Myxococcus xanthus
| CENTRALID: GID
| Taxonomy ID: 246197
| Db type: OrgDb
| Supporting package: AnnotationDbi
> columns(query(ah, c("Myxococcus xanthus DK 1622", "org.Mxanthus.eg.db"))[[1]])
downloading 0 resources
loading from cache
[1] "CHROMOSOME" "EVIDENCE" "EVIDENCEALL" "GID" "GO" "GOALL"
[7] "ONTOLOGY" "ONTOLOGYALL" "SYMBOL"
Received a valid push; starting a build. Commits are:
8dc7331 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
This is the last version of the packages. It has new metadata and new columns that allows to map with other IDs like Uniprot ID
Received a valid push; starting a build. Commits are:
7d54f62 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
I think I would change the name of the package to org.Mxanthus.gid.db. Do you think that this name is more aproppiate? Should I change the name in the git repository, the description file and also in the S3 data? Thanks very much,
Eduardo Illueca
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.