Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

MsDataHub submission #2887

Closed lgatto closed 1 year ago

lgatto commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @lgatto

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: MsDataHub
Title: Mass Spectrometry Data on ExperimentHub
Version: 0.99.0
Authors@R:
    c(person(given = "Laurent",
   family = "Gatto",
   email = "laurent.gatto@uclouvain.be",
   role = c("aut", "cre"),
   comment = c(ORCID = "0000-0002-1520-2268")),
      person(given = "Johannes",
   family = "Rainer",
   email = "Johannes.Rainer@eurac.edu",
   role = "aut",
   comment = c(ORCID = "0000-0002-6977-7147")))
Description: The MsDataHub package uses the ExperimentHub
   infrastructure to distribute raw mass spectrometry data
   files, peptide spectrum matches or quantitative data from
   proteomics and metabolomics experiments.
License: Artistic-2.0
BugReports: https://github.com/RforMassSpectrometry/MsDataHub/issues
Imports:
    ExperimentHub,
    utils
Suggests:
    ExperimentHubData,
    DT,
    BiocStyle,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
biocViews: ExperimentHubSoftware,
 MassSpectrometry,
 Proteomics,
 Metabolomics
Encoding: UTF-8
VignetteBuilder: knitr
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Config/testthat/edition: 3
bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/MsDataHub to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lgatto commented 1 year ago

Just a quick note about the warning:

* Checking vignette directory...
    * WARNING: Evaluate more vignette chunks.
        7 code chunks / 10 total = 70% percent unevaluated
        0 non-exec code chunks (e.g., '``` r')

This is a data package and the non-evaluated code chunks show how to download datasets without actually doing so to save disk time when first building the package/vignette and space thereafter (assuming the cache is persistent). I can of course change this if preferred.

lazappi commented 1 year ago

I'm happy to ask the core and see what they think but my feeling is the vignette should be executed like for a normal package (I had a quick look at a couple of data packages and that seems to be the case). The space concern is reasonable though so I would suggest trying to show whatever the smallest reasonable amount of data is.

vjcitn commented 1 year ago

What are the sizes of downloads needed to carry this out?

lgatto commented 1 year ago

For now, there's 484M of data files, detailed below.

cdf:
total 3.6M
-rw-r--r-- 1 lgatto lgatto 3.6M Nov 28 14:50 ko15.CDF

cptac:
total 30M
-rw-r--r-- 1 lgatto lgatto 8.5M Dec  1 12:53 cptac_a_b_c_peptides.txt
-rw-r--r-- 1 lgatto lgatto 6.9M Dec  1 12:53 cptac_a_b_peptides.txt
-rw-r--r-- 1 lgatto lgatto  15M Dec  1 12:53 cptac_peptides.txt

PXD000001:
total 310M
-rw-r--r-- 1 lgatto lgatto 9.4M Nov 29 09:32 TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid
-rw-r--r-- 1 lgatto lgatto 301M Nov 29 09:32 TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz

sciex:
total 35M
-rw-r--r-- 1 lgatto lgatto 17M Nov 28 14:51 20171016_POOL_POS_1_105-134.mzML
-rw-r--r-- 1 lgatto lgatto 18M Nov 28 14:51 20171016_POOL_POS_3_105-134.mzML

TripleTOF-SWATH:
total 107M
-rw-r--r-- 1 lgatto lgatto 39M Nov 27 18:29 PestMix1_DDA.mzML
-rw-r--r-- 1 lgatto lgatto 69M Nov 27 18:29 PestMix1_SWATH.mzML

There will be more - for example, some data from the recently added MsQuality package will also be added once this one makes it into Bioconductor. I would prefer to be consistent and introduce all datasets in the same way, either with code evaluation, or without.

vjcitn commented 1 year ago

Let's present this option set to the TAB on the next call February call I will set a seven minute timeframe to present the data and have a discussion

lgatto commented 1 year ago

@lazappi - I just found an issue with the package. One of the files has been uploaded with a typo, so that now there's an inconsistency between the filename and the alias in the documentation, which generates a warning:

 checking for missing documentation entries ... WARNING
Undocumented code objects:
  ‘TMT_Erwinia_1uLSike_Top10HCD_iso1l2_45stepped_60min_01-20141210.mzid’
All user-level objects in a package should have documentation entries.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.

(it should be TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid)

I am waiting to hear from Lori to see if this can be changed or it I should simple keep the filename with typo and update the man page accordingly.

There's another issue with this and another file (with a similar name):

> MsDataHub::TMT_Erwinia_1uLSike_Top10HCD_iso1l2_45stepped_60min_01-20141210.mzid()
Error: unexpected symbol in "MsDataHub::TMT_Erwinia_1uLSike_Top10HCD_iso1l2_45stepped_60min_01-20141210.mzid"
> MsDataHub::TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz()
Error: unexpected symbol in "MsDataHub::TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz"

which I suspect might be because of the long file/function names. I have asked Lori.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: adcf783a0b31f7602d80e98707cf6e11033b1951

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/MsDataHub to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lgatto commented 1 year ago

@lazappi - some updates:

I'll keep you posted.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: a05bf55ee469f4f02a3b774835c700e6ea443109

lgatto commented 1 year ago

Hi @lazappi - The feedback from the TAB was to "evaluate as needed", without enforcing anything particular.

I have updated the vignette with evaluated code chunks and illustrate loading them with existing infrastructure. Let's see how things go in terms of timing when the vignette is evaluated the first time and data are downloaded and cached.

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/MsDataHub to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lgatto commented 1 year ago

So the error here is about runnable man page examples, which essentially boils down to the code in the vignette. I am happy to duplicate the vignette's chunks in the man pages examples, but we might be running in circles here.

lazappi commented 1 year ago

I think this error is going to happen on the actual build machines which we don't want so there probably needs to be some kind of examples, either the same as in the vignette or some small dummy example as you see fit.

lshep commented 1 year ago

@lazappi The error wouldnt occur on the daily builders as we still don't run bioccheck across all existing packages. I think its okay to let it go.

lgatto commented 1 year ago

Thank you. May be it might be useful to reflect on the package requirements for data packages. I am not sure that executable code in all examples and vignettes make sense - there isn't much any smaller dummy example I can run other than loading the data. This will inevitably lead to duplication in the man pages and the vignette, which isn't very useful for the user.

Maybe that in such cases, it would be allowed to not have examples in the man pages but explicitly redirect the users to the vignette or, alternatively, have examples in the man page and use the vignette for a general overview of the data without examples (or longer ones that make use of the data... although these are generally present in the software package).

lazappi commented 1 year ago

Hi @lgatto

Thanks for submitting MsDataHub :tada:! Below is my review of your package. Please reply here if anything is unclear or needs any further explanation. There are only very minor comments other than a question about how the package works.

What next?

Please address the comments as best as you can. When you are ready for me to check the package again please reply to let me know with a summary of changes you have made or any other responses. You can use the "Quote reply" option in the ... menu on this comment to reply directly to my points below.

Luke

Review

Key: :rotating_light: Required :warning: Recommended :green_circle: Optional :question: Question

General package development

DESCRIPTION file

Documentation

Vignette

Man pages

lgatto commented 1 year ago

General package development

  • I haven't reviewed a data package before so I might be misunderstanding how it works but I would expect the package to contain functions for downloading each of the datasets. As far as I can see it just contains metadata of each dataset and you still need the original packages to get them. Is this correct of is there something I am missing?

Yes, the package defines a metadata table that describes the datasets, that have already been made available via ExperimentHub. The functions to download the data are created when the package is loaded. This is handled by ExperimentHub::createHubAccessors() that is called by .onLoad, defined in zzz.R: a function foodata() will be created for data with title foo; and the title is retrieved from the metadata.

No other packages are involved.

DESCRIPTION file

  • Consider adding a BugReports field

There is already a BugReports field that points to https://github.com/RforMassSpectrometry/MsDataHub/issues.

  • Consider adding a URL field

I have added a pkgdown file that is referred to in the URL field: https://rformassspectrometry.github.io/MsDataHub/

Documentation

Vignette

I don't think it's necessary to include the GitHub installation instructions, users will be able to install the release/devel version from Bioconductor after the package is accepted

I have removed the Github installation instructions. Generally, I do however find these important as some new features are first released on Github for alpha-testing before being pushed to devel. In the case of a data package, it's probably not relevant.

Man pages

  • Consider adding a package man page

There's a package man page, that also documents to MsDataHub() function: see ?MsDataHub.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: e0a1d0e22d3b112a1a27e505f4985d2d2275b15c

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/MsDataHub to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lgatto commented 1 year ago

As discussed above, the error is

* Checking man page documentation...
snapshotDate(): 2023-02-28
    * ERROR: At least 80% of man pages documenting exported objects
      must have runnable examples.

and can be ignored.

lazappi commented 1 year ago

Thanks. I am happy to approve this now as we have been told we can ignore that error 🎉. It should be in Bioconductor devel in a few days after it is picked up by the build system.

bioc-issue-bot commented 1 year ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lgatto commented 1 year ago

Thank you!

lshep commented 1 year ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/lgatto.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("MsDataHub"). The package 'landing page' will be created at

https://bioconductor.org/packages/MsDataHub

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.