Bioconductor / Contributions

Contribute Packages to Bioconductor
131 stars 33 forks source link

Spectra package for representation and analysis of mass spectrometry data #1614

Closed jorainer closed 3 years ago

jorainer commented 3 years ago

Dear BioC core,

with this we (Laurent Gatto, Sebastian Gibb and myself) would like to submit the Spectra package for representation and analysis of mass spectrometry data to Bioconductor. I'd also like to clarify some issues thrown by BiocCheck:

We thank you in advance for revising and checking our package!

Best regards, Johannes

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 3 years ago

Hi @jorainer

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: Spectra
Title: Spectra Infrastructure for Mass Spectrometry Data
Version: 0.99.0
Description: The Spectra package defines an efficient infrastructure
   for storing and handling mass spectrometry spectra and functionality to
   subset, process, visualize and compare spectra data. It provides different
   implementations (backends) to store mass spectrometry data. These comprise
   backends tuned for fast data access and processing and backends for very
   large data sets ensuring a small memory footprint.
Authors@R: c(person(given = "RforMassSpectrometry Package Maintainer",
          email = "maintainer@rformassspectrometry.org",
          role = "cre"),
   person(given = "Laurent", family = "Gatto",
            email = "laurent.gatto@uclouvain.be",
            role = "aut",
            comment = c(ORCID = "0000-0002-1520-2268")),
         person(given = "Johannes", family = "Rainer",
            email = "Johannes.Rainer@eurac.edu",
            role = "aut",
            comment = c(ORCID = "0000-0002-6977-7147")),
         person(given = "Sebastian", family = "Gibb",
            email = "mail@sebastiangibb.de",
            role = "aut",
            comment = c(ORCID = "0000-0001-7406-4443")))
Depends:
    R (>= 4.0.0),
    S4Vectors,
    BiocParallel,
    ProtGenerics (>= 1.17.4)
Imports:
    methods,
    IRanges,
    MsCoreUtils (>= 1.1.3),
    BiocGenerics,
    graphics,
    grDevices,
    stats,
    tools,
    utils
Suggests:
    testthat,
    knitr (>= 1.1.0),
    msdata (>= 0.19.3),
    roxygen2,
    BiocStyle (>= 2.5.19),
    mzR (>= 2.19.6),
    rhdf5 (>= 2.32.0),
    rmarkdown,
    vdiffr,
    magrittr
License: Artistic-2.0
LazyData: yes
VignetteBuilder: knitr
BugReports: https://github.com/RforMassSpectrometry/Spectra/issues
URL: https://github.com/RforMassSpectrometry/Spectra
biocViews: Infrastructure, Proteomics, MassSpectrometry, Metabolomics
RoxygenNote: 7.1.1
Roxygen: list(markdown=TRUE)
Collate:
    'hidden_aliases.R'
    'AllGenerics.R'
    'MsBackend-functions.R'
    'MsBackend.R'
    'MsBackendDataFrame-functions.R'
    'MsBackendDataFrame.R'
    'MsBackendHdf5Peaks-functions.R'
    'MsBackendHdf5Peaks.R'
    'MsBackendMzR-functions.R'
    'MsBackendMzR.R'
    'ProcessingStep.R'
    'Spectra-functions.R'
    'Spectra.R'
    'functions-util.R'
    'peak-list-functions.R'
    'peaks-functions.R'
    'plotting-functions.R'
    'zzz.R'
bioc-issue-bot commented 3 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Spectra to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

jorainer commented 3 years ago

Dear reviewer (@LiNk-NY ). The 3 errors from the BiocCheck were expected (as described in the initial message):

Please let me know if there is anything else we can do or clarify.

LiNk-NY commented 3 years ago

Hi Johannes, @jorainer

Can you add an example of using the virtual class in case other developers want to use it? For example (similar to Biobase):

# Create a new class
MyBE <- setClass("MyData", contains="MsBackend")
MyBE()

# virtual classes cannot be instantiated directly
try(new("MsBackend"))

Can you register the account on the support site with maybe a shared password?

Best, Marcel

lgatto commented 3 years ago

Hi @LiNk-NY - I am registered on the support site and so is @jorainer; we are also registered on the developer mailing list. As you can verify, we have been addressing user questions. Creating a new dummy account with the shared email address won't influence that, neither in a positive nor in a negative way.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b1f93aa6278c2f3d7f11b0077073279e54a39883

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Spectra to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

jorainer commented 3 years ago

Hi Marcel @LiNk-NY , I've now added documentation also to the MsBackend.Rd. The only error that remains is the one regarding the maintainer email address - @lgatto replied to that issue above.

I hope this addresses all issues.

thanks

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Spectra to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Laurent @lgatto and Johannes @jorainer ,

Thank you for your submission to Bioconductor. Please see the review below and respond line by line.

Best regards, Marcel


Spectra #1614

DESCRIPTION

NAMESPACE

vignettes


* Perhaps using `filter(x, dataOrigin == "XYZ")` should be supported rather
than having a function for each type of filtering (e.g., `filterDataOrigin()`)?

## R

* Can the `asDataFrame` generic and methods be avoided by using the `setAs`
function?
* It seems to me that `asDataFrame<-` is an awkward construction.
* I'd encourage the use of `S4` method on an `S4` class (e.g., using
`as(x, "List")` when `x` is an `MsBackendHdf5Peaks` class rather than
`as.list` which looks like an S3 method but is created as an S4 method.
* Use the `@` accessor sparingly in package code.
* I'm not sure you should have methods for the virtual class in
`R/MsBackend.R` ... (@mtmorgan?)
* Why not let the user execute `bplapply` themselves rather than having
an `lapply` method wrapper?
* Coding and formatting look good.

## tests

* It looks like the is plenty of coverage with the tests.
jorainer commented 3 years ago

Thanks Marcel @LiNk-NY for the fast review! I'll start to answer to the first few points.

Spectra #1614

* Solid package with very expansive functionality.

* You may want to consider separating the plotting functionality from
  the infrasturcture package into its own package.

Indeed, we were thinking about that but agreed that base R plotting functionality (since it's just the 3 plots in total) could stay within this package. We were thinking of an additional package that provides ggplot-based plotting.

DESCRIPTION

* Looks good.

* Use `LazyData: false` if you have data that is on the larger side.

OK. We'll change to LazyData: false.

NAMESPACE

* Perhaps the implementation of `asDataFrame<-` needs revisiting

* Could the `replaceList<-` method work as `List<-` for the list-like element
  in the class?

I guess you refer to the name replaceList<- instead of using List<-? The reason for that was that we wanted to make it clear to the user that this method replaces the list of peak matrices within the object. With a List<- method, as a user, I would expect to have also a List,Spectra method (which we in fact don't have - the user should call as(x, "List") in that case. Thus, the replaceList<- is sort of the inverse function of as(x, "List").

* Why does `lapply` need to be exported?

We have a lapply,Spectra method implemented in the package - I thought that all methods that are implemented have to be also exported?

* Perhaps `lapply` could be inherited by only the list-like representation
  within the class?

The data of a Spectra is more than just the list of peak matrices. Parallel to the peak matrix list (which contains the m/z and intensity values for each spectrum) we have additional metadata for each spectrum (the so called spectra variables. Also, importantly, a Spectra contains data from one or x spectra and it is thus equivalent with a list of spectrum instances (each spectrum characterized by more than just the peak matrix). With the lapply,Spectra method we want to give the user the possibility to loop over each spectrum in Spectra and apply a function to all of its data, the peak matrix and the associated spectra variables. Just applying it to the list element, i.e. the list of peak matrices would not suffice.

jorainer commented 3 years ago

vignettes

* Include an `Installation` section that uses `BiocManager`.

The section was added with commit https://github.com/rformassspectrometry/Spectra/commit/73d088e8e168c5c2f9ada60f59c53c9c4d2ca98a

  • It looks like the show method print out from line 122 in the vignette is missing something:

Indeed. The show,Spectra method was always printing "Processing:" even if no data manipulations or subsetting occurred. We changed this in commit https://github.com/rformassspectrometry/Spectra/commit/ead22f33682b1a9a59426e13d7cbd437ba97dff2 .

> sps
MSn data (Spectra) with 3 spectra in a MsBackendDataFrame backend:
    msLevel     rtime scanIndex
  <integer> <numeric> <integer>
1         2        NA        NA
2         2        NA        NA
3         2        NA        NA
 ... 18 more variables/columns.
Processing:
* Perhaps using `filter(x, dataOrigin == "XYZ")` should be supported rather
  than having a function for each type of filtering (e.g., `filterDataOrigin()`)?

This was a design decision we made. We feel it more user friendly to have a dedicated filter function for each filtering process. Thus, autocompletion in RStudio or emacs or any other editor supporting that, will suggest the user all options/filter methods that are available. In addition, MSnbase provides the same filter methods and (future Spectra) users will be already familiar with this. Another reason was that the filter function/method is already implemented in various packages which could result in namespace conflicts.

jorainer commented 3 years ago

R

* Can the `asDataFrame` generic and methods be avoided by using the `setAs`
  function?

* It seems to me that `asDataFrame<-` is an awkward construction.

* I'd encourage the use of `S4` method on an `S4` class (e.g., using
  `as(x, "List")` when `x` is an `MsBackendHdf5Peaks` class rather than
  `as.list` which looks like an S3 method but is created as an S4 method.

You are absolutely right! We will change that (which will also settle your previous issue with replaceList<-). See here for more details. After the changes I will report back here.

* Use the `@` accessor sparingly in package code.

We tried to use it only where necessary, i.e. in functions that will be called x1000 of times - so we can squeeze a little more performance out of the code.

* I'm not sure you should have methods for the virtual class in
  `R/MsBackend.R` ... (@mtmorgan?)

With that we wanted to define the API for new MsBackend instances, i.e. developers of new backends (there are already several such backends) need to implement these methods. And the default would actually throw a user understandable error message (instead of method xxx not found for class yyy).

* Why not let the user execute `bplapply` themselves rather than having
  an `lapply` method wrapper?

* Coding and formatting look good.

:)

tests

* It looks like the is plenty of coverage with the tests.

Thanks - yes we tried to test everything - and are proud to have 97% coverage :)

Thanks again for the review and happy to hear your comments/feedback. Regarding the asDataFrame, as.list etc I will post again once I have implemented that and pushed to BioC.

thanks, jo

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b601e70240cb58a67dcca5d7ac9c0aabee74e8b1

jorainer commented 3 years ago

With the recent update we:

Thus, now we replied to all your comments. Happy to hear your feedback Marcel @LiNk-NY .

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Spectra to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Johannes, @jorainer

Thank you for your response and for making those changes. Here are a few additional comments. We should be able to proceed with acceptance after your response.

Thanks! -Marcel


I guess you refer to the name replaceList<- instead of using List<-? The reason for that was that we wanted to make it clear to the user that this method replaces the list of peak matrices within the object. With a List<- method, as a user, I would expect to have also a List,Spectra method (which we in fact don't have - the user should call as(x, "List") in that case. Thus, the replaceList<- is sort of the inverse function of as(x, "List").

I think this is a matter of making the interface a bit more clear. It seems to me that replaceList<- should be something like accessList(object) <- replacementValue (in pseudocode). Then as(object, "List" ) would be similar to accessList(object).

We have a lapply,Spectra method implemented in the package - I thought that all methods that are implemented have to be also exported?

The data of a Spectra is more than just the list of peak matrices. Parallel to the peak matrix list (which contains the m/z and intensity values for each spectrum) we have additional metadata for each spectrum (the so called spectra variables. Also, importantly, a Spectra contains data from one or x spectra and it is thus equivalent with a list of spectrum instances (each spectrum characterized by more than just the peak matrix). With the lapply,Spectra method we want to give the user the possibility to loop over each spectrum in Spectra and apply a function to all of its data, the peak matrix and the associated spectra variables. Just applying it to the list element, i.e. the list of peak matrices would not suffice.

I am not sure that implementing an lapply method is necessary. I would suggest not creating an lapply method but allowing users to use lapply on the extracted component of the class that is list-like (as something returned by for example accessList(object)). lapply should work out of the box based on the List / list inheritance. Reading it again, it may be an issue of naming. Perhaps you'd like to implement your own Spectrapply method, for example, that does the things outlined in your response. I would rather not confuse users familiar with lapply with unexpected behavior when using lapply,Spectra. Also, why call it lapply if the manipulation involves multiple lists? Why not mapply or Map? I'd lean to a more tailored function name to avoid that sort of confusion.

With that we wanted to define the API for new MsBackend instances, i.e. developers of new backends (there are already several such backends) need to implement these methods. And the default would actually throw a user understandable error message (instead of method xxx not found for class yyy).

I suppose it's a matter of personal preference but developers should be able to understand this :wink:

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 04b62e9e98bec09b23610f90c238a7d533082f81

jorainer commented 3 years ago

Hi Marcel @lgatto !

thanks for your very fast reply and excellent suggestion!

I am not sure that implementing an lapply method is necessary. I would suggest not creating an lapply method but allowing users to use lapply on the extracted component of the class that is list-like (as something returned by for example accessList(object)). lapply should work out of the box based on the List / list inheritance. Reading it again, it may be an issue of naming. Perhaps you'd like to implement your own Spectrapply method, for example, that does the things outlined in your response. I would rather not confuse users familiar with lapply with unexpected behavior when using lapply,Spectra. Also, why call it lapply if the manipulation involves multiple lists? Why not mapply or Map? I'd lean to a more tailored function name to avoid that sort of confusion.

Yes, you're absolutely right! We changed that now and provide a spectrapply,Spectra method instead of the lapply,Spectra.

With that we wanted to define the API for new MsBackend instances, i.e. developers of new backends (there are already several such backends) need to implement these methods. And the default would actually throw a user understandable error message (instead of method xxx not found for class yyy).

I suppose it's a matter of personal preference but developers should be able to understand this 😉

Yes, developers should understand, I agree. But it's also more for the user. If the developer of the backend forgot to implement a method (on purpose or not) the user would get a more user friendly message telling him that this method is not available for the backend instead of the R default: unable to find inherited method for function 'xxx' for signature 'yyy'.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/Spectra to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Johannes @jorainer and Laurent @lgatto,

I've had a look at the changes and they look good. Thanks for your submission to Bioconductor!

Best regards, Marcel

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

lgatto commented 3 years ago

Thank you @LiNk-NY - really great review :100:. Your suggestions were really spot on and substantially improved the package!

mtmorgan commented 3 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/jorainer.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("Spectra"). The package 'landing page' will be created at

https://bioconductor.org/packages/Spectra

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

lgatto commented 3 years ago

Would it also be possible to give me (https://github.com/lgatto.keys) write access to the package's Bioconductor git repository.

jorainer commented 3 years ago

Thanks @LiNk-NY for your fast and excellent review!

LiNk-NY commented 3 years ago

Would it also be possible to give me (https://github.com/lgatto.keys) write access to the package's Bioconductor git repository.

@mtmorgan ?

mtmorgan commented 3 years ago

actually, @nturaga ;)