Bioconductor / Contributions

Contribute Packages to Bioconductor
131 stars 33 forks source link

MS2extract #3398

Open DanielQuiroz97 opened 2 months ago

DanielQuiroz97 commented 2 months ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 months ago

Hi @DanielQuiroz97

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: MS2extract
Type: Package
Title: Targeted MS/MS Spectra Extraction for In-house Compound Library Creation
Version: 0.99.0
Date: 2023-10-13
Authors@R: c(
    person("Cristian", "Quiroz-Moreno",
 email = "cristianquirozd1997@gmail.com",
 role = c("aut", "cre"),
 comment = c(ORCID = "0000-0002-9069-9147") ),
    person("Jessica", "Cooperstone",
 email = "cooperstone.1@osu.edu",
 role = c("aut"),
 comment = c(ORCID = "0000-0001-7920-0088") )
 )
Maintainer: Cristian Quiroz-Moreno <cristianquirozd1997@gmail.com>
Description: A mass spectrometry toolbox with the objective of importing mass
   spectrometry data, and extracts the desired MS/MS spectra for in-house MS/MS
   compound library creation. Subsequently, users can export the spectra 
   in a .msp file format, as a resulting compound database for further use
   in compound identification tasks. It is worth to note that this package
   do not annotate mass spectrometry data.
License: MIT + file LICENSE
LazyData: true
RdMacros: Rdpack
biocViews: MassSpectrometry, Metabolomics
Depends:
     R (>= 4.2.0)
RoxygenNote: 7.2.3
Roxygen: list(markdown = TRUE)
Encoding: UTF-8
Imports: 
    cli,
    Rdpack,
    crayon,
    dplyr,
    ggplot2,
    ggpubr,
    ggrepel,
    magrittr,
    MSnbase,
    OrgMassSpecR,
    ProtGenerics,
    purrr,
    Rdisop,
    rlang,
    readxl,
    readr,
    ggsci
Suggests:
    knitr,
    rmarkdown,
    prettydoc,
    testthat (>= 3.0.0),
    Rcpp (>= 1.0.9),
    BiocStyle,
    RefManageR,
    sessioninfo
VignetteBuilder: knitr
Config/testthat/edition: 3
BugReports: https://github.com/CooperstoneLab/MS2extract/issues
URL: https://cooperstonelab.github.io/MS2extract/
lgatto commented 2 days ago

Thank you for your submission. I have some general comments, before addressing the technical review of the package.

One important aspect we take into account when considering new packages is how they make use of existing data infrastructure. I see that you used MSnbase::readMsData() to parse the raw (mzML, mzXML) data using (by default) the in-memory backend.

  1. Are you aware of the R for Mass Spectrometry packages (RforMS), and in particular the Bioconductor Spectra package? Spectra, and the other RforMS packages represent are developed by the core MSnbase team, re-using what worked well in MSnbase, and replacing/improving what didn't work that well. MSnbase isn't going to be deprecated any time soon, but I would suggest to use Spectra::Spectra() rather than MSnbase::readMsData().

  2. I see that you are exporting data to mgf and msp. This functionality is available as part of RforMS, and more specifically the MsBackendMgf and MsBackendMsp backends, that integrate directly with Spectra.

  3. Will you approach scale to larger dataset? Using an in-memory approach and converting the data into a tibble won't work for modern shotgun data. The small example data you provide is only composed of 9 MS2 scans, and generates a tibble with 24249 rows. Have you considered using our dedicated classes that would (1) make it possible to handle larg data and (2) benefit from the existing input and output capabilities (see point 2).

  library(Spectra)
  ProcA2_file <-
      system.file("extdata",
                  "ProcyanidinA2_neg_20eV.mzXML",
                  package = "MS2extract")
  ## load the raw data, extract based on retention time and precursor
  ## MZ, ...
  Spectra(ProcA2_file) |>
      filterRt(c(163, 180)) |>
      filterPrecursorMzRange(c(575.119, 575.120)) ## |> ...
  1. As a side note, using the RforMS infrastructure will provide you with lazy and parallel evaluation for free.