Bioconductor / Contributions

Contribute Packages to Bioconductor

poplin #2476

Closed jaehyunjoo closed 2 years ago

jaehyunjoo commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @jaehyunjoo

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: poplin
Title: LC/MS Metabolomics Data Processing Utilities
Version: 0.99.0
Authors@R: c(
    person(given = "Jaehyun",
 family = "Joo",
 role = c("aut", "cre"),
 email = "jaehyunjoo@outlook.com"),
    person(given = "Blanca",
 family = "Himes",
 role = c("aut"),
 email = "bhimes@pennmedicine.upenn.edu")
 )
Description: Defines a S4 class for storing LC/MS metabolomics data processing
   results and provides utility functions for performing
   imputation, normalization, dimension reduction, and common
   visualizations.
License: GPL-3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
biocViews: Metabolomics, MassSpectrometry, DataRepresentation, Preprocessing, Normalization
Imports: 
    BiocGenerics,
    rlang,
    ggplot2,
    heatmaply,
    methods,
    S4Vectors,
    stats,
Suggests:
    limma,
    Biobase,
    hexbin,
    Rtsne,
    missForest,
    vsn,
    VIM,
    pcaMethods,
    pls,
    testthat (>= 3.0.0),
    BiocStyle,
    knitr,
    rmarkdown
Depends: 
    R (>= 4.1),
    SummarizedExperiment
Config/testthat/edition: 3
VignetteBuilder: knitr

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/poplin to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lshep commented 2 years ago

@jaehyunjoo I apologize for the delay. I will have a review for you within the next week.

jaehyunjoo commented 2 years ago

@lshep No problem! Please take your time. I sincerely appreciate your time and help.

lshep commented 2 years ago

Please see initial review below:

Build Report

DESCRIPTION

NAMESPACE

README

vignette

> library(poplin)
> vignette("intro")
starting httpd help server ... done
Warning message:
vignette 'intro' found more than once,
using the one found in '/home/shepherd/R-Libraries/4.2-Bioc3.15/AzureGraph/doc' 

data / man pages

R code/man pages

I would like to defer comments on the code/class structure to one of our mass spectrometry/metabolomics developers. Please consider his comments part of the formal review.

@lgatto Could you please leave additional review comments?

lgatto commented 2 years ago

Dear @jaehyunjoo

Package name

The first two letters have a direct match with a German word for butt, but the pronunciation wouldn't lead to any confusion. However, I can't figure out how the name relates to metabolomics or any feature related to data analysis, which might make the package difficult to find for new users.

The DESCRIPTION file

The title and the description are a bit misleading, as the package focuses on quantitative metabolomics data processing and not raw LC/MS data at all. The functionality isn't specific to metabolomics; it is actually applicable to any quantitative omics data. As a result, I am not sure if the MassSpectrometry biocView is really relevant here.

Documentation

There is neither an Introduction nor an Installation section. In general, there's very little text that provides more context; the code looks like a repetition of the man pages.

R code

> data(faahko_poplin)
> poplinData(faahko_poplin)
DataFrame with 206 rows and 2 columns
                            knn                  knn_cyclic
                       <matrix>                    <matrix>
1   1924712:1757151:1714582:... 20.4466:19.9522:20.0726:...
2    213659: 289501: 194604:... 17.3443:17.6197:17.2385:...
3    349011: 451864: 337473:... 18.0029:18.1785:17.9812:...
4    286221: 854341: 364300:... 17.6772:19.0195:18.0407:...
5   1160580:1018512:1345515:... 19.7946:19.1149:19.6982:...
...                         ...                         ...
202 116190: 441171: 88469.4:... 16.3391:17.9638:15.9276:...
203 210828: 880169:139328.8:... 17.2908:19.1672:16.7219:...
204 379543:1254097:206622.8:... 18.0925:19.5905:17.2340:...
205 575200: 166249:172773.6:... 18.6621:16.5454:16.8744:...
206 170253:3148507:165221.7:... 16.9727:20.9890:16.9570:...
> poplinData(faahko_poplin)[[1]] <- poplinData(faahko_poplin)[[1]][, 1:5]
> validObject(faahko_poplin)
[1] TRUE
> colData(faahko_poplin)
DataFrame with 12 rows and 2 columns
         sample_name sample_group
         <character>  <character>
ko15.CDF        ko15           KO
ko16.CDF        ko16           KO
ko18.CDF        ko18           KO
ko19.CDF        ko19           KO
ko21.CDF        ko21           KO
...              ...          ...
wt16.CDF        wt16           WT
wt18.CDF        wt18           WT
wt19.CDF        wt19           WT
wt21.CDF        wt21           WT
wt22.CDF        wt22           WT
> dim(poplinData(faahko_poplin)[[1]])
[1] 206   5

This inconsistency highlights the possible need to define a proper validity method. A simpler and preferable option would be to store these matrices in the original SummarizedExperiment's assays slot, which would guard against this.
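
As a rough illustration of the validity-method option, here is a minimal sketch in R that uses only the base methods package. The class ToyContainer, its slots assay and extra, and the helper is_valid are hypothetical names chosen for this example; they are not poplin's actual definitions. The sketch only shows that a validity function comparing dimensions would reject the kind of column subsetting shown in the session above.

```r
library(methods)

# Hypothetical toy class (NOT poplin's real class): one main matrix plus a
# list of processed matrices that are expected to share its dimensions.
setClass("ToyContainer", slots = c(assay = "matrix", extra = "list"))

# Validity method: every matrix in 'extra' must match dim(assay).
setValidity("ToyContainer", function(object) {
  msg <- character()
  expected <- dim(object@assay)
  for (nm in names(object@extra)) {
    found <- dim(object@extra[[nm]])
    if (!identical(found, expected)) {
      msg <- c(msg, sprintf("'%s' has dimensions %s, expected %s",
                            nm,
                            paste(found, collapse = " x "),
                            paste(expected, collapse = " x ")))
    }
  }
  if (length(msg)) msg else TRUE
})

# Hypothetical helper: TRUE if validObject() raises no error.
is_valid <- function(obj) {
  tryCatch({
    validObject(obj)
    TRUE
  }, error = function(e) FALSE)
}

x_ok <- new("ToyContainer",
            assay = matrix(0, nrow = 206, ncol = 12),
            extra = list(knn = matrix(1, nrow = 206, ncol = 12)))

# Direct slot assignment does not re-run the validity method, so the
# inconsistent object can be created; validObject() then catches it.
x_bad <- x_ok
x_bad@extra$knn <- x_bad@extra$knn[, 1:5]

is_valid(x_ok)
is_valid(x_bad)
```

With a check along these lines, the subsetted object would fail validation instead of passing silently. A setter method that calls validObject() before returning would surface the problem at assignment time rather than later.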

poplin_biplot <- function(x, ...) {
  UseMethod("poplin_biplot")
}

poplin_boxplot <- function(x, ...) {
  UseMethod("poplin_boxplot")
}

poplin_naplot <- function(x, ...) {
  UseMethod("poplin_naplot")
}

## and so on

Unacceptable files

I spotted the file R/subset-methods.R.bak, which should be removed.

jaehyunjoo commented 2 years ago

@lshep and @lgatto, thank you so much for your insight and help. I will try to address the comments and improve the package.

jaehyunjoo commented 2 years ago

@lshep

As @lgatto suggested, we definitely need to rename the package to something meaningful. I am wondering how to connect the new repository URL to this GitHub issue. Thanks a ton!

lshep commented 2 years ago

I think the best solution would be to open a new issue to make sure it gets added properly. When you submit it, please let me know here and I can push it through the pre-review process and reassign myself to it.

jaehyunjoo commented 2 years ago

Got it! Thank you so much for all your help.

lshep commented 2 years ago

I am going to close this issue. If you decide to move forward with a Bioconductor submission, please tag/reference this original issue so we know it is a continuation of this review. Cheers,