Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

(inactive) CAMPP2 #2999

Closed mtiberti closed 1 year ago

mtiberti commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @mtiberti

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: CAMPP2
Type: Package
Title: CAncer bioMarker Prediction Pipeline 2 (CAMPP2)
Version: 0.99.0
Authors@R: c(
    person("Tom, Nikola", email = "ntom@cancer.dk", role = c("aut")),
    person("Nourbakhsh, Mona", email = "mon@cancer.dk", role = c("aut")),
    person("Terkelsen, Thilde", role=c("aut")),
    person("Bechgaard, Mathilde Wismann", email = "mathilde@bechgaard.dk", role = c("aut")),
    person("Tiberti, Matteo", email = "tiberti@cancer.dk", role = c("cre", "aut")),
    person("Papaleo, Elena", email = "elenap@cancer.dk", role = c("aut")))
Description: The CAncer bioMarker Prediction Pipeline2 (CAMPP2) is a simple
  bioinformatics tool intended to automatize identification of potential
  diagnostic and prognostic cancer biomarkers. The pipeline is versatile
  and may be used for analysis of a variety of quantitative biological
  data from high throughput platforms, including genes, proteins, small
  RNAs, lipids and glycans. CAMPP2 currently supports:
  replacing missing values, data normalization, batch correction, 
  distributional check, dimensionality reduction, clustering, 
  differential expression/abundance analysis, LASSO/Elastic Net regression,
  random forest, and visualisations
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
BugReports: https://github.com/ELELAB/CAMPP2/issues
RoxygenNote: 7.2.3
biocViews: GeneExpression, DifferentialExpression, Annotation, Clustering, QualityControl, RNASeq, Metabolomics, SystemsBiology, Software, Transcription, Normalization, PrincipalComponent, Lipidomics, Visualization, Regression, DimensionReduction
Depends: R (>= 4.1.0)
Imports:
  ComplexHeatmap,
  FactoMineR,
  VennDiagram,
  biomaRt,
  caret,
  doMC,
  edgeR,
  factoextra,
  fitdistrplus,
  ggplot2,
  ggrepel,
  glmnet,
  grid,
  impute,
  limma,
  mclust,
  nnet,
  pROC,
  parallel,
  pcaMethods,
  randomForest,
  rio,
  rngtools,
  squash,
  statmod,
  sva,
  varSelRF,
  viridis,
  viridisLite,
  zeallot
exports:
VignetteBuilder: knitr
Suggests: BiocStyle, knitr, rmarkdown
vjcitn commented 1 year ago
--- re-building ‘vignette_campp2.Rmd’ using rmarkdown
Quitting from lines 261-262 (vignette_campp2.Rmd) 
Error: processing vignette 'vignette_campp2.Rmd' failed with diagnostics:
package or namespace load failed for 'CAMPP2':
 object 'predict' is not exported by 'namespace:randomForest'
--- failed re-building ‘vignette_campp2.Rmd’
mtiberti commented 1 year ago

thanks @vjcitn - I should have fixed that in the CAMPP2 master. I can't quite explain that before this change it built/checked correctly on my system (I'm using a slightly modified version of the BioC devel Docker image). After this change, I have redone R CMD build, R CMD check, BiocCheck::BiocCheck() and they all end without warnings or errors. I've also tried devtools::build_vignettes() for good measure.

Can you try again now? Or if it errors out again, if you can please tell me you are exactly doing so I can reproduce the problem. Thank you.

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CAMPP2 to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

HelenaLC commented 1 year ago

Before getting into an in-depth review, I notice that the majority of functions operates on a data.frame of measurement data or abundances. Meanwhile, there is a lot of existing infrastructure for the types of analyses (e.g., dimension reduction, aggregation, clustering, etc.). Bioconductor places a strong emphasis on re-use of code and classes where it makes sense, which I think it does here. Namely, why not use the SummarizedExperiment at the center of it all?

mtiberti commented 1 year ago

thank you @HelenaLC ! Your point definitely makes sense, just a couple of comments:

  1. I'm just a bit dubious whether this would be as easy/straightforward to use for people not as much into coding, i.e. if they would need to create a SummarizedExperiment object starting from their data and then proceed with the analyses they need. This is especially as the package is designed to be used by people with relatively small experience in bioinformatics/coding and possibly without too much tweaking
  2. I'm a bit worried about the timeframe, since we're trying to get the package out with 3.17, it sound like this would require quite a bit of refactoring (+ regular code review of course)

Do you think it would make sense to do this switch after the package is reviewed and (hopefully) included in BioC or do you see this as a necessary step before further review?

Many thanks in either case!

vjcitn commented 1 year ago

"I'm just a bit dubious whether this would be as easy/straightforward to use for people not as much into coding" -- Hi @mtiberti ! We've been creating code used by all bioinformaticians worldwide for 20 years now. The value of self-describing, coherently linked data structures for assay+sample metadata (colData)+feature metadata (rowData and rowRanges) may not be apparent to folks who are not much into coding, and those folks are free to do what they like. But the value of an ecosystem with highly self-descriptive data structures is definitely compromised when packages come in that don't participate in the framework. So please consider how you and your users can take advantage of Bioconductor infrastructure, if you want to be reviewed and distributed in Bioconductor. Packages that operate on data frames can be managed and distributed by other avenues. Thanks for your submission!

lshep commented 1 year ago

In our experience if it doesn't happen in the review it doesn't happen.

SummaizedExperiment is fairly easy to construct and has many benefits that help naive users. The easiest example is filtering and subsetting. A user would have to match and filter each individual data.frame while subsetting a SummarizedExperiment object would do this for all data present on the object.

From that point of view too, when I worked through some of the vignette to try to see the data, I ended up with over a dozen data frame in my R session. This is overwhelming and unwieldy.

I also did not do a full in depth review but on a quick glance through the vignette some other issues:

mtiberti commented 1 year ago

got it - thank you for your insights @vjcitn and @lshep. We'll refactor the package to use Bioconductor data structures (such as SummarizedExperiments) and fix the other issues that you pointed out. We'll be back to you as soon as possible

lshep commented 1 year ago

How is the refactoring coming? May we expect updates soon or should we close this for now?

bioc-issue-bot commented 1 year ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for your interest in Bioconductor.