Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

SUITOR #2515

Closed wheelerb closed 2 years ago

wheelerb commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @wheelerb

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SUITOR
Title: Selecting the number of mutational signatures through
        cross-validation
Version: 0.99.0
Date: 2022-01-24
Authors@R: c(person(given="DongHyuk", family="Lee", role="aut", email="dhyuklee@pusan.ac.kr"),
   person(given="Bin", family="Zhu", role="aut", email="bin.zhu@nih.gov"),
   person(given="Bill", family="Wheeler", role="cre", email="wheelerb@imsweb.com"))
Description: An unsupervised cross-validation method to select the optimal number of mutational signatures.
Depends: R (>= 3.5.0), doParallel, foreach, parallel, ggplot2
License: GPL-2
biocViews: Genetics, Software, SomaticMutation
Suggests: devtools, MutationalPatterns, RUnit, BiocManager,
        BiocGenerics
NeedsCompilation: yes
Packaged: 2022-01-24 15:17:58 UTC; wheelerwi
vjcitn commented 2 years ago

The vignette includes evaluated installation commands. These must be removed or disabled. There is a reference to a github repo. Installation commands related to the github repo must be disabled in the vignette. The vignette is taking a very long time to compute. Can you speed it up either by changing convergence parameters or starting from a "warmed" state?

wheelerb commented 2 years ago

OK, I will make some changes to the vignette.

wheelerb commented 2 years ago

I have changed the vignette to start from a warmed state, so it now runs much faster.

vjcitn commented 2 years ago

I don't think your changes were committed to the https://github.com/wheelerb/SUITOR.git

wheelerb commented 2 years ago

Thanks, they weren't. I just committed them.

vjcitn commented 2 years ago

OK. Seeing the compiled .Rnw was a blast from the past. Consider using Rmd ... it gives nice HTML ... we don't require it but it is worth getting under your belt eventually.

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: fa549d58cf9e90ae42d8af7abaa698d91000db0e

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: fcec5336776e4d171995b4aed6f64eb201148910

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 41bdbe6941b7855bf1a18d0b31fe5a3db1abfead

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 5d97abbae155a5b3a89c8a113590bbd47c623ff4

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

PeteHaitch commented 2 years ago

Thank you for your submission. I have been assigned to review SUITOR and will provide my review within 2 weeks.

Cheers, Pete

PeteHaitch commented 2 years ago

Hi @wheelerb,

Thank you for your submission to Bioconductor. Overall, the package is in good shape with clear code. For acceptance into Bioconductor, I have a few Required points, as well as some Recommended points, that I would ask you to first please address. Please add your point-by-point replies to the review in this thread.

Cheers, Pete

Required

Recommended

wheelerb commented 2 years ago

Thanks, I will get started on these.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 8b920de845d45303f985aee564f437be57c272f6

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 1d344e3aaea5574db45a4b2dc68019380dcdefa6

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 625b66bf4679f71612511f180a060b94e48c07b3

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: a1a6dd2d1e75a67b5fa04133b783e750f68b658d

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: cecdccef213e8e71998fe54af4dbe63581212d4f

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 84aa146140f76efe7796aef928f9313bef8202cc

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b0aae00b10fc6d80e0f9286d4e3b239e8e4c61dc

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

wheelerb commented 2 years ago

Hi Pete,

Thanks for reviewing. I have made several changes to the package and my comments are below.

Sincerely,

Bill

Required Packaged should not be appearing in DESCRIPTION (see https://cran.r-project.org/doc/manuals/r-release/R-exts.html#The-DESCRIPTION-file) Fixed

My Sweave is rusty; is it possible to use an 'unevaluated' code chunk in Sweave rather than relying on commented-out code? Fixed The vignette is now in Rmd format.

For the commented-out code in Sections 5-7 of vignette, I tested out the actual code to see how long it would take to run and obtained slightly different results. Please set.seed() to ensure that the results that a user who actually runs the code chunk would obtain by copying+pasting from the vignette match those from the pre-saved results. Fixed

In the vigentte, the plot_96_profile(Extract$W, condensed=TRUE, ymax=0.3) code does not actually generate a figure in the vignette. Fixed

Please take reasonable steps to address all output of running BiocCheck::BiocCheck() on the package. Fixed *One of the NOTES is: " Avoid redundant 'stop' and 'warn' in signal conditions" In functions that check input arguments, I have multiple stop() statements that will throw a different error message depending on what is wrong with the input argument. I do not see how they are redundant. Another NOTE is : "Usage of dontrun{} / donttest{} found in man page examples." These are needed in some examples, because the main functions can take a long time to run.**

doParallel, foreach, parallel, and ggplot2 should appear in Imports, rather than Depends, in the DESCRIPTION. Fixed

Recommended Consider using BiocStyle for formatting the vignette for consistency with other Bioc; see Section 1.1.1 of the vignette (pdf) for how to use BiocStyle with Sweave. Fixed The format of the vignette has now been changed to Rmd.

Consider using BiocParallel for parallelisation for the reasons mentioned in https://contributions.bioconductor.org/r-code.html?q=parallel#parallel-recommendations. No change yet. I have looked at the BiocParallel package, but it is unclear how my code would change in order to use BiocParallel.

Consider adding a README file. Added

Please add a BugReports field to the DESCRIPTION. This usually points to the GitHub Issues page, https://github.com/wheelerb/SUITOR/issues. Added

Please be consistent with use of camelCase or snake_case in naming (exported) functions. Fixed, suitor_extract_WH is now called suitorExtractWH

Consider adding a citation file inst/CITATION; see https://contributions.bioconductor.org/citation.html Added

I asked @hpages, a core member of the Bioconductor team, to take a look at the C code. His comments were, "C code in SUITOR could avoid risks of memory leak by using R_alloc() instead of malloc()/free() but otherwise is clean and well organized so memory management concerns are not a show-stopper (memory leaks would only occur in unusual situations when some functions don't complete because they run into an error).". To my understanding there's nothing in the C code that requires changing, but please consider his advice. Fixed I am now using R_Calloc and R_Free instead of malloc and free.

PeteHaitch commented 2 years ago

Hi @wheelerb,

Thanks for addressing the comments in the initial review. There are still some issues that must be addressed for SUITOR to be accepted. Please note that these need to be fixed by the end of this week to ensure that SUITOR is included in Biocondcutor 3.15 (see https://bioconductor.org/developers/release-schedule/).

Cheers, Pete

Required

library(SUITOR)
data(SimData)
OP <- list(min.rank=5, max.rank=13, k.fold=5, n.seeds=50, get.summary=0)
set.seed(123)
re2 <- suitor(SimData, op=OP)
#> Error in check_op_valid(op): ERROR: the option(s) n.seeds are not valid

Recommended

mtmorgan commented 2 years ago

Looking briefly at the code, I think you would remove all but the call to extractWH_par() from https://github.com/wheelerb/SUITOR/blob/c3ba8de42082e8b0826b420b14c35013fb0e5b6c/R/source_extract_WH.R#L40-L50 and replace https://github.com/wheelerb/SUITOR/blob/c3ba8de42082e8b0826b420b14c35013fb0e5b6c/R/source_extract_WH.R#L109-L119 with

res <- bplapply(seq_len(n), function(i, input, op, mat, a, b) {
    extractWH_seq_C(input, op, mat[a[i]:b[i], , drop=FALSE])
}, input, op, mat, a, b)

op$n.cores and op$n.type would be replaced by either SnowParam() or MulticoreParam(), or maybe default to bpparam() (which chooses MulticoreParam (forked) on non-Windows, SnowParam (socket) on Windows).

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c6c3b835aec9b600c32122e33697926ea608ac82

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

wheelerb commented 2 years ago

Hi Pete and Martin,

Thank you for your feedback and code. Below are the changes made to your points above.

Sincerely,

Bill

Required Please limit use of unevaluated code chunks in vignette where feasible. Ideally, all chunks (except installation) in the vignette would be evaluated (perhaps using fewer iterations or a smaller subset of data) but I appreciate that it may be challenging to make an example for the vignette that can complete within the time limits of Bioconductor's build system. If you judge it infeasible to shorten the running time of the examples, then please double-check all unevaluated chunks give the expected output. I strongly recommend that you make these checks automatic (rather than doing them by hand) by including them as 'long tests' (see https://bioconductor.org/developers/how-to/long-tests/). This will ensure that the code is regularly checked by the Bioconductor build machines but you could continue to load pre-computed results in the vignette to save time. The vignette has been modified to run all chunks instead of loading pre-computed results.

The chunk containing Extract <- suitorExtractWH(SimData, re$rank) does not take particularly long to run (40 seconds on my laptop). Please modify this section so that the code is evaluated rather than loading pre-computed results. Fixed

The vignette code in 'Running suitor() with the different option values' of the vignette contains an error. This chunk is not evaluated and hence this error isn't caught by the automatic checks. Please fix. Fixed

library(SUITOR) data(SimData) OP <- list(min.rank=5, max.rank=13, k.fold=5, n.seeds=50, get.summary=0) set.seed(123) re2 <- suitor(SimData, op=OP)

> Error in check_op_valid(op): ERROR: the option(s) n.seeds are not valid

Please take a look through the rendered vignette to check the formatting matches your expectations. Here are some obvious issues I spotted with quick fixes: Fix section numbering in vignette; these should start at 1 not 0 (Use # for top-level headings, ## for second-level headings, etc). There is no need for manual numbering of section headings because Rmarkdown will autogenerate these once you use the correct syntax (this Rmarkdown cheatsheet may be useful https://rmarkdown.rstudio.com/lesson-15.HTML, as too the linked documentation). Manual numbering is no longer done

Most Rmarkdown documents I've read use backticks for function names and arguments when writing Rmarkdown, e.g., suitor() and re$rank rather than bold (suitor()) or italic (suitor()). I would recommend following suit on this.

Backticks are now used for function and option names

Related, the formatting of the paragraph beginning with "By default, the suitor() function returns a list containing the estimate" is messed up. I think this is an unfortunate interaction of $ being used to denote LaTeX math mode as well as being valid R code. If you change from using bold (*) or italic () markdown syntax to using code (```) for the variables then this issue goes away.

Fixed

The chunks that show the output of str(re), str(re2), re$rank, head(Extract$W), and Extract$H[,1:3] are unclear when reading the rendered vignette because the actual calls to str(re), str(re2), re$rank, head(Extract$W), and Extract$H[,1:3] aren't shown in the rendered vignette (because the code is in a echo = FALSE and/or eval = FALSE chunk) and only their output is shown.

Fixed

For future reference, you could create table1 and table2 in the vignette as a data.frame and then use knitr::kable() to create a nicely rendered table.

knitr::kable() is now used to display tables

Recommended It is strongly recommend to keep the Bioconductor-hosted git repo (git@git.bioconductor.org:packages/SUITOR) in-sync with the GitHub-hosted git repo (https://github.com/wheelerb/SUITOR) (please see https://www.bioconductor.org/developers/how-to/git/ for documentation on this topic).

They are now in sync

Consider using suppressPackageStartupMessages(library(MutationalPatterns)) in the vignette to avoid the verbose output. Changed

"I have looked at the BiocParallel package, but it is unclear how my code would change in order to use BiocParallel." @mtmorgan, author of BiocParallel may be able to point you in a helpful direction. From what I can tell, you're currently using foreach for parallelisation, along with some custom code to detect Windows (PSOCK) vs. non-Windows machines (FORK) machines. I think BiocParallel should be able to help simplify your code by taking away some of the 'bookkeeping', but I won't make use of BiocParallel a requirement for acceptance of SUITOR because it seems you've made some effort to ensure portability of the code across machines. I have tried implementing BiocParallel, but it is not working yet. The problem is that there are R functions that call C functions, and the C functions are not visible to the workers. I got around this problem by using the option .packages="SUITOR" in the foreach() call, but I have not found a solution using BiocParallel yet.

mtmorgan commented 2 years ago

For BiocParallel this should 'just work'. For instance, Rsamtools::idxstatsBam() calls C code, and

> fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
> bplapply(rep(fl, 2), idxstatsBam, BPPARAM = SnowParam(2))
[[1]]
  seqnames seqlength mapped unmapped
1        *         0      0        0
2     seq1      1575   1482       19
3     seq2      1584   1789       17

[[2]]
  seqnames seqlength mapped unmapped
1        *         0      0        0
2     seq1      1575   1482       19
3     seq2      1584   1789       17

shows that the C code is available on the workers.

If you create a branch in your git repository with your attempt at using BiocParallel, and a simple reproducible example then I would be happy to look at it.

Alternatively, if you are using the devel version (I guess you are, since this is a submitted package) you could use

bplapply(..., BPOPTIONS = bpotions(packages = "SUITOR"))

but this should not be necessary.

PeteHaitch commented 2 years ago

Thanks @wheelerb. SUITOR is almost ready for acceptance, there are just 2 things to address:

As I mentioned, Bioconductor 3.15 is currently being released (https://bioconductor.org/developers/release-schedule/) and SUITOR will not be part of this release. However, once accepted, SUITOR will be available in the new devel version (Bioconductor 3.16) which will then form the next release (in roughly 6 months time).

Cheers, Pete

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 0f2da7bc9cbc592d37b7a0331f6b7b47d3c5d823

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

wheelerb commented 2 years ago

Thank you Martin and Pete. The problem was that when I develop R packages, I often just source the R files and shared object file instead of loading the package. The SUITOR package now uses BiocParallel. The vignette was also updated. Sincerely, Bill

PeteHaitch commented 2 years ago

Thanks, Bill. I've got grant deadlines this week but I will take a look next week.

Cheers, Pete

PeteHaitch commented 2 years ago
bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: fb6f1ae88aa17608612845342647c73592aafb49

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SUITOR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.