Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

HiLDA #1141

Closed zhiiiyang closed 5 years ago

zhiiiyang commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @zhiiiyang

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: HiLDA
Type: Package
Title: Conducting statistical inference on comparing the mutational exposures of 
    mutational signatures by using hierarchical latent Dirichlet allocation 
Depends: R(>= 3.6), ggplot2
Imports: R2jags, abind, cowplot, grid, forcats, stringr,
         GenomicRanges, S4Vectors, XVector, Biostrings, GenomicFeatures,
         BSgenome.Hsapiens.UCSC.hg19, BiocGenerics, tidyr, grDevices, stats,
         TxDb.Hsapiens.UCSC.hg19.knownGene, utils
Suggests: knitr, rmarkdown, testthat, BiocStyle
Version: 0.1.0
Date: 2019-06-04
Authors@R: person("Zhi", "Yang", email = "zyang895@gmail.com", 
    role = c("aut", "cre"))
Description: A package built under the Bayesian framework of applying 
    hierarchical latent Dirichlet allocation to statistically test whether the 
    mutational exposures of mutational signatures (Shiraishi-model signatures) 
    are different between two groups.
License: GPL-3
URL: https://github.com/USCbiostats/HiLDA,
     https://doi.org/10.1101/577452
BugReports: https://github.com/USCbiostats/HiLDA/issues
SystemRequirements: JAGS 4.3.0
biocViews: Software, SomaticMutation, Sequencing, StatisticalMethod, Bayesian
RoxygenNote: 6.1.1
VignetteBuilder: knitr

Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.

bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 5 years ago

R requires all dependencies to be in CRAN or Bioconductor. The package pmsignature is not on either. You will need to either change your package to not require this dependency or encourage pmsignature to submit to CRAN or Bioconductor.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2f9326a remove pmsignature

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

4cbfc57 remove git track for .DS_Store

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

d91783e remove .Rproj

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

lshep commented 5 years ago

Thank you for you interest in Bioconductor. Please see the initial review below:

DESCRIPTION

build report NOTES

inst

INSTALL

vignette

R code

Would probably benefit from doing the conditionals outside the full function
calls. This also would better clarify the difference in running code. 

ie.  

model.file <- ifelse(inputG@transcriptionDirection, system.file("local_tr.txt",package="HiLDA"), system.file("local.txt", package="HiLDA")) param <- ifelse(inputG@transcriptionDirection, varSaveTr, varSave)

modelFit <- ifelse(is.null(useInits), R2jags::jags(model.file=model.file, data=jdata, parameters.to.save=param, n.chains=2, n.iter=nIter, b.burnin=nBurnin), R2jags::jags(model.file=model.file, data=jdata, parameters.to.save=param, n.chains=2, n.iter=nIter, b.burnin=nBurnin, inits=inits))

I think I left out a difference in inits vs inits_tr,  but this should illustrate the point. 
Do this sort of minimialization to all scripts/functions

- [ ] Apply argument validation to all exported functions to make sure input is correct

- [ ] In read_file/hildaReadMPFile again it seems like the code controlled with   `if(length(removeInd) > 0)` is repeated.  Turning this into a little helper function would be better than repeating the same code chunks. And similarly in getMutationFeatureVector, You should loop over the baseList
  rather than repeating the following
  mutFeatures[which(XVector::subseq(context, start=baseInd, end=baseInd)
                    == "C"), 1] <-
    mutFeatures[which(XVector::subseq(context, start=baseInd, end=baseInd)
                      == "C"), 1] + tempDigits * 1

- [ ] And again in vis_signature seems like there is a lot of very similar repeated code btween flanking bases, alternatives bases from C and alternative bases from T that could potentially be combined into a single helper function. 

This biggest two issues are to limit repeated code and have general code clean up to make it more manageable and maintainable  and to better document the files and data you include in your package. 
As a general comment We would really recommend reaching out to pmsignature to submit their
package to CRAN or Bioconductor.  Enabling processing of a package that isn't readily downloadable or being actively maintained seems rather futile but will not prevent acceptance into Bioconductor. 

When you have addressed the above issues please do a version bump to trigger a new build and ensure all ERROR/Warnings/Notes are addressed.  Please also comment back here with how the above has been addressed and to request a re-review.  

I look forward to getting your package accepted into Bioconductor. 
bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2bf5e16 new version 0.99.3

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

a988e46 add methods is to the importFrom

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2a5b46b version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

5f89291 version bump 0.99.6

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

14754a3 version bump 0.99.7

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

6203a5c version bump 0.99.8

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

zhiiiyang commented 5 years ago

Thank you for you interest in Bioconductor. Please see the initial review below:

DESCRIPTION

  • [X] For what its worth the builders are using JAGS 4.2.0 not 4.3.0 and it appears to build and check fine. Do you insist on the 4.3.0 requirement or are older version sufficient? Perhaps lowering the constraint in the description? It has been changed to 4.2.0

build report NOTES

  • [X] Fix Citation location NOTE. remove the .md extension. See also Citation
  • [X] Fix note about methods is used but not declared there is a suggestion on how to fix it later in the check
  • [X] Fix global variable and function notes. ?globalVariables
  • [X] Consider adding runnable example to getMutationFeatureVector
  • [X] Please move the NEWS.md file to the top level directory instead of inst. News files as .md format can only be recognized in the top level directory as documented in ?news Everything above has been fixed.

inst

  • [X] Documentation on what/how the .txt files where created and why they are included
  • [X] inst/scripts should include scripts/information on how the data in inst/extdata was generated and how it is utilized. inst/JAGSModel is added to address the 1st problem. inst/script/INFORMATION is added to address the 2nd problem.

INSTALL

  • [X] Include an INSTALL file that gives details about installing JAGS. INSTALL is added.

vignette

  • [X] Change the name of the vignette to something meaningful. my-vignette is too generic. We recommend the name of the package as it is very intuitive for users to do vignette("HiLDA") rather than some obscure name. They would have to search for the correct vignette title to pull up your vignette The name has been changed to HiLDA.

  • [X] include bioconductor installation instructions It has been added.

  • [ ] Could The MutationFeatureData class better leverage Bioconductor objects. Like a SummarizedExperiment for mutationPosition? I am afraid no in this case.

  • [X] I don't understand the difference of 4.3.1 and 4.3.2. There is no description of what the significance is of including a "initial value" and there is no description of what file/object you are loading to include as an argument. I've included the difference of including the initial values, in which the MCMC sampling can converge faster.

  • [X] is there some random factor in the backend? I get different values when running hildaRhat? If this is the case it might be good to set a seed in a vignette so the numbers are reproducible A seed has been added to ensure reproducible results.

  • [X] 4.3.4 needs some description. What are these signatures. what are they used for? You also already loaded this file in the section 4.3.2 so it should not need to be reloaded. Reloading has been removed and some descriptions have been added. R code

  • [X] hilda_global and hilda_local should be one function. Large chunks of repeated code should always be avoided to minimize maintenance and debugging. These two functions are extremely similar and should be combined into one accounting for the minimal differences. Now these two functions have been merged into one function, hilda_test.

  • [X] The above function would probably benefit from some argument validity checks to make sure the right objects/specifications of arguments are passed. I've added some functions to ensure the right objs of arguments are passed by using if function which generates warnings.

  • [X] Minimizing code should be applied wherever possible. So the ending section

    if (is.null(useInits) == FALSE) {
        ## which means using inits from Param
        if (inputG@transcriptionDirection == TRUE) {
            modelFit <- R2jags::jags(model.file=system.file("local_tr.txt",
                package="HiLDA"), data=jdata, parameters.to.save=varSaveTr,
                inits=inits_tr, n.chains=2, n.iter=nIter, n.burnin=nBurnin)
        } else {
            modelFit <- R2jags::jags(model.file=system.file("local.txt",
                package="HiLDA"), data=jdata, parameters.to.save=varSave,
                inits=inits, n.chains=2, n.iter=nIter, n.burnin=nBurnin)
        }
    } else {
        if (inputG@transcriptionDirection == TRUE) {
            modelFit <- R2jags::jags(model.file=system.file("local_tr.txt",
                package="HiLDA"), data=jdata, parameters.to.save=varSaveTr,
                n.chains=2, n.iter=nIter, n.burnin=nBurnin)
        } else {
            modelFit <- R2jags::jags(model.file=system.file("local.txt",
                package="HiLDA"), data=jdata, parameters.to.save=varSave,
                n.chains=2, n.iter=nIter, n.burnin=nBurnin)
        }
    }

The code has been adjusted to avoid repeating problems. Would probably benefit from doing the conditionals outside the full function calls. This also would better clarify the difference in running code.

ie.

   model.file <-  ifelse(inputG@transcriptionDirection,
                         system.file("local_tr.txt",package="HiLDA"),
                         system.file("local.txt", package="HiLDA"))
   param  <-  ifelse(inputG@transcriptionDirection, 
                     varSaveTr,  varSave)

  modelFit <- ifelse(is.null(useInits),
                     R2jags::jags(model.file=model.file, data=jdata,
                     parameters.to.save=param, n.chains=2, n.iter=nIter,
                     b.burnin=nBurnin), 
                     R2jags::jags(model.file=model.file, data=jdata,
                     parameters.to.save=param, n.chains=2, n.iter=nIter,
                     b.burnin=nBurnin, inits=inits)) 

I think I left out a difference in inits vs inits_tr, but this should illustrate the point. Do this sort of minimialization to all scripts/functions

  • [X] Apply argument validation to all exported functions to make sure input is correct I added if functions and warnings functions to ensure the correct format for inputs.
  • [X] In read_file/hildaReadMPFile again it seems like the code controlled with if(length(removeInd) > 0) is repeated. Turning this into a little helper function would be better than repeating the same code chunks. And similarly in getMutationFeatureVector, You should loop over the baseList rather than repeating the following
      mutFeatures[which(XVector::subseq(context, start=baseInd, end=baseInd)
                        == "C"), 1] <-
        mutFeatures[which(XVector::subseq(context, start=baseInd, end=baseInd)
                          == "C"), 1] + tempDigits * 1

A loop function is implemented to avoid repeating.

  • [ ] And again in vis_signature seems like there is a lot of very similar repeated code btween flanking bases, alternatives bases from C and alternative bases from T that could potentially be combined into a single helper function. I am afraid that I can't reduce the code by writing any helper function. These codes look similar but there are some differences here and there, which makes writing helper function very non-informative. This biggest two issues are to limit repeated code and have general code clean up to make it more manageable and maintainable and to better document the files and data you include in your package. As a general comment We would really recommend reaching out to pmsignature to submit their package to CRAN or Bioconductor. Enabling processing of a package that isn't readily downloadable or being actively maintained seems rather futile but will not prevent acceptance into Bioconductor.

We've reached to the author of pmsignature and he agreed with us using his scripts under the GPL-3 licence. Therefore, we have added a couple more functions from pmsignature stored in RcppExports.R, pm_getSignature.R and calss.R When you have addressed the above issues please do a version bump to trigger a new build and ensure all ERROR/Warnings/Notes are addressed. Please also comment back here with how the above has been addressed and to request a re-review.

I look forward to getting your package accepted into Bioconductor. Thank you so much for your careful review to improve our pacakge HiLDA. I have addressed or replied to all the comments. Looking forward to hearing from you about the next step.

lshep commented 5 years ago

I'm sorry for the delayed review. We had our Bioconductor conference and then I was away on Holiday. I will look over your package in the next few days for acceptance or any further comments.

lshep commented 5 years ago

The code is looking much better. A few more minor comments and possible clarification:

Build Report

Compiled code should not call entry points which might terminate R nor write to stdout/stderr instead of to the console, nor use Fortran I/O nor system RNGs. The detected symbols are linked into the code but might come from libraries and not actually be called.


**R code**

**hilda_diagnosis.R**

- [ ] Its better to not have a paste call in the `stop()` if possible
So

stop(paste("It got stuck in the model", as.numeric(names(freq)) + 1))

is better written

stop("It got stuck in the model ", as.numeric(names(freq)) + 1)


**hilda_plot.R**

- [ ] `tempSig[1, ] <- hildaResult$BUGSoutput$mean$pStates1[, , i]` can be moved
outside if/else statement since it is the same regardless. 

- [ ] Probably should say which function carries the `...` . Ie. `additional
  arguments passed on to visPMS`
- [ ] Please remove the unneeded paste calls in the `stop()` functions. It seems you aren't actually pasting anything together

stop(paste("There are more reference samples than the total samples")) stop(paste("The length of groups is greater than the sample size!"))

etc...


**hilda_test.R**

- [x] Much better condensed thank you! 
- [ ] What is the difference of running global vs local? 
- [ ] In the man page example should one of the values be changed from hildaGlobal if the variable localTest changes?  
- [ ] Please remove the unneeded paste calls in the `stop()` functions. It seems you aren't actually pasting anything together

**read_file.R**

- [ ] As previously with `stop`, `warning` would be better without pastes. apply
  throughout but one example

warning(paste("The central bases are inconsistent in", length(removeInd), "mutations. We have removed them."))

better

warning("The central bases are inconsistent in ", length(removeInd), " mutations. We have removed them.")


- [ ] why is only BSgenome.Hsapiens.UCSC.hg19 supported? This seems incorrect as
later you reference BSgenome.Mmusculus.UCSC.mm10?  As long as there is an
associated TxDb for the given genome will this not still work? 

**pm_plot.R**

- [ ] Probably should say which method `...` is being passed
- [ ] pmMultiBarplot, you say groupIndices defaults to NULL but there is no NULL argumnet in function parameters? 
- [ ] Again remove pastes from stop calls

Looking good. please make the above changes or comment on the above issues.  When ready please do a version bump for a new build report.  Thanks!
bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

3523ee4 version 0.99.9

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

1ae87ad remove git track

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

zhiiiyang commented 5 years ago

The code is looking much better. A few more minor comments and possible clarification:

Build Report

  • [x] Does the NOTE for C need to be addressed?
File 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1141/6203a5c/HiLDA.buildbin-libdir/HiLDA/libs/i386/HiLDA.dll':
  Found 'abort', possibly from 'abort' (C), 'runtime' (Fortran)
  Found 'exit', possibly from 'exit' (C), 'stop' (Fortran)
  Found 'printf', possibly from 'printf' (C)
File 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1141/6203a5c/HiLDA.buildbin-libdir/HiLDA/libs/x64/HiLDA.dll':
  Found 'abort', possibly from 'abort' (C), 'runtime' (Fortran)
  Found 'exit', possibly from 'exit' (C), 'stop' (Fortran)
  Found 'printf', possibly from 'printf' (C)

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs. The detected symbols are linked into the code but
might come from libraries and not actually be called.

R code

hilda_diagnosis.R

  • [X] Its better to not have a paste call in the stop() if possible So
stop(paste("It got stuck in the model", as.numeric(names(freq)) + 1))

# is better written

stop("It got stuck in the model ", as.numeric(names(freq)) + 1)

hilda_plot.R

  • [x] tempSig[1, ] <- hildaResult$BUGSoutput$mean$pStates1[, , i] can be moved outside if/else statement since it is the same regardless.
  • [x] Probably should say which function carries the ... . Ie. additional arguments passed on to visPMS
  • [x] Please remove the unneeded paste calls in the stop() functions. It seems you aren't actually pasting anything together
stop(paste("There are more reference samples than the total samples"))
stop(paste("The length of groups is greater than the sample size!"))

etc...

hilda_test.R

  • [x] Much better condensed thank you!
  • [x] What is the difference of running global vs local? The global test returns a Bayes Factor while the local test does not, which results in different JAGS script used in the MCMC process.
  • [x] In the man page example should one of the values be changed from hildaGlobal if the variable localTest changes?
  • [x] Please remove the unneeded paste calls in the stop() functions. It seems you aren't actually pasting anything together

read_file.R

  • [x] As previously with stop, warning would be better without pastes. apply throughout but one example
warning(paste("The central bases are inconsistent in", length(removeInd),
                    "mutations. We have removed them."))

## better

warning("The central bases are inconsistent in ", length(removeInd), " mutations. We have removed them.")
  • [x] why is only BSgenome.Hsapiens.UCSC.hg19 supported? This seems incorrect as later you reference BSgenome.Mmusculus.UCSC.mm10? As long as there is an associated TxDb for the given genome will this not still work? That comment has been removed. It supports genomes and transcripts other than the hg19 human reference genome. pm_plot.R

  • [x] Probably should say which method ... is being passed

  • [x] pmMultiBarplot, you say groupIndices defaults to NULL but there is no NULL argumnet in function parameters?

  • [x] Again remove pastes from stop calls

Looking good. please make the above changes or comment on the above issues. When ready please do a version bump for a new build report. Thanks!

lshep commented 5 years ago

I would put the explanation of running local vs global in hilda_test in the argument of the man page but that will not hinder proceeding at this stage.
I think the package looks much better. Thank you for making the improvements.

bioc-issue-bot commented 5 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

mtmorgan commented 5 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/zhiiiyang.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("HiLDA"). The package 'landing page' will be created at

https://bioconductor.org/packages/HiLDA

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.