Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

methrix #1276

Closed PoisonAlien closed 4 years ago

PoisonAlien commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @PoisonAlien

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: methrix
Title: Fast and efficient summarization of generic bedGraph files from Bisufite sequencing
Version: 0.99.0
Authors@R: c(
  person("Anand","Mayakonda", role = c("aut", "cre"), email = "anand_mt@hotmail.com"),
  person("Reka","Toth", role = "aut", email = "r.toth@dkfz.de"),
  person("Maximilian","Schönung", role = "ctb"),
  person("Pavlo","Lutsik", role = "ctb"),
  person("Joschka","Hey", role = "ctb"))
Description: Bedgraph files generated by Bisulfite pipelines often come in various flavors. Critical downstream step requires summarization of these files into methylation/coverage matrices. This step of data aggregation is done by Methrix, including many other useful downstream functions.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends: R (>= 3.6), data.table (>= 1.12.4), SummarizedExperiment
Imports: DelayedArray, HDF5Array, BSgenome, rjson, DelayedMatrixStats, parallel, methods, ggplot2, matrixStats
RoxygenNote: 6.1.1
Suggests: 
    knitr,
    rmarkdown,
    DSS,
    bsseq,
    plotly,
    BSgenome.Mmusculus.UCSC.mm9,
    MafDb.1Kgenomes.phase3.GRCh38,
    MafDb.1Kgenomes.phase3.hs37d5,
    GenomicScores,
    Biostrings,
    RColorBrewer
VignetteBuilder: knitr
biocViews: DNAMethylation, Sequencing, Coverage

Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.

bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

tkik commented 5 years ago

Dear @lshep, It seems that the error we are getting is caused by our package being dependent on the data.table version 1.12.4, which was released only on 3. October and is not yet in BioC's build systems. Can we do anything to address the error, or should we just wait?

lshep commented 5 years ago

I believe it should have 1.12.4 now installed. I'm kicking off a manual build and hopefully it should pick it up.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

fa67c8f fixed issues for BioC check

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

f490fbe version change

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

bfc926b decrease size 310b1ec Version bump 9c4a615 Merge branch 'master' of https://github.com/CompEp...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

tkik commented 5 years ago

Dear @lshep,

The recent build report only shows an error here: ** running examples for arch 'i386' ... ERROR I am not sure why I am getting this error, because this part ran without a problem before and I didn't change anything here. Could you please take a look? Thank you!

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

c75b682 version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

b0ee870 version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

tkik commented 5 years ago

Dear @lshep, Could you please take a look? It seems to me that the error is not related to our package:

Should I do a version bump to start a new build? Thank you!

Best, Reka

lshep commented 5 years ago

This is an ERROR on our end. No need to kick off a new build. I will take a look at the package and post a review shortly. In the meantime please try to clean up all the NOTES in the build report - esp the global variable/functions and the coding practice NOTES.

tkik commented 5 years ago

Thanks! I cleaned the notes as much as possible. I will wait for your review and commit the changes together.

lshep commented 5 years ago

Please see the following review:

build report

README

LICENSE/LICENSE.md

DESCRIPTION

inst/

vignette

- [ ] Is this filtering snps an optional step?  
- [ ] I suggeset reformting this section slightly. Show that you can install the
needed packages through Bioconductor with the code to install in `eval=FALSE`
but then have the code section that actually runs the function call with the
assumption that the packages are available 

library(GenomicScores) library(MafDb.1Kgenomes.phase3.hs37d5) meth_snps_filtered = methrix::remove_snps(m = meth)

- [ ] Remove the manually added 'output'
- [ ] in section remove uncovered loci.  why is this in an `eval=FALSE`? Run
interactively and remove the manually added 'output'

- [ ] `meth_mat_with_loci = methrix::get_matrix(m = methrix_data, type = "M",
  add_loci = TRUE)` The results of this really are a GRanges object.  Do you
  have a wrapper or show the code to conver into a standard GRanges object. 
- [ ] Subset by genomic regions. Show subsetting by GRanges object. hint:
  `GRanges("chr21:27867971-27868103")`

- [ ] in section 'converting methrix to BSeq' . Again we prefer to see
  `BiocManager::install` rather than `install.package` instructions. This code
  should be in an `eval=FALSE` and then show the code to run

library(bsseq) bs_seq = methrix::methrix2bsseq(m = methrix_data)


**R / man**

**General and should be applied to all files**
- [ ] Remove commentted out code chunks. True comments in code are fine but
remove legacy code
- [ ] Use `<-` rather than `=` except for argument assignment. 
- [ ] use `message` instead of `cat` so they can be suppressed if the user wants

_accessory\_funcs.R_
- [ ] has parameter checking been done previous to this point in the code?
- [ ] Remove older implementation (the beauty of git tracking is you can always
find and revert the commit removing this code)

_check\_bins.R_
- [ ] These functions do the same thing. combined into one function that checks
bedtools/bedGraphToBigWig/bedClip based on a second argument. 
something along the lines:

check_bins <- function(path=NULL, value=c("bedtools", "bedGraphToBigWig", "bedClip")){

names = match.arg(value) if(is.null(path)){ check = as.character(Sys.which(names = names))[1] }else{ check = paste0(path, "/", names") if(!file.exists(check)){ stop("Could not locate ", names, " at " , check ,"\nDownload: http://hgdownload.cse.ucsc.edu/admin/exe/") } }

if(check != ""){ return(check) }else{ stop("Could not locate ", names, " at " , check ,"\nDownload: http://hgdownload.cse.ucsc.edu/admin/exe/") } }



_extract\_CpGs.R_
- [ ] Remove unneeded paste0 in message call. 
- [ ] `if(nrow(gnoms_installed[pkgname %in% ref_genome]) == 0){`  seems like it
won't work as pkgname is not defined anywhere? 
- [ ] While I appreciate the humor, I would not recommend the `bored = TRUE` to
tell a joke from the Chuck Norris Database.  For a number of reasons, starting
with it adds a layer of complexity and an additional packages dependency that
would otherwise be unneeded.  Being a publicallly open space, the fact they can
be explicit is also discouraged.  
- [ ] cpgs in the list should remain a GRanges object not flattened to a
table. Any place this is used should utilize GRanges.  
- [ ] message instead of cat

_matrix-plot.R_
- [ ] Have more argument checking. For example - There is no check on the ranges
if it is not null if it is a GRanges object or if it has the required
columns. Unless this is handled by subset\_matrix?  But this sort of thinking
that the data provided is the right class/structure to all (or many of the
important arguments) should be checked. 
- [ ] Duplicate code should be pulled into its own helper function. Lines 16-59
in the violin plot are the same as 93-132 of the density plot 
- [ ] FWIW - it seems like this file should be methrix\_plot to be consistent
with all the other files instead of a dash. 

_methrix\_operations.R_

- [ ] check if argument/parameter checking is necessary for any user input. 

_mm9\_bsmap.R_
- [ ] It appears this was removed so please also remove the documentation file.

_read\_bedgraphs.R_
- [ ] Remove TODO.  Make sure this works or fix it! 
- [ ] so it is okay to leave chr_idx, start_idx, etc... NULL as there is no
default set

Thank you for the submission.  Please work on the above changes.  When ready please version bump to kick off a new build and comment back here that you are ready for a re-review and a summary of the changes.  
bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

901e757 version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

tkik commented 5 years ago

Dear @lshep, Thank you for the review and your detailed explanations. I modified our package according to your recommendations. I hope we can still make it to the autumn release. See the detailed answer below:

Please see the following review:

build report

  • [x] Please fix global variable/function

  • [x] Rd line lengths

  • [x] Coding practices – Significantly reduced the too long lines and wrong intends. The remaining long rows doesn’t “look good”, when split.

  • [ ] We really recommend having some basic unit tests. .- In progress, I will commit them ASAP

  • [x] What is the reasoning behind not having runnable examples in exported functions? – As you say later, basic time constraint. We use a small dataset as an example, but since some of the functions (read_bedgraph) works with the whole genome, irrespective of the example size, it takes relatively long. Therefore, according to your suggestion, I let them run in the vignette, where I set all of them to eval=TRUE.

  • [x] Are the dontruns run some other place (vignette or test?) all function unless strong reasoning should be run at least once. If they are run elsewhere than this is okay (ish). Looking at the vignette these are all eval=FALSE I would prefer to see the vignette eval=TRUE and have the vignette run in its entireity - then leaving these to dontrun are okay because of time constraint. > See the previous answer. README

  • [x] Because I noticed your badges. We recommend packages only get submitted to CRAN or Bioconductor and not both but you have the badges for both. What is your intention here? – I deleted the CRAN badge. We have talked with CRAN about this and our reasoning is often maintainers than will update one but not the other; causing multiple 'current' versions to be out to the universe.

  • [x] You list functions multiple times in Methrix operations

LICENSE/LICENSE.md

  • [x] Remove LICENSE.md file. This is already included with the base R distribution. Keep LICENSE file.

DESCRIPTION

  • [x] In our experience we don't recommend including LazyData: true as it can slow installation of the package.

inst/

  • [x] Please include an inst/script diectory that describes where or how the data in inst/extdata was curated/generated. This should include basic source information. I added the data.generation.R to the script folder. It seems that the underlying data changed since the first generation, therefore the methrix_data changed slightly.
  • [x] It seems like summarize_metrix.Rmd should either be moved to another vignette or included as inst/doc – The summarize_methrix.Rmd function is needed to generate a report. I moved it to inst/report.

vignette

  • [x] I like the overview graph. Thank you for including it!
  • [x] A LOT of eval=FALSE this is not recommended. Run the vignette interactively. All functions should be run once somewhere unless a very good argument otherwise. I recommend running these beginning sections with eval=TRUE See above.
  • [x] Remove SNPs. We would recommend using BiocManager::install rather than install.packages esp since these are Bioconductor packages the repos might not be set correctly relying on CRAN.
  • [x] When I try to run the remove snps code I get an ERROR
> meth_snps_filtered = methrix::remove_snps(m = meth)
Used SNP database: MafDb.1Kgenomes.phase3.hs37d5. 
Error in .scores_snrs(x, ranges, pop, summaryFun, quantized, scores.only,  : 
  Sequence names M in 'ranges' not present in reference genome hs37d5.

– fixed, the functions were in the wrong order.

  • [x] Is this filtering snps an optional step? – Yes, it is. It depends on the genome that is used, it is usually done only on human data.
  • [x] I suggeset reformting this section slightly. Show that you can install the needed packages through Bioconductor with the code to install in eval=FALSE but then have the code section that actually runs the function call with the assumption that the packages are available
library(GenomicScores)
library(MafDb.1Kgenomes.phase3.hs37d5)
meth_snps_filtered = methrix::remove_snps(m = meth)
  • [x] Remove the manually added 'output'
  • [x] in section remove uncovered loci. why is this in an eval=FALSE? Run interactively and remove the manually added 'output'
  • [x] meth_mat_with_loci = methrix::get_matrix(m = methrix_data, type = "M", add_loci = TRUE) The results of this really are a GRanges object. Do you have a wrapper or show the code to conver into a standard GRanges object. – I added the option in_granges to the get_matrix function to get a Granges output.
  • [x] Subset by genomic regions. Show subsetting by GRanges object. hint: GRanges("chr21:27867971-27868103")
  • [x] in section 'converting methrix to BSeq' . Again we prefer to see BiocManager::install rather than install.package instructions. This code should be in an eval=FALSE and then show the code to run
library(bsseq)
bs_seq = methrix::methrix2bsseq(m = methrix_data)

R / man

General and should be applied to all files

  • [x] Remove commentted out code chunks. True comments in code are fine but remove legacy code
  • [x] Use <- rather than = except for argument assignment.
  • [x] use message instead of cat so they can be suppressed if the user wants

_accessoryfuncs.R

  • [x] has parameter checking been done previous to this point in the code? - Yes, these are only internal functions, the parameter checking is in other functions.
  • [x] Remove older implementation (the beauty of git tracking is you can always find and revert the commit removing this code)

_checkbins.R

  • [x] These functions do the same thing. combined into one function that checks bedtools/bedGraphToBigWig/bedClip based on a second argument. something along the lines:
check_bins <- function(path=NULL, value=c("bedtools", "bedGraphToBigWig",
"bedClip")){

names = match.arg(value)
if(is.null(path)){
    check = as.character(Sys.which(names = names))[1]
  }else{
    check = paste0(path, "/", names")
    if(!file.exists(check)){
      stop("Could not locate ", names, " at " , check ,"\nDownload: <http://hgdownload.cse.ucsc.edu/admin/exe/>")
    }
  }

  if(check != ""){
    return(check)
  }else{
    stop("Could not locate ", names, " at " , check ,"\nDownload: <http://hgdownload.cse.ucsc.edu/admin/exe/>")
  }
}

_extractCpGs.R

  • [x] Remove unneeded paste0 in message call.
  • [x] if(nrow(gnoms_installed[pkgname %in% ref_genome]) == 0){ seems like it won't work as pkgname is not defined anywhere?
  • [x] While I appreciate the humor, I would not recommend the bored = TRUE to tell a joke from the Chuck Norris Database. For a number of reasons, starting with it adds a layer of complexity and an additional packages dependency that would otherwise be unneeded. Being a publicallly open space, the fact they can be explicit is also discouraged.
  • [ ] cpgs in the list should remain a GRanges object not flattened to a table. Any place this is used should utilize GRanges. – Here extracting cpgs is just a step in reading in the bedgraph files. It the read_bedgraph function we use the cpgs as a data.table, therefore it seems easier, if the outcome of this is a data.table. I would rather not change it, as long as it is absolutely necessary.
  • [x] message instead of cat

matrix-plot.R

  • [x] Have more argument checking. For example - There is no check on the ranges if it is not null if it is a GRanges object or if it has the required columns. Unless this is handled by subset_matrix? But this sort of thinking that the data provided is the right class/structure to all (or many of the important arguments) should be checked. – Yes, some of it is taken care by the subset_matrix function, but I also added additional argument checking.
  • [x] Duplicate code should be pulled into its own helper function. Lines 16-59 in the violin plot are the same as 93-132 of the density plot – extracted into a separate function
  • [x] FWIW - it seems like this file should be methrix_plot to be consistent with all the other files instead of a dash.

_methrixoperations.R

  • [x] check if argument/parameter checking is necessary for any user input. – Added additional argument checking

_mm9bsmap.R

  • [x] It appears this was removed so please also remove the documentation file.

_readbedgraphs.R

  • [x] Remove TODO. Make sure this works or fix it!
  • [x] so it is okay to leave chr_idx, start_idx, etc... NULL as there is no default set - yes, it is ok.

I will commit unit tests very soon. Please let me know if you have any more concerns. Thank you for your help!

lshep commented 5 years ago

Thank you. Last few clean up items before acceptance:

vignette

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if(!requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
}

R / man

check_bins.R

Please make these last to changes and do a version bump to the DESCRIPTION and I see no other reason not to accept the package. Thank you for making the previous set of changes and improvements.
Cheers,

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

c8f26c3 Version bump

tkik commented 5 years ago

Thank you! [x] I removed the ´check_bins.R´ file, because I realized we used it in a previous version of ´write_bedgraph()´ [x] I modified the vignette as well. [x] added some unit tests and some minor bug fixes. [x] cleaned up the old version of example bedgraphs.

I hope it is ready for acceptance :) Cheers,

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

tkik commented 5 years ago

Dear @lshep,

I think the build had errors that might come from your end. Apparently some packages are missing. Could you take a look? Thanks!

lshep commented 5 years ago

yes im looking into it

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

mtmorgan commented 4 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/PoisonAlien.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("methrix"). The package 'landing page' will be created at

https://bioconductor.org/packages/methrix

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.