Closed PoisonAlien closed 4 years ago
Hi @PoisonAlien
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: methrix
Title: Fast and efficient summarization of generic bedGraph files from Bisufite sequencing
Version: 0.99.0
Authors@R: c(
person("Anand","Mayakonda", role = c("aut", "cre"), email = "anand_mt@hotmail.com"),
person("Reka","Toth", role = "aut", email = "r.toth@dkfz.de"),
person("Maximilian","Schönung", role = "ctb"),
person("Pavlo","Lutsik", role = "ctb"),
person("Joschka","Hey", role = "ctb"))
Description: Bedgraph files generated by Bisulfite pipelines often come in various flavors. Critical downstream step requires summarization of these files into methylation/coverage matrices. This step of data aggregation is done by Methrix, including many other useful downstream functions.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends: R (>= 3.6), data.table (>= 1.12.4), SummarizedExperiment
Imports: DelayedArray, HDF5Array, BSgenome, rjson, DelayedMatrixStats, parallel, methods, ggplot2, matrixStats
RoxygenNote: 6.1.1
Suggests:
knitr,
rmarkdown,
DSS,
bsseq,
plotly,
BSgenome.Mmusculus.UCSC.mm9,
MafDb.1Kgenomes.phase3.GRCh38,
MafDb.1Kgenomes.phase3.hs37d5,
GenomicScores,
Biostrings,
RColorBrewer
VignetteBuilder: knitr
biocViews: DNAMethylation, Sequencing, Coverage
Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear @lshep, It seems that the error we are getting is caused by our package being dependent on the data.table version 1.12.4, which was released only on 3. October and is not yet in BioC's build systems. Can we do anything to address the error, or should we just wait?
I believe it should have 1.12.4 now installed. I'm kicking off a manual build and hopefully it should pick it up.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
fa67c8f fixed issues for BioC check
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
f490fbe version change
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
bfc926b decrease size 310b1ec Version bump 9c4a615 Merge branch 'master' of https://github.com/CompEp...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear @lshep,
The recent build report only shows an error here: ** running examples for arch 'i386' ... ERROR I am not sure why I am getting this error, because this part ran without a problem before and I didn't change anything here. Could you please take a look? Thank you!
Received a valid push; starting a build. Commits are:
c75b682 version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
b0ee870 version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear @lshep, Could you please take a look? It seems to me that the error is not related to our package:
Should I do a version bump to start a new build? Thank you!
Best, Reka
This is an ERROR on our end. No need to kick off a new build. I will take a look at the package and post a review shortly. In the meantime please try to clean up all the NOTES in the build report - esp the global variable/functions and the coding practice NOTES.
Thanks! I cleaned the notes as much as possible. I will wait for your review and commit the changes together.
Please see the following review:
build report
eval=FALSE
I
would prefer to see the vignette eval=TRUE
and have the vignette run in its
entireity - then leaving these to dontrun are okay because of time constraint. README
LICENSE/LICENSE.md
DESCRIPTION
LazyData: true
as it can
slow installation of the package. inst/
inst/script
diectory that describes where or how the
data in inst/extdata
was curated/generated. This should include basic source
information. summarize_metrix.Rmd
should either be moved to another
vignette or included as inst/doc
vignette
eval=FALSE
this is not recommended. Run the vignette
interactively. All functions should be run once somewhere unless a very good
argument otherwise. I recommend running these beginning sections with
eval=TRUE
BiocManager::install
rather than
install.packages
esp since these are Bioconductor packages the repos might not
be set correctly relying on CRAN.
> meth_snps_filtered = methrix::remove_snps(m = meth)
Used SNP database: MafDb.1Kgenomes.phase3.hs37d5.
Error in .scores_snrs(x, ranges, pop, summaryFun, quantized, scores.only, :
Sequence names M in 'ranges' not present in reference genome hs37d5.
- [ ] Is this filtering snps an optional step?
- [ ] I suggeset reformting this section slightly. Show that you can install the
needed packages through Bioconductor with the code to install in `eval=FALSE`
but then have the code section that actually runs the function call with the
assumption that the packages are available
library(GenomicScores) library(MafDb.1Kgenomes.phase3.hs37d5) meth_snps_filtered = methrix::remove_snps(m = meth)
- [ ] Remove the manually added 'output'
- [ ] in section remove uncovered loci. why is this in an `eval=FALSE`? Run
interactively and remove the manually added 'output'
- [ ] `meth_mat_with_loci = methrix::get_matrix(m = methrix_data, type = "M",
add_loci = TRUE)` The results of this really are a GRanges object. Do you
have a wrapper or show the code to conver into a standard GRanges object.
- [ ] Subset by genomic regions. Show subsetting by GRanges object. hint:
`GRanges("chr21:27867971-27868103")`
- [ ] in section 'converting methrix to BSeq' . Again we prefer to see
`BiocManager::install` rather than `install.package` instructions. This code
should be in an `eval=FALSE` and then show the code to run
library(bsseq) bs_seq = methrix::methrix2bsseq(m = methrix_data)
**R / man**
**General and should be applied to all files**
- [ ] Remove commentted out code chunks. True comments in code are fine but
remove legacy code
- [ ] Use `<-` rather than `=` except for argument assignment.
- [ ] use `message` instead of `cat` so they can be suppressed if the user wants
_accessory\_funcs.R_
- [ ] has parameter checking been done previous to this point in the code?
- [ ] Remove older implementation (the beauty of git tracking is you can always
find and revert the commit removing this code)
_check\_bins.R_
- [ ] These functions do the same thing. combined into one function that checks
bedtools/bedGraphToBigWig/bedClip based on a second argument.
something along the lines:
check_bins <- function(path=NULL, value=c("bedtools", "bedGraphToBigWig", "bedClip")){
names = match.arg(value) if(is.null(path)){ check = as.character(Sys.which(names = names))[1] }else{ check = paste0(path, "/", names") if(!file.exists(check)){ stop("Could not locate ", names, " at " , check ,"\nDownload: http://hgdownload.cse.ucsc.edu/admin/exe/") } }
if(check != ""){ return(check) }else{ stop("Could not locate ", names, " at " , check ,"\nDownload: http://hgdownload.cse.ucsc.edu/admin/exe/") } }
_extract\_CpGs.R_
- [ ] Remove unneeded paste0 in message call.
- [ ] `if(nrow(gnoms_installed[pkgname %in% ref_genome]) == 0){` seems like it
won't work as pkgname is not defined anywhere?
- [ ] While I appreciate the humor, I would not recommend the `bored = TRUE` to
tell a joke from the Chuck Norris Database. For a number of reasons, starting
with it adds a layer of complexity and an additional packages dependency that
would otherwise be unneeded. Being a publicallly open space, the fact they can
be explicit is also discouraged.
- [ ] cpgs in the list should remain a GRanges object not flattened to a
table. Any place this is used should utilize GRanges.
- [ ] message instead of cat
_matrix-plot.R_
- [ ] Have more argument checking. For example - There is no check on the ranges
if it is not null if it is a GRanges object or if it has the required
columns. Unless this is handled by subset\_matrix? But this sort of thinking
that the data provided is the right class/structure to all (or many of the
important arguments) should be checked.
- [ ] Duplicate code should be pulled into its own helper function. Lines 16-59
in the violin plot are the same as 93-132 of the density plot
- [ ] FWIW - it seems like this file should be methrix\_plot to be consistent
with all the other files instead of a dash.
_methrix\_operations.R_
- [ ] check if argument/parameter checking is necessary for any user input.
_mm9\_bsmap.R_
- [ ] It appears this was removed so please also remove the documentation file.
_read\_bedgraphs.R_
- [ ] Remove TODO. Make sure this works or fix it!
- [ ] so it is okay to leave chr_idx, start_idx, etc... NULL as there is no
default set
Thank you for the submission. Please work on the above changes. When ready please version bump to kick off a new build and comment back here that you are ready for a re-review and a summary of the changes.
Received a valid push; starting a build. Commits are:
901e757 version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Dear @lshep, Thank you for the review and your detailed explanations. I modified our package according to your recommendations. I hope we can still make it to the autumn release. See the detailed answer below:
Please see the following review:
build report
[x] Please fix global variable/function
[x] Rd line lengths
[x] Coding practices – Significantly reduced the too long lines and wrong intends. The remaining long rows doesn’t “look good”, when split.
[ ] We really recommend having some basic unit tests. .- In progress, I will commit them ASAP
[x] What is the reasoning behind not having runnable examples in exported functions? – As you say later, basic time constraint. We use a small dataset as an example, but since some of the functions (read_bedgraph) works with the whole genome, irrespective of the example size, it takes relatively long. Therefore, according to your suggestion, I let them run in the vignette, where I set all of them to eval=TRUE.
[x] Are the dontruns run some other place (vignette or test?) all function unless strong reasoning should be run at least once. If they are run elsewhere than this is okay (ish). Looking at the vignette these are all
eval=FALSE
I would prefer to see the vignetteeval=TRUE
and have the vignette run in its entireity - then leaving these to dontrun are okay because of time constraint. > See the previous answer. README[x] Because I noticed your badges. We recommend packages only get submitted to CRAN or Bioconductor and not both but you have the badges for both. What is your intention here? – I deleted the CRAN badge. We have talked with CRAN about this and our reasoning is often maintainers than will update one but not the other; causing multiple 'current' versions to be out to the universe.
[x] You list functions multiple times in Methrix operations
LICENSE/LICENSE.md
- [x] Remove LICENSE.md file. This is already included with the base R distribution. Keep LICENSE file.
DESCRIPTION
- [x] In our experience we don't recommend including
LazyData: true
as it can slow installation of the package.inst/
- [x] Please include an
inst/script
diectory that describes where or how the data ininst/extdata
was curated/generated. This should include basic source information. I added thedata.generation.R
to the script folder. It seems that the underlying data changed since the first generation, therefore themethrix_data
changed slightly.- [x] It seems like
summarize_metrix.Rmd
should either be moved to another vignette or included asinst/doc
– Thesummarize_methrix.Rmd
function is needed to generate a report. I moved it toinst/report
.vignette
- [x] I like the overview graph. Thank you for including it!
- [x] A LOT of
eval=FALSE
this is not recommended. Run the vignette interactively. All functions should be run once somewhere unless a very good argument otherwise. I recommend running these beginning sections witheval=TRUE
See above.- [x] Remove SNPs. We would recommend using
BiocManager::install
rather thaninstall.packages
esp since these are Bioconductor packages the repos might not be set correctly relying on CRAN.- [x] When I try to run the remove snps code I get an ERROR
> meth_snps_filtered = methrix::remove_snps(m = meth) Used SNP database: MafDb.1Kgenomes.phase3.hs37d5. Error in .scores_snrs(x, ranges, pop, summaryFun, quantized, scores.only, : Sequence names M in 'ranges' not present in reference genome hs37d5.
– fixed, the functions were in the wrong order.
- [x] Is this filtering snps an optional step? – Yes, it is. It depends on the genome that is used, it is usually done only on human data.
- [x] I suggeset reformting this section slightly. Show that you can install the needed packages through Bioconductor with the code to install in
eval=FALSE
but then have the code section that actually runs the function call with the assumption that the packages are availablelibrary(GenomicScores) library(MafDb.1Kgenomes.phase3.hs37d5) meth_snps_filtered = methrix::remove_snps(m = meth)
- [x] Remove the manually added 'output'
- [x] in section remove uncovered loci. why is this in an
eval=FALSE
? Run interactively and remove the manually added 'output'- [x]
meth_mat_with_loci = methrix::get_matrix(m = methrix_data, type = "M", add_loci = TRUE)
The results of this really are a GRanges object. Do you have a wrapper or show the code to conver into a standard GRanges object. – I added the option in_granges to theget_matrix
function to get a Granges output.- [x] Subset by genomic regions. Show subsetting by GRanges object. hint:
GRanges("chr21:27867971-27868103")
- [x] in section 'converting methrix to BSeq' . Again we prefer to see
BiocManager::install
rather thaninstall.package
instructions. This code should be in aneval=FALSE
and then show the code to runlibrary(bsseq) bs_seq = methrix::methrix2bsseq(m = methrix_data)
R / man
General and should be applied to all files
- [x] Remove commentted out code chunks. True comments in code are fine but remove legacy code
- [x] Use
<-
rather than=
except for argument assignment.- [x] use
message
instead ofcat
so they can be suppressed if the user wants_accessoryfuncs.R
- [x] has parameter checking been done previous to this point in the code? - Yes, these are only internal functions, the parameter checking is in other functions.
- [x] Remove older implementation (the beauty of git tracking is you can always find and revert the commit removing this code)
_checkbins.R
- [x] These functions do the same thing. combined into one function that checks bedtools/bedGraphToBigWig/bedClip based on a second argument. something along the lines:
check_bins <- function(path=NULL, value=c("bedtools", "bedGraphToBigWig", "bedClip")){ names = match.arg(value) if(is.null(path)){ check = as.character(Sys.which(names = names))[1] }else{ check = paste0(path, "/", names") if(!file.exists(check)){ stop("Could not locate ", names, " at " , check ,"\nDownload: <http://hgdownload.cse.ucsc.edu/admin/exe/>") } } if(check != ""){ return(check) }else{ stop("Could not locate ", names, " at " , check ,"\nDownload: <http://hgdownload.cse.ucsc.edu/admin/exe/>") } }
_extractCpGs.R
- [x] Remove unneeded paste0 in message call.
- [x]
if(nrow(gnoms_installed[pkgname %in% ref_genome]) == 0){
seems like it won't work as pkgname is not defined anywhere?- [x] While I appreciate the humor, I would not recommend the
bored = TRUE
to tell a joke from the Chuck Norris Database. For a number of reasons, starting with it adds a layer of complexity and an additional packages dependency that would otherwise be unneeded. Being a publicallly open space, the fact they can be explicit is also discouraged.- [ ] cpgs in the list should remain a GRanges object not flattened to a table. Any place this is used should utilize GRanges. – Here extracting cpgs is just a step in reading in the bedgraph files. It the read_bedgraph function we use the cpgs as a data.table, therefore it seems easier, if the outcome of this is a data.table. I would rather not change it, as long as it is absolutely necessary.
- [x] message instead of cat
matrix-plot.R
- [x] Have more argument checking. For example - There is no check on the ranges if it is not null if it is a GRanges object or if it has the required columns. Unless this is handled by subset_matrix? But this sort of thinking that the data provided is the right class/structure to all (or many of the important arguments) should be checked. – Yes, some of it is taken care by the
subset_matrix
function, but I also added additional argument checking.- [x] Duplicate code should be pulled into its own helper function. Lines 16-59 in the violin plot are the same as 93-132 of the density plot – extracted into a separate function
- [x] FWIW - it seems like this file should be methrix_plot to be consistent with all the other files instead of a dash.
_methrixoperations.R
- [x] check if argument/parameter checking is necessary for any user input. – Added additional argument checking
_mm9bsmap.R
- [x] It appears this was removed so please also remove the documentation file.
_readbedgraphs.R
- [x] Remove TODO. Make sure this works or fix it!
- [x] so it is okay to leave chr_idx, start_idx, etc... NULL as there is no default set - yes, it is ok.
I will commit unit tests very soon. Please let me know if you have any more concerns. Thank you for your help!
Thank you. Last few clean up items before acceptance:
vignette
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
if(!requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
}
R / man
check_bins.R
Please make these last to changes and do a version bump to the DESCRIPTION and I see no other reason not to accept the package. Thank you for making the previous set of changes and improvements.
Cheers,
Received a valid push; starting a build. Commits are:
c8f26c3 Version bump
Thank you! [x] I removed the ´check_bins.R´ file, because I realized we used it in a previous version of ´write_bedgraph()´ [x] I modified the vignette as well. [x] added some unit tests and some minor bug fixes. [x] cleaned up the old version of example bedgraphs.
I hope it is ready for acceptance :) Cheers,
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear @lshep,
I think the build had errors that might come from your end. Apparently some packages are missing. Could you take a look? Thanks!
yes im looking into it
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/PoisonAlien.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("methrix")
. The package 'landing page' will be created at
https://bioconductor.org/packages/methrix
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.