Closed GarrettJenkinson closed 2 years ago
Hi @GarrettJenkinson
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: borealis
Type: Package
Title: Bisulfite-seq OutlieR mEthylation At singLe-sIte reSolution
Version: 0.99.0
Authors@R: person(given = "Garrett",
    family = "Jenkinson",
    role = c("aut", "cre"),
    email = "gargar934@gmail.com",
    comment = c(ORCID = "0000-0003-2548-098X"))
Depends: R (>= 4.1.0), Biobase
Imports: doMC, purrr, plyr, foreach, gamlss, gamlss.dist, bsseq,
    methods, DSS, R.utils, utils, stats, ggplot2, cowplot, dplyr, rlang
Description: Borealis is an R library performing outlier analysis for
    count-based bisulfite sequencing data. It detects outlier
    methylated CpG sites from bisulfite sequencing
    (BS-seq). The core of Borealis is modeling Beta-Binomial distributions. This
    can be useful for rare disease diagnoses.
License: GPL-3
Encoding: UTF-8
Suggests: BiocStyle, knitr, rmarkdown, RUnit, BiocGenerics, annotatr, tidyr,
    TxDb.Hsapiens.UCSC.hg19.knownGene, org.Hs.eg.db
VignetteBuilder: knitr
biocViews: Sequencing, Coverage, DNAMethylation, DifferentialMethylation
Vignette needs work. You have too many raw dumps like
## dmrId seqnames start end width strand x n mu theta
## 1 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 2 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 3 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 4 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 5 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 6 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 7 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 8 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 9 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 10 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 11 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 12 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 13 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 14 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 15 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 16 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 17 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 18 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 19 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 20 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
## 21 1 chr14 24780570 24780569 0 * 17 28 0.1170744 0.05674164
It should probably be a GRanges. Control the digits dumped too.
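As a rough illustration of this suggestion, the sketch below converts a small per-site table into a GRanges with GenomicRanges::makeGRangesFromDataFrame and rounds the numeric columns before printing. The table `res` is hypothetical: its column names are copied from the dump above, but the values are toy numbers, and this is not the package's actual output object.

```r
library(GenomicRanges)

# Hypothetical per-site table; column names follow the dump above,
# values are made up for illustration only.
res <- data.frame(
  seqnames = "chr14",
  start    = c(24780570L, 24780601L, 24780655L),
  end      = c(24780570L, 24780601L, 24780655L),
  x        = c(17L, 3L, 9L),
  n        = c(28L, 30L, 25L),
  mu       = c(0.1170744, 0.0934211, 0.2015532),
  theta    = c(0.05674164, 0.04410297, 0.06120385)
)

# Genomic coordinates become the ranges; the remaining columns are kept
# as metadata columns.
gr <- makeGRangesFromDataFrame(res, keep.extra.columns = TRUE)

# Control the digits shown by rounding the numeric metadata columns.
mcols(gr)$mu    <- signif(mcols(gr)$mu, 3)
mcols(gr)$theta <- signif(mcols(gr)$theta, 3)

gr
```

A GRanges also abbreviates long objects when printed, showing only the first and last few ranges, which addresses the raw-dump concern without extra code.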
A vignette should not have
perFeatFunc <- function(annot_id,pvals,padjThresh=0.05){
  ss <- subset(pvals, annot.id == annot_id)
  sshypo <- subset(ss, isHypo == "TRUE")
  ssnohypo <- subset(ss, isHypo == "FALSE")
  ssna <- subset(ss, is.na(isHypo))
  df <- length(ss$pAdj)*2
  dfhypo <- length(sshypo$pAdj)*2
  dfnohypo <- length(ssnohypo$pAdj)*2
  pout <- pchisq(-2*sum(log(c(ss$pAdj))),df,lower.tail=FALSE)
  pouthypo <- pchisq(-2*sum(log(c(sshypo$pAdj))),dfhypo,lower.tail=FALSE)
  poutnohypo <- pchisq(-2*sum(log(c(ssnohypo$pAdj))),
                       dfnohypo,lower.tail=FALSE)
  medeshypo <- median(c(sshypo$effSize))
  medesnohypo <- median(c(ssnohypo$effSize))
  ishypo <- length(ss$isHypo[which(ss$isHypo == "TRUE" &
                                     ss$pAdj <= padjThresh)])
  ishypoAll <- length(ss$isHypo[which(ss$isHypo == "TRUE")])
  ishyper <- length(ss$isHypo[which(ss$isHypo == "FALSE" &
                                      ss$pAdj <= padjThresh)])
  ishyperAll <- length(ss$isHypo[which(ss$isHypo == "FALSE")])
  isNa <- length(ss$isHypo[which(is.na(ss$isHypo))])
  bestPval <- min(ss$pAdj)
  minEs <- min(ss$effSize)
  maxEs <- max(ss$effSize)
  bestEs <- ifelse(abs(maxEs) > abs(minEs), maxEs, minEs)
  combout <- paste(ishypo, ishypoAll, ishyper, ishyperAll, isNa,
                   pout, pouthypo, medeshypo, poutnohypo, medesnohypo, bestPval,
                   bestEs, sep=",")
  return(combout)
}
A vignette should help a prospective user understand the science and the basic operations of the package, using live code.
This dump
## [1] "vignette_borealis_patient_70_chr14_DMLs.tsv"
## [2] "vignette_borealis_patient_71_chr14_DMLs.tsv"
## [3] "vignette_borealis_patient_72_chr14_DMLs.tsv"
## [4] "vignette_borealis_patient_73_chr14_DMLs.tsv"
## [5] "vignette_borealis_patient_74_chr14_DMLs.tsv"
## [6] "vignette_borealis_patient_75_chr14_DMLs.tsv"
## [7] "vignette_borealis_patient_76_chr14_DMLs.tsv"
## [8] "vignette_borealis_patient_77_chr14_DMLs.tsv"
## [9] "vignette_borealis_patient_78_chr14_DMLs.tsv"
## [10] "vignette_borealis_patient_79_chr14_DMLs.tsv"
## [11] "vignette_borealis_patient_7_chr14_DMLs.tsv"
## [12] "vignette_borealis_patient_80_chr14_DMLs.tsv"
## [13] "vignette_borealis_patient_81_chr14_DMLs.tsv"
## [14] "vignette_borealis_patient_82_chr14_DMLs.tsv"
## [15] "vignette_borealis_patient_83_chr14_DMLs.tsv"
## [16] "vignette_borealis_patient_84_chr14_DMLs.tsv"
## [17] "vignette_borealis_patient_85_chr14_DMLs.tsv"
## [18] "vignette_borealis_patient_86_chr14_DMLs.tsv"
## [19] "vignette_borealis_patient_87_chr14_DMLs.tsv"
## [20] "vignette_borealis_patient_8_chr14_DMLs.tsv"
is really crying out for a GenomicFiles design, where you could introduce a colData component to bind patient-level characteristics to the file collection.
Thank you for this feedback. My colleagues and I have extensively re-written the vignette based on your suggestions and believe it is now up to Bioconductor standards. Please let me know if any further action is required.
Changes include:
Building CpG islands...
downloading 1 resources
retrieving 1 resource
loading from cache
Annotating...
Error running filter /home/stvjc/R-dev-dist/lib/R/library/bookdown/rmarkdown/lua/custom-environment.lua:
.../R/library/bookdown/rmarkdown/lua/custom-environment.lua:92: attempt to call a nil value (global 'print_debug')
stack traceback:
.../R/library/bookdown/rmarkdown/lua/custom-environment.lua:92: in function 'Div'
Error: processing vignette 'borealis.Rmd' failed with diagnostics:
pandoc document conversion failed with error 83
--- failed re-building ‘borealis.Rmd’
SUMMARY: processing the following file failed:
‘borealis.Rmd’
Error: Vignette re-building failed.
Execution halted
I passed it for review. But the above does not bode well for automated checking. The "Download" event is problematic - should the downloaded resources be part of AnnotationHub or ExperimentHub? See the guidelines for developers.
Thank you.
Follow-up question on AnnotationHub: the above download happens at line 346 of our vignette, in a call to the Bioconductor annotatr function build_annotations, which according to the documentation uses AnnotationHub, TxDb.* and org.db packages. To avoid the download we could skip this command and cache its result in a file in our extdata, but is there a way we can still use this Bioconductor package to illustrate annotations in our vignette? The annotatr vignette seems to use build_annotations just fine, so I am unclear on why it causes problems in our vignette but not theirs.
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your GitHub ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/borealis
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 15fc30e62e9a71c45e540fa9bc45ef11042d6c16
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
So I believe I have corrected all issues I can at this time. The remaining warning is regarding the bioconductor package DSS:
Warning: replacing previous import matrixStats::rowMedians by Biobase::rowMedians when loading DSS
Warning: replacing previous import matrixStats::anyMissing by Biobase::anyMissing when loading DSS
As far as I can tell, this is an issue in the DSS package itself, as its check logs show warnings about scoping these function calls: https://bioconductor.org/checkResults/3.14/bioc-LATEST/DSS/nebbiolo2-checksrc.html
Thank you. I will try and review the package as soon as possible.
A few comments and questions below:
NEWS
[ ] Use the .md extension (NEWS.md).
inst
[ ] Please include an inst/scripts directory with a file that describes how the data in inst/extdata was generated. It can be code, pseudocode, or text, but should minimally contain source information and give users an idea of how to recreate the data if they so choose.
man
[ ] Could you please add an alias or a complete man page for the package, so that if a user does ?borealis it will resolve.
[ ] Instead of creating and then removing a file in the examples, please use a tempdir to write output in examples. Then cleanup is automatic.
[ ] The results of runSingleNewSample should also be able to be printed as a GRanges object; the standard Bioconductor object for data representing genomic positions. It could be an optional flag to do the conversion?
[ ] The result of runBorealis was not a 'SeqCountSet' as advertised. It was a list, and it does not display nicely if you try to look at it.
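A minimal sketch of the tempdir suggestion above, using only base R. The data frame `out` and the file name pattern are hypothetical stand-ins for whatever a package example would actually write.

```r
# Hypothetical result table standing in for real example output.
out <- data.frame(chr = "chr14", pos = 24780570L, pVal = 0.01)

# tempfile() returns a path inside the session's temporary directory,
# so nothing is written to the package installation or working directory.
out_file <- tempfile(pattern = "borealis_example_", fileext = ".tsv")
write.table(out, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE)

# Read the file back to show the round trip.
back <- read.delim(out_file, stringsAsFactors = FALSE)
```

R removes the session's temporary directory when the session ends, so the example needs no explicit file removal.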
vignette
[x] Just a note that BiocManager::install() will also install github references.
[ ] Please include the reference or link to the mentioned Borealis publication in section 3, Running Borealis.
[ ] Do not write output to any package installation directory. Not everyone will have access. Please write output to a tempdir.
[x] Thank you for your extensive use of GRanges! And extending out to annotatr.
[x] In response to your question: I am not having the AnnotationHub/caching issues, so I think it's okay unless you already changed something.
R
Please comment on the above concerns and fix the minor issues mentioned above. Please let me know when the updates are pushed for a re-review
Cheers
Received a valid push on git.bioconductor.org; starting a build for commit id: a8a51bc07614c02bbca7bc25f9fb8c8082b7ea4f
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Received a valid push on git.bioconductor.org; starting a build for commit id: 91ec3645e7d262524d268b2ff55709c220b188f4
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Thank you very much for your thorough review and helpful comments! We agree with the comments and have addressed all bullet points exactly as suggested, except for the last one.
Regarding the last one: when we considered BiSeq::readBismark, a major concern was that reading in all the data was the largest computational bottleneck in our entire pipeline (particularly for WGBS or large cohorts). In particular, if you look at BiSeq's source code here:
https://github.com/astatham/BiSeq/blob/master/R/readBismark.R#L9
You will see that they essentially read in the raw data in a serial for-loop. Conversely, I have implemented this read operation in parallel:
https://github.com/GarrettJenkinson/borealis/blob/main/R/outlier.R#L50
which in practice was giving nearly linear speedups in the number of parallel cores provided (i.e., 8X faster runtime with 8 cores).
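To make the serial versus parallel contrast concrete, here is a small sketch using base R's parallel::mclapply. This is an assumption for illustration only: the package's own code (linked above) is not reproduced here, and its DESCRIPTION imports foreach and doMC rather than this function. The file names, columns, and helper `read_one` are all hypothetical.

```r
library(parallel)

# Write a few tiny hypothetical per-sample count files to a temp directory.
tmp <- tempfile("cov_")
dir.create(tmp)
files <- file.path(tmp, sprintf("sample_%d.tsv", 1:4))
for (i in seq_along(files)) {
  write.table(
    data.frame(chr = "chr14", pos = 100L + 1:5, N = 10L * i, X = i),
    files[i], sep = "\t", quote = FALSE, row.names = FALSE
  )
}

# Hypothetical reader for one file.
read_one <- function(f) read.delim(f, stringsAsFactors = FALSE)

# Serial: one file after another.
serial <- lapply(files, read_one)

# Parallel: one file per forked worker. Forking is unavailable on
# Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
parallel_res <- mclapply(files, read_one, mc.cores = n_cores)

# Both approaches should return the same list of data frames.
same <- identical(serial, parallel_res)
```

With files this small the parallel version will not be faster; any speedup depends on file size, core count, and disk throughput.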
Additionally, the core "logic" of loadBismarkData is actually handled by the Bioconductor package DSS and its makeBSseqData function. This produces a BSseq object, which we now make the primary object passed between functions in our package (rather than the list from before).
However, we are happy to reconsider this point if you feel strongly that BiSeq's functionality would be preferable here.
Thank you again! Garrett
FWIW there's also bsseq::read.bismark(), so it's not like you're the first person to reinvent this wheel :)
Thank you
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
Thank you so much! Really appreciate the reviewers' time to improve our package!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/GarrettJenkinson.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("borealis"). The package 'landing page' will be created at
https://bioconductor.org/packages/borealis
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.