Closed kjohnsen closed 5 years ago
Hi @kjohnsen
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: MMAPPR2
Title: Mutation Mapping Analysis Pipeline for Pooled RNA-Seq
Version: 0.98.9
Authors@R: c(
person("Kyle", "Johnsen", email="kjohnsen@byu.edu", role=c("aut")),
person('Nathaniel', 'Jenkins', role=c('aut')),
person('Jonathon', 'Hill', email='jhill@byu.edu', role=c('cre')))
Description: MMAPPR2 maps mutations resulting from pooled RNA-seq data from the F2
cross of forward genetic screens. Its predecessor is described in a paper published
in Genome Research (Hill et al. 2013). MMAPPR2 accepts aligned BAM files as well as
a reference genome as input, identifies loci of high sequence disparity between the
control and mutant RNA sequences, predicts variant effects using Ensembl's Variant
Effect Predictor, and outputs a ranked list of candidate mutations.
Depends: R (>= 3.5.0)
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Suggests: testthat,
mockery,
roxygen2
Imports: ensemblVEP (>= 1.20.0),
gmapR,
Rsamtools,
VariantAnnotation,
BiocParallel,
Biobase,
BiocGenerics,
dplyr,
GenomeInfoDb,
GenomicRanges,
IRanges,
S4Vectors,
tidyr,
VariantTools,
magrittr,
methods
SystemRequirements: Ensembl VEP
biocViews: RNASeq,
PooledScreens,
DNASeq,
VariantDetection
URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3613585/,
https://github.com/kjohnsen/MMAPPR2
BugReports: https://github.com/kjohnsen/MMAPPR2/issues
OS_type: unix
AdditionalPackage: https://github.com/kjohnsen/MMAPPR2data
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Thank you for your submission to Bioconductor. Please see the technical review below. Please also ensure you have your webhook enabled for both packages to ensure that valid version bumps kick off new builds. Please comment back here with updates on changes and when you are ready for a re-review.
MMAPPR2data
description:
namespace:
import(ExperimentHub)
import(Rsamtools)
man:
\code{\link[Rsamtools:BamFile-class]{BamFile}}
Similar situation as https://stat.ethz.ch/pipermail/bioc-devel/2018-October/014137.html
report example error
https://stat.ethz.ch/mailman/listinfo/bioc-devel
If you have done so already, there is normally a verification step, so please check your spam or junk folder in case it was redirected there.
MMAPPR2
build report
https://stat.ethz.ch/mailman/listinfo/bioc-devel
If you have done so already, there is normally a verification step, so please check your spam or junk folder in case it was redirected there.
description
vignette
Once the above is corrected, I will look more closely at the man files and R code. Thank you. I look forward to working with you to get your packages accepted to Bioconductor.
Cheers!
Thanks Lori! I'll get to work on fixing these. As for the data: the data included are pre-computed MmapprData objects at various points in the pipeline. It seemed unwise to me to put these in the data package, since that would introduce a circular dependency. I need these large objects for runnable examples; they are already the minimum amount of data needed to show meaningful functionality. I could strip them down to the point of having trivial examples, though. What do you recommend?
Do you introduce a new class? If not, I'm unsure why you would need a circular dependency. If your data is in a standard format (it looks like .rda) and doesn't require a specialized class from your package to load or display properly, you shouldn't need to depend on, import, or suggest the software package.
Another thought: since, as you say, they are snapshots at various points in the pipeline, I assume running the pipeline interactively is too time-consuming?
Yes, I introduce a new class to store data as the pipeline progresses.
And running interactively is fairly time-consuming--it takes a couple of minutes to get through the whole thing with this trimmed demo data.
Also, no, this package isn't meant to be run on Windows.
And I thought I was subscribed to the bioc-devel list. I've been receiving emails from it for months now, and I couldn't find any missed verification email in my spam folder.
May I inquire further into what specifically doesn't run on Windows? There is .BBSoptions, which can be placed in the root of your package, but by making the package unavailable on this platform you would be limiting yourself to roughly half of your potential users. Maybe only certain functions are Windows-dependent, and this could be checked?
Depending on whether you can rework your code to make it available on Windows, you may need to add a .BBSoptions file to the top level of your package; if you need to move forward with this option, I will send you the necessary lines to include.
We prefer examples and pipelines to be run interactively, and a couple of minutes can be an acceptable time. By a couple of minutes, are we talking 2 min, 5 min, 10 min? I would really like to see the pipeline run from scratch to see how long it takes. As long as all of the functions are run once (i.e., if the entire pipeline and its functions are run interactively in the vignette, I tend to be more lenient about dontrun sections in examples, or vice versa), the data could be removed, provided the run time is reasonable.
Depending on that time limit, we would still recommend moving the data into the accompanying experiment data package, as that is the purpose of such a package.
The maintainer's email is the one that is checked. Currently your DESCRIPTION file designates the maintainer as person('Jonathon', 'Hill', email='jhill@byu.edu', role=c('cre')) (cre is the maintainer role). Is this the email that is receiving the bioc-devel mailing list?
The package doesn't run on Windows because gmapR is a dependency (we use GmapGenome for variant calling with VariantTools). That's not the only kind of genome you can use with VariantTools, so we could fix that, but that would be a longer-term priority. So yes, for now we'd like to mark this package as not for Windows.
The pipeline would probably finish under 5 minutes, but there's a problem. One step of the pipeline (variant calling and effect prediction) has some heavy dependencies, so I thought it would be better not to run it at all in the vignette and examples. It would require access to reference genome files, constructing a GmapGenome, and installing VEP (which itself can introduce a nightmare of Perl dependencies).
So, assuming we don't want to run that part of the pipeline, I need some way to store the results of variant calling and effect prediction so the rest of the pipeline can pick up from there. I could cut the number of those objects from 5 down to just the one I need to skip that step; the package might meet the size requirement then.
I can see why it would be preferable to have that data in the MMAPPR2data package, but how could I store it there when it's an instance of a class defined in MMAPPR2?
That's right, my PI is listed as the maintainer--I will make sure he gets on the bioc-devel list.
Or here's an idea: MmapprData stores variant information in a GRanges object, so what if I store just that GRanges object in MMAPPR2data, thus eliminating the circular dependency problem? Then I can tack on that data and continue the pipeline.
So if I understood right, if I can get that all working in the vignette, I can put dontrun on the examples of those intermediate step functions? Those manual examples would be the toughest thing to do if we get rid of the data.
Yes. As long as users are able to run the examples in the man pages (that is, the code in the dontrun would actually work), I would let it slide, provided all the steps and the full pipeline are run in the vignette. I would move forward with the proposal above.
In the top-level directory of the package, please create a file named .BBSoptions containing the line
UnsupportedPlatforms: win
This will signal to both the SPB and, once you are on it, the daily builder that the package is not supported on this platform.
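A minimal R sketch of creating that file. The helper name writeBBSoptions is hypothetical, and the demo writes into a temporary stand-in directory rather than a real package root.

```r
# Hypothetical helper: write a .BBSoptions file marking the package as
# unsupported on Windows. pkgDir should be the package's top-level directory.
writeBBSoptions <- function(pkgDir = ".") {
    path <- file.path(pkgDir, ".BBSoptions")
    writeLines("UnsupportedPlatforms: win", path)
    invisible(path)
}

# Demo against a temporary directory standing in for the package root
demoPkg <- file.path(tempdir(), "MMAPPR2")
dir.create(demoPkg, showWarnings = FALSE)
bbs <- writeBBSoptions(demoPkg)
readLines(bbs)
```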
Received a valid push; starting a build. Commits are:
c529d38 Create vignette (#62) * Update DESCRIPTION
All right, I implemented the restructured data/vignette/examples and it should all be working. I'll be doing the minor fixes and adding the .BBSoptions file soon.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
It looks like the data package hasn't been getting built, which is necessary for the vignette to build correctly. I added the webhook and it got built the first time, so I don't know why it hasn't been working for recent version bumps.
Can you please try posting the AdditionalPackage: https://github.com/kjohnsen/MMAPPR2data line again? The issue is that this line is normally posted after the first package has completed its checks and is out of the awaiting-review state, so it never registered in the queue.
AdditionalPackage: https://github.com/kjohnsen/MMAPPR2data
Hi @kjohnsen,
Starting build on additional package https://github.com/kjohnsen/MMAPPR2data.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your additional package repository will NOT trigger a new build.
The DESCRIPTION file of this additional package is:
Package: MMAPPR2data
Type: Package
Title: Sample Data for MMAPPR2
Version: 0.99.4
Authors@R: c(
person("Kyle", "Johnsen", email="kjohnsen@byu.edu", role=c("aut")),
person("Jonathon", "Hill", email="jhill@byu.edu", role=c("cre")))
Description: Contains sample data for illustration purposes in the MMAPPR2
package, namely BAM files containing RNA-Seq data for the wild-type
and mutant pools of the zy13 mutation, as described in Hill et al. (2013).
VignetteBuilder: knitr
Enhances: MMAPPR2
Suggests: knitr,
rmarkdown,
BiocStyle,
roxygen2
Imports: ExperimentHub (>= 1.7.6),
Rsamtools,
GenomicRanges
License: GPL-3
Encoding: UTF-8
biocViews: RNASeqData,
ExperimentData,
SequencingData,
ExperimentHub
URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3613585/,
https://b2b.hci.utah.edu/gnomex/gnomexGuestFlex.jsp?topicNumber=27,
https://github.com/kjohnsen/MMAPPR2
RoxygenNote: 6.1.0
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
bfa2a02 Add newline at end of test_main.R 455d5c8 Add EOF newline to test_peaks.R 5e2232a Replace T with TRUE in aicc_loess.R Change T to T... 9adfe84 Use is() instead of class()== 975ee92 Add dependencies c3b346d Add runnable examples, change to MmapprData-getter... e040f5b Remove MD usage in MmapprData-getters example e64e5c7 Update doc page as well bfe8a5a Change F to FALSE in output.R 77af852 Change sapply to vapply 1948eec Fix MmapprData object in MmapprData-getters exampl... c9310f2 Update docs for previous commit f05c125 Use seq_len/seq_along instead of 1:x 94ca551 Replace system() with system2() 32faa09 Bump to version 0.99.0 54998cc Add .BBSoptions to avoid Windows build 240e172 Some BiocCheck changes (#95) * Add newline at end...
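Several of these commits apply standard BiocCheck recommendations: seq_along/seq_len instead of 1:x, vapply instead of sapply, TRUE/FALSE instead of T/F, and is() instead of class() ==. A small base R illustration of why these matter (not code from the package itself):

```r
library(methods)

x <- integer(0)

# 1:length(x) is 1:0, i.e. c(1L, 0L), so a loop over it would run twice on
# empty input; seq_along(x) is integer(0) and the loop body is skipped.
length(1:length(x))      # 2
length(seq_along(x))     # 0

# sapply() returns list() for empty input, while vapply() always returns
# the declared type (here logical(0)).
sapply(x, function(v) v > 0)
vapply(x, function(v) v > 0, logical(1))

# is() respects inheritance; class(m) == "matrix" can return more than one
# value, because since R 4.0.0 class(m) is c("matrix", "array").
m <- matrix(1:4, nrow = 2)
is(m, "matrix")

# T and F are ordinary variables that can be reassigned; TRUE/FALSE cannot.
T <- 0
isTRUE(T)                # FALSE
rm(T)                    # T refers to base::T (TRUE) again
```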
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Ok, I've got a couple of style notes left, but I have some questions on a few things:
Using @ instead of accessors in the vignette: it's in code that isn't visible in the finished vignette, and I deliberately didn't make that setter, because the user shouldn't have to call it.
We do not have a Jonathon Hill or the listed email registered on the bioc-devel mailing list. I just checked the membership listing. Please have him register with jhill@byu.edu again. Maybe he inadvertently unsubscribed.
You can ignore the @ note then
No you don't have to fix the line spacing.
The time constraint is close enough that I will be lenient on the 5 min warning.
Data package: Please fix the other ERROR and WARNING in the MMAPPR2data build report as well, most notably those about imports and the NAMESPACE.
Main package: To get rid of the 80% runnable-examples man page error, and also for completeness, please include all the code needed to run each example (in a dontrun block if necessary). Right now, for example, in generateCandidates.Rd I don't know what postPeakRefMD is. Show the steps to create this object even if they aren't run, and if there is a single line that can be run, place it outside the dontrun, even if it only identifies the data file.
Please start with these comments; I will look over the package R code later today and comment with anything further.
I know you advised not importing AnnotationHub in the data package, but my functions rely on the hubCache() function, so I put it back in the imports of the data package.
I made all the examples runnable and did some style fixes. I didn't shorten all of the lines over 80 characters, mainly for verbose, uninteresting lines like logging and setting generics. Let me know if you want me to clean those up more.
There might be a problem with GitHub at the moment--I'm not getting builds triggered either on Travis or through here.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
a460bc6 Runnable examples (#96) * Give context to step ex...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Data package is good to go now. Please see the following comments regarding the software package:
Running the Vignette:
{r hidden-hacks, echo=FALSE}
# hide from users, allows VEPFlags to be constructed without complaining
MMAPPR2:::.insertFakeVEPintoPath()
gmapR::GmapGenomeDirectory(create=TRUE)
BiocParallel::register(BiocParallel::SerialParam())
{r candidates-step, eval=FALSE}
postCandidatesMD <- generateCandidates(postPeakRefMD)
{r prep-for-output, echo=FALSE}
postCandidatesMD <- postPeakRefMD
## I don't know why data isn't being exported right and I have to do this...
data("zy13candidates")
postCandidatesMD@candidates <- zy13candidates # tack on from MMAPPR2data
dir.create(outputFolder(param(postCandidatesMD)), showWarnings = FALSE)
outputMmapprData(postCandidatesMD)
## I don't know why data isn't being exported right and I have to do this
It's better to fix it.
The output folder should be created within the outputMmapprData function rather than relying on the user to create it. Without the dir.create() call above:
> outputMmapprData(postCandidatesMD)
Error in pdf(file.path(mmapprData@param@outputFolder, "genome_plots.pdf"), :
cannot open file 'DEFAULT/genome_plots.pdf'
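One way to address this, sketched with a hypothetical writeGenomePlots() rather than the real outputMmapprData(): the function that writes the output creates its own folder before opening the PDF device, so callers never need a separate dir.create().

```r
# Hypothetical writer illustrating the pattern; not MMAPPR2's actual code.
writeGenomePlots <- function(outputFolder) {
    # Create the folder (and any parents) if missing; silent if it exists
    dir.create(outputFolder, recursive = TRUE, showWarnings = FALSE)
    pdfPath <- file.path(outputFolder, "genome_plots.pdf")
    grDevices::pdf(pdfPath)
    on.exit(grDevices::dev.off())  # close the device even if plotting fails
    graphics::plot(seq_len(10), main = "placeholder genome plot")
    invisible(pdfPath)
}

# Works even though the nested folder does not exist yet
out <- writeGenomePlots(file.path(tempdir(), "mmappr_demo", "DEFAULT"))
file.exists(out)
```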
MMAPPR2 R code
all_classes.R
main.R
[ ] .insertFakeVEPintoPath: is this necessary? The builders will have VEP installed, and I think it's reasonable to assume that if you state at the top that the vignette requires Ensembl VEP to be installed and paths to be set, this function should not be necessary.
[ ] This seems like it could be reduced to an apply and/or parallelized with bpiterate. Since you already use BiocParallel, it would probably be a good candidate to parallelize as well.
.indexBamFileList <- function(bfl) {
    for (i in seq_along(bfl)) {
        bamFile <- bfl[[i]]
        if (is.na(Rsamtools::index(bamFile)))
            Rsamtools::indexBam(bamFile)
    }
}
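A sketch of the apply/parallel version the review suggests. The review mentions bpiterate; since the list of BAM files is known up front, this swaps in BiocParallel::bplapply as the simpler fit. It assumes Rsamtools and BiocParallel are installed and, like the loop above, that Rsamtools::index() is NA for a BamFile whose index was not found.

```r
# Sketch: index every BAM in a BamFileList that has no index yet, in
# parallel on the registered BiocParallel backend (or the one supplied).
.indexBamFileList <- function(bfl, BPPARAM = BiocParallel::bpparam()) {
    bams <- as.list(bfl)
    # TRUE for each BamFile whose index is NA (not found)
    missingIdx <- vapply(bams, function(bf) is.na(Rsamtools::index(bf)),
                         logical(1))
    # Only the unindexed files are dispatched to workers
    res <- BiocParallel::bplapply(bams[missingIdx], function(bf)
        Rsamtools::indexBam(bf), BPPARAM = BPPARAM)
    invisible(res)
}
```

Passing BPPARAM through keeps the caller in control, for example BiocParallel::SerialParam() in a vignette, as the existing hidden chunk already registers.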
peaks.R and other R files
Use distance(mmapprData)[[i]] rather than mmapprData@distance[[i]]. Use getters and setters; that's why they are defined!
As before, please comment back here with updates and comments, and when ready, kick off a new build of the package for re-review.
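For reference, the getter/setter pattern in S4, sketched on a hypothetical, stripped-down ToyMmapprData class that models only the distance slot (this is not MMAPPR2's actual class definition):

```r
library(methods)

# Hypothetical stand-in for MmapprData with a single list slot
setClass("ToyMmapprData", slots = c(distance = "list"))

# Getter: callers write distance(x) instead of x@distance
setGeneric("distance", function(obj) standardGeneric("distance"))
setMethod("distance", "ToyMmapprData", function(obj) obj@distance)

# Setter (replacement method); validObject() re-checks the slot types
setGeneric("distance<-", function(obj, value) standardGeneric("distance<-"))
setReplaceMethod("distance", "ToyMmapprData", function(obj, value) {
    obj@distance <- value
    validObject(obj)
    obj
})

md <- new("ToyMmapprData", distance = list(chr5 = 1:3))
distance(md)[["chr5"]]           # read via the getter
distance(md)[["chr7"]] <- 4:6    # write via the setter
names(distance(md))
```

Because distance(md)[["chr7"]] <- 4:6 is rewritten by R into a call to the distance<- replacement method, any validation added to the setter also applies to element-wise updates.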
Ok, thank you for the comments!
I can work on the smaller stuff, but most of what you mentioned comes down to this: should generateCandidates() be runnable in the vignette? I assure you it works, but as I mentioned before, I thought its dependencies were too heavy to try to install in the vignette. Building a GmapGenome takes a long time and would require access to a large FASTA file. Installing VEP is more complicated still, introducing a world of Perl dependencies. Both are necessary for generateCandidates(), which calls variants and then runs VEP on them, so what do you suggest?
(.insertFakeVEPintoPath is only necessary for examples when VEP isn't installed; without it, a VEPFlags object can't be created. The user would not use this.)
If you are making VEP a requirement of your package, which you do, then it is expected that the user will have it installed to run the vignette. The builders already have it installed, so please remove the helper that places miscellaneous dummy paths in a user's system environment. Writing to a user's directory other than a temporary directory, or changing environment variables, is really never a good idea, and we would always discourage it.
There needs to be a way to run and test your functions in a timely and efficient manner, even if this means using a more pared-down, dummy example set. You "assuring" me that it works isn't very reassuring when I try to run the code provided and get an error. At the very least, the code that is shown should run without an ERROR (even if it is currently eval=FALSE), so that I could replicate it if I chose to take the time. Moreover, any function that is exported should have some way to test or run it to ensure that it is working correctly. Nor did a comment like ## I don't know why data isn't being exported right and I have to do this... inspire much confidence.
@mtmorgan would you agree/disagree or any further comments?
Okay, if I can count on VEP being installed, I think all we'll need to run generateCandidates() in the vignette is the reference genome. My first idea would be to add a FASTA file with just the chromosome we're interested in (73 MB) to the data package. What do you think? I assume there's a more efficient, R-based way to store a genome.
Do you mean the vignette should be able to build on the user's machine? If we assume VEP is already installed, should the code blocks instructing the user on how to install it remain as eval=FALSE?
I created the vignette assuming that its primary purpose was to teach the user how to use the package; I tried to arrange it so that a user could follow it seamlessly from installing the dependencies through running the program, despite the ugly stuff going on in the echo=FALSE blocks (which could definitely be refactored, true). That is, my idea was to use these echo=FALSE blocks to maintain the illusion of the dependencies (VEP and genome) being set up already, which should be true if the user follows the vignette. The built vignette does not display these blocks.
Code blocks with installation instructions may legitimately have an eval = FALSE flag.
Definitely .insertFakeVEPintoPath() should not be used, because, as Lori mentions, the user (and the build system) will have VEP installed. (Even if it were used, one would rather write dir <- tempfile(); dir.create(dir) instead of unlinking a directory...)
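A sketch of the alternative to faking VEP on the PATH: detect it at run time with Sys.which() and keep scratch files in a fresh tempfile() directory. The executable names are an assumption (newer VEP releases install "vep", older ones "variant_effect_predictor.pl"), and hasVEP() is a hypothetical helper, not part of MMAPPR2 or ensemblVEP.

```r
# Hypothetical helper: TRUE if any of the candidate VEP executables is
# found on the PATH (Sys.which() returns "" for names it cannot find).
hasVEP <- function(exes = c("vep", "variant_effect_predictor.pl")) {
    any(nzchar(Sys.which(exes)))
}

# Scratch files go in a fresh temporary directory, not the user's working
# directory, and no environment variables are modified.
scratchDir <- tempfile("mmappr2_")
dir.create(scratchDir)

if (hasVEP()) {
    message("VEP found: the candidate-generation step can run")
} else {
    message("VEP not found: skipping the candidate-generation step")
}
```

knitr evaluates chunk options as R expressions, so eval=hasVEP() in a chunk header is one possible gate once the helper is defined in an earlier chunk, although the reviewers' preference is for the step to actually run on the builders.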
Is it really the case that one needs a full chromosome to run the vignette / examples? I realize there's a tension between demonstrating the software and performing an analysis, but a carefully chosen example may be sufficient to illustrate functionality with very modest resources.
Certainly I can believe that generateCandidates() works now, but that is no guarantee that it will be robust to changes in R or dependent packages in the future. Any test at all would be better than none, and again this test needs to be balanced against the need to build and test your package in a timely manner.
The sample data covers most of the chromosome, but not all, so a reference genome could be cut down slightly. In preparing the data, I could only narrow the range so much before functionality started suffering.
The issue basically is this--the reads and genome need to cover a wide enough range to detect linkage between two pools of a genetic screen. If we're using real data, I might be able to find another case where we could cut it down to half a chromosome, but not much more.
The only other way I see would be to simulate the aligned sequencing data. I haven't done that before, so I don't know how feasible that is or how realistic the results could be.
Any suggestions?
Hello @kjohnsen, I wanted to check in: has there been any progress?
No, not too much recently--since missing the deadline for the October release I have had other priorities. I should be back to focusing on MMAPPR in a couple of weeks.
I will close this issue for now until you have the time to work on it. When ready please tag me on the issue here and I will reopen.
This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.
Thank you for interest in Bioconductor.
Please also note that the data in the hubs has been updated to work with the current BiocVersion, 3.9. Please update the metadata in the data package to reflect BiocVersion 3.9, and use the devel version of Bioconductor to have access to these values in the hub.
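A minimal sketch of bumping the BiocVersion column in the data package's hub metadata file. It assumes the file is the conventional inst/extdata/metadata.csv with a BiocVersion column; the real file has many more columns, and the demo below uses a one-row stand-in in tempdir(). This only edits the local file; the hub records themselves are maintained on the Bioconductor side.

```r
# Hypothetical helper: set every row's BiocVersion and rewrite the file.
bumpBiocVersion <- function(csvPath, version = "3.9") {
    meta <- utils::read.csv(csvPath, stringsAsFactors = FALSE)
    meta$BiocVersion <- version
    utils::write.csv(meta, csvPath, row.names = FALSE)
    invisible(meta)
}

# Demo on a one-row stand-in for inst/extdata/metadata.csv
csv <- file.path(tempdir(), "metadata.csv")
utils::write.csv(data.frame(Title = "zy13 BAM (stand-in)", BiocVersion = "3.8"),
                 csv, row.names = FALSE)
bumpBiocVersion(csv)
utils::read.csv(csv)$BiocVersion
```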
Hi @lshep, I'm getting back to work on this. I'm looking at how to test generateCandidates(), and since the check is already taking over 5 minutes without that part, I figure I'm left with no choice but to come up with a tiny toy dataset and reference genome. Let me know if I'm wrong.
That seems reasonable. Tiny toy sets often demonstrate the functionality of a package well enough, and you can always reference the fuller set if users want a more thorough experience.
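As a starting point for the toy reference, a base R sketch that writes a small synthetic genome as FASTA. The chromosome name, length, line width and seed are arbitrary illustrations; a usable toy dataset would still need reads simulated and aligned against this sequence so that the two pools show linkage.

```r
# Hypothetical helper: write random sequence for each named chromosome,
# wrapped at `width` bases per line.
writeToyFasta <- function(path, chromLengths = c(chrToy1 = 5000L),
                          width = 60L, seed = 1L) {
    set.seed(seed)  # reproducible sequence (note: resets the global RNG)
    out <- character(0)
    for (chrom in names(chromLengths)) {
        n <- chromLengths[[chrom]]
        seqStr <- paste(sample(c("A", "C", "G", "T"), n, replace = TRUE),
                        collapse = "")
        starts <- seq(1L, n, by = width)
        # one header line, then the sequence split into fixed-width lines
        out <- c(out, paste0(">", chrom),
                 substring(seqStr, starts, pmin(starts + width - 1L, n)))
    }
    writeLines(out, path)
    invisible(path)
}

fa <- writeToyFasta(tempfile(fileext = ".fa"))
```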
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.