Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

(inactive) MMAPPR2 #902

Closed kjohnsen closed 5 years ago

kjohnsen commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @kjohnsen

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: MMAPPR2
Title: Mutation Mapping Analysis Pipeline for Pooled RNA-Seq
Version: 0.98.9
Authors@R: c(
    person("Kyle", "Johnsen", email="kjohnsen@byu.edu", role=c("aut")),
    person('Nathaniel', 'Jenkins', role=c('aut')),
    person('Jonathon', 'Hill', email='jhill@byu.edu', role=c('cre')))
Description: MMAPPR2 maps mutations resulting from pooled RNA-seq data from the F2
    cross of forward genetic screens. Its predecessor is described in a paper published
    in Genome Research (Hill et al. 2013). MMAPPR2 accepts aligned BAM files as well as
    a reference genome as input, identifies loci of high sequence disparity between the
    control and mutant RNA sequences, predicts variant effects using Ensembl's Variant
    Effect Predictor, and outputs a ranked list of candidate mutations.
Depends: R (>= 3.5.0)
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Suggests: testthat,
    mockery,
    roxygen2
Imports: ensemblVEP (>= 1.20.0),
    gmapR,
    Rsamtools,
    VariantAnnotation,
    BiocParallel,
    Biobase,
    BiocGenerics,
    dplyr,
    GenomeInfoDb,
    GenomicRanges,
    IRanges,
    S4Vectors,
    tidyr,
    VariantTools,
    magrittr,
    methods
SystemRequirements: Ensembl VEP
biocViews: RNASeq,
    PooledScreens,
    DNASeq,
    VariantDetection
URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3613585/,
    https://github.com/kjohnsen/MMAPPR2
BugReports: https://github.com/kjohnsen/MMAPPR2/issues
OS_type: unix
kjohnsen commented 5 years ago

AdditionalPackage: https://github.com/kjohnsen/MMAPPR2data

bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 5 years ago

Thank you for your submission to Bioconductor. Please see the technical review below. Please also ensure you have your webhook enabled for both packages to ensure that valid version bumps kick off new builds. Please comment back here with updates on changes and when you are ready for a re-review.

MMAPPR2data

description:

namespace:

man:

report example error

MMAPPR2

build report

description

vignette

Once the above is corrected I will look more closely at man files and R code. Thank you.I look forward to working with you to get your packages accepted to Bioconductor.

Cheers!

kjohnsen commented 5 years ago

Thanks Lori! I'll get to work on fixing these. As for the data--the data included is pre-computed MmapprData objects at various points in the pipeline--it seemed unwise to me to put these in the data package, since that would introduce a circular dependency. I need these large objects for runnable examples--they are already the minimum amount of data to show meaningful functionality. I could strip them down to the point of having trivial examples though. What do you recommend?

lshep commented 5 years ago

Do you introduce a new class? If not, I'm unsure why you would need a circular dependency? If your data is in standard formats which it looks like .rda and doesn't require a specialized class from your package to properly display or load you shouldn't need to depend, import, or suggest the software package.

Another thought too: I can only assume as you say they are various points in the pipeline that running interactively is too time consuming?

kjohnsen commented 5 years ago

Yes, I introduce a new class to store data as the pipeline progresses.

And running interactively is fairly time-consuming--it takes a couple of minutes to get through the whole thing with this trimmed demo data.

Also, no, this package isn't meant to be run on Windows.

And I thought I was subscribed to the bioc-devel list. I've been receiving emails from it for months now, and I couldn't find any missed verification email in my spam folder.

lshep commented 5 years ago

May I inquire further into what specifically doesn't run on windows? There is .BBSoptions, which can be placed in the root of your package but you limited yourself to probably about 1/2 of your potential users by making it unavailable on this platform. Maybe only certain functions are windows dependent and this could be checked?
Depending on if you can rethink your code and make it available on windows - you will need to add the .BBSoptions file to the top level of your package - if you need to move forward with this option I will forward the necessary commands to include.

We prefer examples and pipelines run interactively - and a couple of minutes can be acceptable time. By a couple of minutes are we talking 2min, 5min, 10min? I would really like to see the pipeline run from scratch to see how long it will take. As long as all of the functions are run once (ie. if the entire pipeline and functions are run interactively in the vignette, I tend to be more lenient about dontruns in examples or vice versa). Then the data would be able to be removed if it is a reasonable amount of time.
Depending on that time limit - We would still recommend moving the data into the data experiment package provided as this is the point of such an accompanied package.

The maintainer's email is checked - current in your description file the maintainer is designated as person('Jonathon', 'Hill', email='jhill@byu.edu', role=c('cre') - cre is maintainer designation - is this the email that is receiving bioc-devel mailing list?

kjohnsen commented 5 years ago

The package doesn't run on Windows because gmapR is a dependency (we use GmapGenome for variant calling with VariantTools). That's not the only kind of genome you can use with VariantTools, so we could fix that, but that would be a longer-term priority. So yes, for now we'd like to mark this package as not for Windows.

The pipeline would probably finish under 5 minutes, but there's a problem. One step of the pipeline (variant calling and effect prediction) has some heavy dependencies, so I thought it would be better not to run it at all in the vignette and examples. It would require access to reference genome files, constructing a GmapGenome, and installing VEP (which itself can introduce a nightmare of Perl dependencies).

So assuming we don't want to run that part of the pipeline, I need some way to store the results of variant calling and effect prediction to pick up with the rest of the pipeline. I could cut the number of those objects from 5 down to just the one I need to skip that step--the package might make the size requirement then.

I can see why it would be preferable to have that data in the MMAPPR2data package, but how could I store it there when it's an instance of a class defined in MMAPPR2?

That's right, my PI is listed as the maintainer--I will make sure he gets on the bioc-devel list.

kjohnsen commented 5 years ago

Or here's an idea: MmapprData stores variant information in a GRanges object, so what if I store just that GRanges object in MMAPPR2data, thus eliminating the circular dependency problem? Then I can tack on that data and continue the pipeline.

So if I understood right, if I can get that all working in the vignette, I can put dontrun on the examples of those intermediate step functions? Those manual examples would be the toughest thing to do if we get rid of the data.

lshep commented 5 years ago

Yes - as long as the ability to run the examples in the man pages is there for users (so feasibly the code in the dontrun works - I would let it slide provided all the steps and full pipeline is run in the vignette - I would move forward with the proposal above.
In the top most directory please create a file .BBSoptions - in that file please include the line UnsupportedPlatforms: win . This will signal to both the SPB and when you get on the daily builder that it is not supported on this platform.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

c529d38 Create vignette (#62) * Update DESCRIPTION

kjohnsen commented 5 years ago

All right, I implemented the restructured data/vignette/examples and it should all be working. I'll be doing the minor fixes and adding the .BBSoptions file soon.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

kjohnsen commented 5 years ago

It looks like the data package hasn't been getting built, which is necessary for the vignette to build correctly. I added the webhook and it got built the first time, so I don't know why it hasn't been working for recent version bumps

lshep commented 5 years ago

Can you please try doing the AdditionalPackage: https://github.com/kjohnsen/MMAPPR2data again? The issue is that this line is normally done after the first package is completed the checks and after the package is out of awaiting review so it never registered in the queue.

kjohnsen commented 5 years ago

AdditionalPackage: https://github.com/kjohnsen/MMAPPR2data

bioc-issue-bot commented 5 years ago

Hi @kjohnsen,

Starting build on additional package https://github.com/kjohnsen/MMAPPR2data.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your additional package repository will NOT trigger a new build.

The DESCRIPTION file of this additional package is:

Package: MMAPPR2data
Type: Package
Title: Sample Data for MMAPPR2
Version: 0.99.4
Authors@R: c(
person("Kyle", "Johnsen", email="kjohnsen@byu.edu", role=c("aut")),
person("Jonathon", "Hill", email="jhill@byu.edu", role=c("cre")))
Description: Contains sample data for illustration purposes in the MMAPPR2
package, namely BAM files containing RNA-Seq data for the wild-type
and mutant pools of the zy13 mutation, as described in Hill et al. (2013).
VignetteBuilder: knitr
Enhances: MMAPPR2
Suggests: knitr,
rmarkdown,
BiocStyle,
roxygen2
Imports: ExperimentHub (>= 1.7.6),
Rsamtools,
GenomicRanges
License: GPL-3
Encoding: UTF-8
biocViews: RNASeqData,
ExperimentData,
SequencingData,
ExperimentHub
URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3613585/,
https://b2b.hci.utah.edu/gnomex/gnomexGuestFlex.jsp?topicNumber=27,
https://github.com/kjohnsen/MMAPPR2
RoxygenNote: 6.1.0
bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

bfa2a02 Add newline at end of test_main.R 455d5c8 Add EOF newline to test_peaks.R 5e2232a Replace T with TRUE in aicc_loess.R Change T to T... 9adfe84 Use is() instead of class()== 975ee92 Add dependencies c3b346d Add runnable examples, change to MmapprData-getter... e040f5b Remove MD usage in MmapprData-getters example e64e5c7 Update doc page as well bfe8a5a Change F to FALSE in output.R 77af852 Change sapply to vapply 1948eec Fix MmapprData object in MmapprData-getters exampl... c9310f2 Update docs for previous commit f05c125 Use seq_len/seq_along instead of 1:x 94ca551 Replace system() with system2() 32faa09 Bump to version 0.99.0 54998cc Add .BBSoptions to avoid Windows build 240e172 Some BiocCheck changes (#95) * Add newline at end...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

kjohnsen commented 5 years ago

Ok, I've got a couple of style notes left, but I have some questions on a few things:

lshep commented 5 years ago

We do not have a Jonathan Hill or the listed email registered on the bioc-devel mailing list. I just checked the membership listing. Please have him register with the jhill@byu.edu again. Maybe he inadvertently unsubscribed.

You can ignore the @ note then

No you don't have to fix the line spacing.

The time constraint is close enough that I will be lenient on the 5 min warning.

Data Package Please fix the other ERROR and WARNING in the MMAPPR2data package build report as well. Most notably about the import and NAMESPACE

Main Package To get rid of the 80% warnings man documentation error and also for completion - please put all necessary code to run the example (in the dontrun if necessary) but right now for example in generateCandidates.Rd I don't know what postPeakRefMD is in the example given. Show the steps to make this object even if it isn't run - and if there is a single line that could be run place that outside the don't run - even it it is identifying the data file

Please start with these comment - I will look over the package R code later today and comment with anything further.

kjohnsen commented 5 years ago

I know you advised not importing AnnotationHub in the data package, but my functions rely on the hubCache() function, so I put it back in the imports of the data package.

kjohnsen commented 5 years ago

I made all the examples runnable and did some style fixes. I didn't shorten all of the lines over 80 characters, mainly for verbose, uninteresting lines like logging and setting generics. Let me know if you want me to clean those up more.

There might be a problem with GitHub at the moment--I'm not getting builds triggered either on Travis or through here.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

27de8b2 Add R 3.5 dependency 28982fc Fix Windows doc error cdee026 Fix import stuff 1736ba0 Windows doc error docs update 0d7c82e Replace system() with system2() 4796d4b Bump to version 0.99.5

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

a460bc6 Runnable examples (#96) * Give context to step ex...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

4f3ef76 Shorten line lengths, remove Sys.which() debug lin... 3bff421 Bump to version 0.99.2

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2249ced Add newline to make-metadata.R a8e3771 Fix manual link

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

560951e Remove docs link to MMAPPR2 4293a64 Bump to v0.99.7

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 5 years ago

Data package is good to go now. Please see the following comments regarding the software package:

Running the Vignette:

{r hidden-hacks, echo=FALSE}
# hide from users, allows VEPFlags to be constructed without complaining
MMAPPR2:::.insertFakeVEPintoPath()
gmapR::GmapGenomeDirectory(create=TRUE)
BiocParallel::register(BiocParallel::SerialParam())
{r candidates-step, eval=FALSE}
postCandidatesMD <- generateCandidates(postPeakRefMD)
{r prep-for-output, echo=FALSE}
postCandidatesMD <- postPeakRefMD
## I don't know why data isn't being exported right and I have to do this...
data("zy13candidates")
postCandidatesMD@candidates <- zy13candidates # tack on from MMAPPR2data
dir.create(outputFolder(param(postCandidatesMD)), showWarnings = FALSE)
outputMmapprData(postCandidatesMD)

MMAPPR2 R code all_classes.R

main.R

peaks.R and other R files

As before please comment back here with updates and comments and when ready kick off a new build of the package for rereview

kjohnsen commented 5 years ago

Ok, thank you for the comments!

I can work on the smaller stuff, but most of what you mentioned comes down to this: should generateCandidates() be runnable in the vignette? I assure you it works, but like I mentioned before, I thought its dependencies were just too much to want to try and install them in the vignette. Building a GmapGenome takes a long time, and would require access to a large fasta file. Installing VEP is much more complicated still, introducing a world of perl dependencies. These are both necessary for generateCandidates(), which calls variants then runs VEP on them, so what do you suggest?

(.insertFakeVEPintoPath is only necessary for examples when VEP isn't installed; without it, a VEPFlags object can't be created. The user would not use this.)

lshep commented 5 years ago

(.insertFakeVEPintoPath is only necessary for examples when VEP isn't installed; without it, a VEPFlags object can't be created. The user would not use this.) - If you are making VEP a requirement of your package which you do, then it is expected that the user will have this installed to run the vignette. The builders have this installed already. so remove the helper file that would place misc. dummy paths on a users system environment. writing something to a users directory other than a temporary directory or changing environment variables is really never a good idea and we would always discourage this.

There needs to be a way to run and test your functions in a timely and efficient manner. Even if this means using a more paired down and dummy example set. You "assuring" me that it works isn't very assuring when I try to run the code provided and get an error. At the very least the code that is shown should be able to be run and not ERROR (even if it is currently eval=FALSE) so that I could replicated it if I so choose to take the time. However, any function that is exported should have some way to test or run the function to ensure that it is running correctly. Nor was it very confident to have a comment like ## I don't know why data isn't being exported right and I have to do this...

@mtmorgan would you agree/disagree or any further comments?

kjohnsen commented 5 years ago

Okay, if I can count on VEP being installed, I think all we'll need to run generateCandidates() in the vignette is the reference genome. My first idea to implement this would be to add a fasta file with just the chromosome we're interested in (73 MB) to the data package. What do you think? I assume there's a more efficient, R-based way to store a genome.

Do you mean the vignette should be able to build on the user's machine? If we assume VEP is already installed, should the code blocks instructing the user on how to install it remain as eval=FALSE?

I created the vignette assuming that its primary purpose was to teach the user how to use the package; I tried to arrange it so that a user could follow it from installation of dependencies through running the program seamlessly, despite the ugly stuff going on in the echo=FALSE blocks (which could definitely be refactored, true). That is, my idea was to use these echo=FALSE blocks to maintain the illusion of the dependencies (VEP and genome) being set up already, which should be true if the user follows the vignette. The built vignette does not display these blocks.

mtmorgan commented 5 years ago

Code blocks with installation instructions may legitimately have an eval = FALSE flag.

Definitely .insertFakeVEPintoPath() should not be used (even if it were, one would rather dir <- tempfile(); dir.create(dir) instead of unlinking a directory...). because as Lori mentions the user (and build system) will have VEP installed...

Is it really the case that one needs a full chromosome to run the vignette / examples? I realize there's a tension between demonstrating the software and performing an analysis, but a carefully chosen example may be sufficient to illustrate functionality with very modest resources.

Certainly I can believe that generateCandidates() works now, but that is no guarantee that it will be robust to changes in R or dependent packages in the future. Any test at all would be better than none, and again this test needs to be balanced against the need to build and test your package in a timely manner.

kjohnsen commented 5 years ago

The sample data covers most of the chromosome, but not all, so a reference genome could be cut down slightly. In preparing the data, I could only narrow the range so much before functionality started suffering.

The issue basically is this--the reads and genome need to cover a wide enough range to detect linkage between two pools of a genetic screen. If we're using real data, I might be able to find another case where we could cut it down to half a chromosome, but not much more.

The only other way I see would be to simulate the aligned sequencing data. I haven't done that before, so I don't know how feasible that is or how realistic the results could be.

Any suggestions?

lshep commented 5 years ago

Hello @kjohnsen , I wanted to check in if there has been any progress?

kjohnsen commented 5 years ago

No, not too much recently--since missing the deadline for the October release I have had other priorities. I should be back to focusing on MMAPPR in a couple of weeks.

lshep commented 5 years ago

I will close this issue for now until you have the time to work on it. When ready please tag me on the issue here and I will reopen.

bioc-issue-bot commented 5 years ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for interest in Bioconductor.

lshep commented 5 years ago

Please also note that the data in the hubs has been updated to work with the current 3.9 biocversion - please update your metadata in the data package to reflect biocversion 3.9 and use the devel version of bioconductor to have access to these values in the hub.

kjohnsen commented 5 years ago

Hi @lshep, I'm getting back to work on this. I'm looking at how to test generateCandidates(), and since the check is already taking over 5 minutes without that part, I figure I'm left with no choice but to come up with a tiny toy dataset and reference genome. Let me know if I'm wrong.

lshep commented 5 years ago

That seems reasonable. Tiny toy sets often display the functionality of the package well enough and you can always reference the fuller set if user want a more thorough experience.