Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

recount #9

Closed lcolladotor closed 8 years ago

lcolladotor commented 8 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/leekgroup/recount

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor,
the package source and all review commentary are visible to the
general public.

[x] I have read the Bioconductor Package Submission
instructions. My package is consistent with the Bioconductor
Package Guidelines.

[x] My package addresses statistical or bioinformatic issues related
to the analysis and comprehension of high throughput genomic data.

[x] I am committed to the long-term maintenance of my package. This
includes monitoring the support site for issues that users may
have, subscribing to the bioc-devel mailing list to stay aware
of developments in the Bioconductor community, responding promptly
to requests for updates from the Core team in response to changes in
R or underlying software.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Subversion (optionally via GitHub).

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/recount_buildreport_20160618020132.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

cfb3093 v0.99.20 -- Use mode = 'wb' as recommended by Mart...

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/recount_buildreport_20160618171215.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

cdba735 v0.99.21 -- address a BiocCheck RECOMMENDED by add...

lcolladotor commented 8 years ago

Hi @LiNk-NY and @vobencha,

Thanks to Martin Morgan, the last error is no longer present as shown at http://bioconductor.org/spb_reports/recount_buildreport_20160618171215.html. More information at https://github.com/leekgroup/recount/commit/cfb3093b331e44a8c1ddbd3f7c578900534fc202

On all OS, there is 1 warning and 2 notes. The 1 warning is the one I reported earlier at https://stat.ethz.ch/pipermail/bioc-devel/2016-May/009230.html and that Martin said not to worry about since it's an R issue.

The 2 notes are inevitable given what the package does and the vignette documentation we want to show.

BiocCheck throws 1 RECOMMENDED and 3 CONSIDER actions. https://github.com/leekgroup/recount/commit/cdba735c1dca496d86ef6882a308db18e0418401 should address the RECOMMENDED action, and the 3 CONSIDER ones are neglible.

I hope that we can now move forward in the package review process. You probably noticed that several of my recent commits are not related to the package review process, but simply related to the package development and request from its beta testers.

Best, Leo

PS Thanks Dan (or whoever did it) for upgrading the package build system to use Bioc-devel!

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/recount_buildreport_20160618181846.html

LiNk-NY commented 8 years ago

Hi Leonardo @lcolladotor, Thank you for submitting your package to Bioconductor. We have taken a look at recount and there are a few points that we wanted to go over.

From Valerie @vobencha, Is there any interest in making these resources available in ExperimentHub? The recount package could be the front end to accessing and working with the data but the data itself would live in AWS S3 buckets. Maybe this isn't the path you'd like to take - just wanted to suggest it. For examples of software packages that house data in ExperimentHub see:

GSE62944 curatedMetagenomicData

We have seen the changes you have made to make the package build on all machines and it is looking good.

Second, you have many unevaluted chunks in your vignettes. Why so many? Is there a time issue or are they just code examples that aren't critical to evaluate? Does this need to be fixed? Ideally, we would want all of the chunks to be evaluated although this may not be possible in all situations.

For the data that you are using in the vignettes, they are not exported but they are used in the examples. Please export and add \example sections to the man pages.

In test-data.R, do you need exclude the Expressed regions test for windows? Is this due to some functionality from rtracklayer that won't work on windows? On a related note, you're avoiding evaluation for some operations in coverage_matrix() and expressed_regions() when on windows. Is this due to the same issue or something else?

Valerie adds, BiocParallel will create the appropriate BPPARAM for any OS - so if on windows the workers are socket clusters, if on linux/mac they are forks. Eventually, the windows flag will need to be removed.

Perhaps the issue might be originating from rtracklayer as mentioned earlier?

There are a few \dontrun statements in the documentation. This is generally discouraged and users should be provided runnable examples.

~/Documents/Bioconductor/recount/man$ grep dontrun *
expressed_regions.Rd:\dontrun{
reproduce_ranges.Rd:\dontrun{
scale_counts.Rd:\dontrun{

Lastly, there is an issue with DESCRIPTION and NAMESPACE. As far as I know, many of the external functions that you use are a must for the package and therefore, should be included in the Imports field rather than Suggests in the DESCRIPTION file. This includes many packages like BiocParallel as Valerie suggests (including the output below).

* Namespace import suggestions are:
Generated by codetoolsBioC version 0.1-0
Timestamp: Wed Jun 15 09:10:38 2016

Imports: AnnotationDbi, BiocParallel, derfinder, GenomeInfoDb,
         GenomicFeatures, GenomicRanges, GEOquery, IRanges, methods,
         org.Hs.eg.db, recount, S4Vectors, stats,
         SummarizedExperiment, TxDb.Hsapiens.UCSC.hg38.knownGene,
         utils, XML

importFrom(methods, is)
...

If you have any questions feel free to ping us.

Regards, Marcel

lcolladotor commented 8 years ago

Hi Marcel @LiNk-NY,

Regarding ExperimentHub, we haven't considered it because we currently have about 6 TB of data and don't want to pay for storing it in S3.

Regarding eval on the vignette:

Data: examples and export

Regarding

For the data that you are using in the vignettes, they are not exported but they are used in the examples. Please export and add \example sections to the man pages.

Do you mean exporting data like recount_url and adding an example to it? https://github.com/leekgroup/recount/blob/master/R/recount_url-data.R Or only https://github.com/leekgroup/recount/blob/master/R/rse_gene_SRP009615-data.R? I thought that I didn't need to export the data since we are using lazy loading. Having examples makes sense for the data that is not used internally by the package, although if you want examples for all of them I can do it too.

Windows

The functions expressed_regions() and coverage_matrix() use rtracklayer::import.bw() to import BigWig files. This is not supported on Windows, which is why some tests, examples and code chunks in the vignette are not evaluated on Windows.

All functions include runnable examples. The \dontrun tags are used for code that users might be interested in but would take longer to run.

Imports

I suspected the NAMESPACE/DESCRIPTION issue about Imports might pop up. We anticipate that not all users will use the full functionality of the package, which is why we use requireNamespace() (see https://github.com/leekgroup/recount/blob/master/R/utils.R) for packages that you would typically import.

For example, we think a user might only use browse_study() , abstract_search() and download_study() to get data. In which case they don't need anything else beyond the downloader package.

This idea is from Karthik Ram's review on another package I wrote, see it at http://f1000research.com/articles/4-105/v1.

If it's a must, I can certainly move these packages from Suggests to Imports. Note that as is, the package does pass R CMD check and the search path is not modified.

I'll wait for your reply before making any changes.

Best, Leonardo

LiNk-NY commented 8 years ago

Hi Leonardo, @lcolladotor The note about exporting and documenting data was hinging on the notion that data objects that are not used internally for the package should be exported and documented, especially if they are used for the vignette. It just makes it clear that the data is from the package and the user has easy access it. @vobencha may want to weigh more on this point.

The \dontrun tags are generally discouraged but if there are time constraints for running some examples, those can be left in the documentation examples.

It seems like installing and loading packages ad hoc would be okay but it might not be considered the standard for Bioconductor. I would defer to @mtmorgan or @vobencha for how to handle this method of clearing dependency hurdles.

Thank you for all the work that you have been doing to get the package up and running in the SPB. It is looking great.

Regards, Marcel

mtmorgan commented 8 years ago

I haven't looked at the package code in detail. The line that stood out for me from the review @lcolladotor mentions is "It was disappointing to go down a rabbit hole of dependencies and still not be able to install and run examples" -- i.e., the installed package should 'just work'.

You mention that most users will only use functions that have downloader dependencies, but downloader is in the Suggests: field so most users will end up installing this package indirectly. Why re-invent the package management wheel? downloader is used internally, and should be in the Imports: field.

For most of the other dependencies, they would be 'heavy' if your package were stand-alone, but you've submitted it to Bioconductor and most Bioconductor users will not see this weight. Also, the functionality described in the abstract implies that SummarizedExperiment is a typical return value, and this already implies availability of the Bioconductor core packages. Since your functions return objects from this package, and users presumably want to manipulate those objects, then SummarizedExperiment (and GenomicRanges) should be in Depends:.

If your package were meant to process data from a diversity of organisms, the TxDb packages might be in Suggests:, but here the results are strongly human-centric, so it seems appropriate for this to be in Imports (I suppose the package is being used internally; you're not returning TxDb objects to the user).

On the other hand, knitr, testthat, devtools and packages used during installation and testing seem appropriate in the Suggests: field.

Also, the Description says that the recount package can be used to explore and download data from the recount project, but this is only informative if one knows what the recount project is. Augment with some description of what the recount project is. https://lcolladotor.shinyapps.io/recount/ produces a 404 error. I think your package should include an installation test that this resource is actually available.

LiNk-NY commented 8 years ago

Thank you @mtmorgan for your valuable input. @lcolladotor, once you get those changes committed and the package built, we will be able to approve your package.

Note: You are getting some warnings. I am unable to tell where the warnings are originating from but it seems to be an undocumented exported function that is generating this warning message:

All user-level objects in a package should have documentation entries.
See chapter 'Writing R documentation files' in the 'Writing R
Extensions' manual.

Best, Marcel

lcolladotor commented 8 years ago

Hi,

I've been traveling lately but will get back to you next week.

Best, Leonardo

LiNk-NY commented 8 years ago

Safe travels!

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

1e3ee48 Fix stats link 8938c2d Fix stats link (real commit) 729746a Test that https://jhubiostatistics.shinyapps.io/re... 71e2393 Add link to shinyapps thread about testing that th... a90947a v0.99.22 -- Switched to using imports in NAMESPACE...

lcolladotor commented 8 years ago

Hi @LiNk-NY,

I just pushed recount version 0.99.22. Here's a quick overview of the changes.

Test shinyapps

https://github.com/leekgroup/recount/commit/729746a8576bde1c4327dd5818c217f123f211ed adds a test for https://jhubiostatistics.shinyapps.io/recount/ (I had changed the link in https://github.com/leekgroup/recount/commit/3e9d329c5f83bafa847953317ebe9c7772778df7). However, I can't run this test all the time as noted in https://github.com/leekgroup/recount/commit/71e2393e5c2c1c455f9c271f6ca61ad61eaead71.

Namespace

I changed most of the namespace in https://github.com/leekgroup/recount/commit/a90947a9b40db179fff9b836529fae185fa6964c. I'm importing most of the packages used in the R code. However as noted in https://github.com/leekgroup/recount/commit/a90947a9b40db179fff9b836529fae185fa6964c I'm keeping TxDb.Hsapiens.UCSC.hg38.knownGene and org.Hs.eg.db as suggests because they are only used in recount::reproduce_ranges() which is there for reproducibility purposes. As Martin noted, txdb objects are not returned to the user. They were used to create the RangedSummarizedExperiment objects that you can download with recount. I did add SummarizedExperiment to the depends field thanks to Martin's feedback.

Thanks to Michael Lawrence's recent changes in rtracklayer (https://github.com/Bioconductor-mirror/rtracklayer/commit/7eaaa2dd18a0f9a1694ae7df8146fd6cb83a4625, https://github.com/Bioconductor-mirror/rtracklayer/commit/136a2bd90859bae38084d263edb75dca25e8c43f), the package testing works (see https://stat.ethz.ch/pipermail/bioc-devel/2016-July/009513.html for more info). Guessing that version 1.33.10 is not immediately available on the build system, I added the output from R CMD check and BiocCheck at https://github.com/leekgroup/recount/commit/a90947a9b40db179fff9b836529fae185fa6964c.

Previous warning gone with new R

Regarding the warning you mentioned earlier:

All user-level objects in a package should have documentation entries.
See chapter 'Writing R documentation files' in the 'Writing R
Extensions' manual.

it's gone with the new version of R. That message you quoted is actually part of what I had mentioned at https://github.com/Bioconductor/Contributions/issues/9#issuecomment-226967217. Basically, Martin noted that it was an R issue as documented at https://stat.ethz.ch/pipermail/bioc-devel/2016-May/009230.html.

Let me know if you have any questions.

Best, Leonardo

LiNk-NY commented 8 years ago

Thanks for your submission Leonardo @lcolladotor. I will see to get your package built in the SPB although the Travis-CI build is looking good. https://travis-ci.org/leekgroup/recount

We will add your package to the bi-nightly build and you will be getting SVN credentials (if you don't have them already).

Best, Marcel

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/recount_buildreport_20160718144037.html

lcolladotor commented 8 years ago

Awesome Marcel @LiNk-NY, thanks!

And yes, I already have SVN credentials.

Best, Leonardo