Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

recountmethylation #1527

Closed metamaden closed 4 years ago

metamaden commented 4 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Hi @metamaden

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: recountmethylation
Version: 0.99.0
Title: Access and Analyze DNA Methylation Array Databases
Authors@R: c(person(c("Sean", "K"), "Maden", role = c("cre", "aut"),
        email = "maden@ohsu.edu", comment = c(ORCID = "0000-0002-2212-4894")),
    person("Reid F", "Thompson", role = "aut", comment = c(ORCID = "0000-0003-3661-5296")),
    person("Kasper D", "Hansen", role = "aut", email = "hansen@jhsph.edu", comment = c(ORCID = "0000-0002-3303-4819")),
    person("Abhinav", "Nellore", role = "aut", email = "nellore@ohsu.edu", comment = c(ORCID = "0000-0001-8145-1484")))
Description: Access cross-study compilations of DNA methylation array databases. Database files can be downloaded and accessed using provided
    functions. Background about database file types (HDF5 and HDF5-SummarizedExperiment), 
    SummarizedExperiment classes, and examples for data handling, validation, and analyses, 
    can be found in the package vignettes. Note the disclaimer on package load, and consult 
    the main manuscript for further info.
License: Artistic-2.0
Encoding: UTF-8
URL: https://github.com/metamaden/recountmethylation
BugReports: https://github.com/metamaden/recountmethylation/issues
LazyData: FALSE
Depends: R (>= 4.0.0)
Imports:
    minfi,
    HDF5Array,
    rhdf5,
    S4Vectors,
    utils,
    methods,
    RCurl
Suggests:
    knitr,
    testthat,
    ggplot2,
    gridExtra,
    rmarkdown,
    BiocStyle
VignetteBuilder: 
    knitr
biocViews: DNAMethylation, Epigenetics, Microarray, MethylationArray
RoxygenNote: 7.1.0
bioc-issue-bot commented 4 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

85a1c56 rm .rproj, bump version

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

cd491b2 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

metamaden commented 4 years ago

Hi @lshep, According to the latest build report, it looks like there may be an issue with the installed MiKTeX version in the tokay1 Windows server (error message: pdfcrop.exe: MiKTeX encountered an internal error). I'm not sure if it's possible for me to fix by modifying the package vignettes, whose re-build seems to be where this error occurs. Thanks in advance for your help!

lshep commented 4 years ago

I'll investigate the MiKTeX on our end. In the meantime here is the initial review; I apologize for the delay.

README

INST

TESTS

VIGNETTE

Removing downloaded files...
Warning messages:
1: In readChar(con, nchars = n) : truncating string with embedded nuls
2: In readChar(con, nchars = n) : truncating string with embedded nuls
3: In readChar(con, nchars = n) : truncating string with embedded nuls
4: In readChar(con, nchars = n) : truncating string with embedded nuls
5: In file.remove(f) :
  cannot remove file './idats/GSM1038308_5958154021_R01C01_Grn.idat.gz', reason 'No such file or directory'
6: In file.remove(f) :
  cannot remove file './idats/GSM1038308_5958154021_R01C01_Red.idat.gz', reason 'No such file or directory'
7: In file.remove(f) :
  cannot remove file './idats/GSM1038309_5958154021_R02C01_Grn.idat.gz', reason 'No such file or directory'
8: In file.remove(f) :
  cannot remove file './idats/GSM1038309_5958154021_R02C01_Red.idat.gz', reason 'No such file or directory'

As this package was submitted before July 7, Please continue to make changes and push version bumps through the webhook onto your github package and ignore the message in future postings of build reports concerning pushing to git.bioconductor.org
When ready please trigger a new build and comment back here addressing the above points. Cheers

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

3c8c2ea Update DESCRIPTION version bump

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

db7ed38 rm untracked files rm .Rproj, .log files 76cc39d Merge branch 'master' of https://github.com/metama... 4de451d Update DESCRIPTION version bump

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

metamaden commented 4 years ago

Thanks @lshep! I have pushed the requested updates and bumped the package version. Please find my replies below.

I'll investigate the MiKTeX on our end. In the meantime here is the initial review; I apologize for the delay.

Great, thank you!

README

Please also include Bioconductor installation instructions

I have added a corresponding code block to the README.

INST

Was this meant to have the data added to the Biocondcutor ExperimentHub? In which case this never happened?

I'll answer this to the best of my knowledge -- I'm new to Bioc package types and submissions, so please correct me if I'm misunderstanding things. I added a datasets metadata file and script with this package, per ExperimentHub package requirements, as this seems like good practice for reproducible research. I haven't submitted this package for inclusion on ExperimentHub because:

  1. I believe the database files we're hosting are too large to be hosted on ExperimentHub (e.g. smallest file is about 90 Gb, largest is about 125 Gb), and
  2. We wanted to model recountmethylation after existing software packages like GEOmetadb, which provide convenience functions for an externally-hosted database file not on ExperimentHub.

I noticed when loading the package you include a citation reference. Maybe it would also be beneficial to include this as a CITATION file so that it will be included on the package landing page.

I added a CITATION file, and verified citation("recountmethylation") works.

TESTS

It looks like all your tests are disabled. Why?

I have expanded and added to unit tests, and verified these run successfully with R CMD CHECK.

VIGNETTE

I might recommended adding the package name to the vignettes. users_guide especially is very generic and depending on a users loaded packages could potentially lead to name conflicts.

I added recountmethylation to the vignette filenames.

uses_guide

Why not show where you download the data from the url?

I added the server URL as a hyperlink.

when running through the vignette section 3.1, h5se.test <- getdb_h5se_test() downloads data into a directory downloads written to the users working directory. This should write to a temporary directory for the vignette, and emphasize the argument to control where the data is downloaded. downloads is a very generic folder name that has the potential to overwrite a users own data if they are unaware. This should be applied throughout the vignette as it looks like there might be other similar calls in other sections that create directory idats

Thanks for pointing this out. I updated various vignette and example file downloads so they now utilize temp dir paths with tempdir().

When I get to kable(head(h5se.bm), align = "c") in section 3.2, I get an error could not find function "kable". Either change to knitr::kable or show the library(knitr) call at the beginning of the document

I replaced kable() with knitr::kable() throughout the vignettes.

When running > geo.rg <- gds_idat2rg(gsmv) interactively in section 5.1 I did get the following: Could it be cleaned up?

I added an argument silent = TRUE to gds_idat2rg(), which suppresses warnings from removing downloaded files. The function also suppresses warnings for read.metharray().

data_analyses

Why not show the loading of your provided data?

This chunk is now visible.

Can you elaborate on section 1.1 where you say you need to do limited evaluation for building package successfully? If the process fails that isn't that an issue that needs to be address? or was it purely time constraints?

The latter statement is correct; the script "data_analyses.R" should complete without errors, but it takes a long time to run (approx. 1-2 hours, without downloading the large database files). I've revised/expanded this section of the vignette so hopefully this is clearer.

As this package was submitted before July 7, Please continue to make changes and push version bumps through the webhook onto your github package and ignore the message in future postings of build reports concerning pushing to git.bioconductor.org When ready please trigger a new build and comment back here addressing the above points.

Version has been bumped.

Cheers

Thanks again!

Sean

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

9405c9b test, vers revises get_rmdl test bumps version

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lshep commented 4 years ago

Please fix the ERROR's in your package before the review process will move forward

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

d5d4956 Update DESCRIPTION

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

3257e77 remove broken test

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

5710b9b guide, h5 test file, bump update user's guide . c...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lshep commented 4 years ago

Please investigate the windows ERROR; perhaps some character or specific os path information isn't correct? I will review the package in the meantime and should have a review in the next few days.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

bcfb3d5 documentation, description, gds_idatquery(), tests...

metamaden commented 4 years ago

Thanks @lshep . It looks like there were 2 outstanding issues for windows:

  1. Malformed paths mixing separator symbols (e.g. file.path(tempdir(), "newdir") results in "C:\\...AppData\\Local\\Temp\\RtmpaSGzCw/newdir). I tried fixing this by simplifying paths using just "tempdir()", or replacing symbols with gsub.

  2. The function gds_idatquery calls gunzip from system, but the function isn't on Windows by default. I've replaced the system call with R.utils::gunzip and import the new dependency in description.

There are likely other unresolved issues, e.g. with building the vignettes. We'll see with the next build attempt.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

ab07577 https downloads, tests . Adds method =curl`` t...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lshep commented 4 years ago

Package is looking in really good shape now. I still have minor concerns regarding the larger download of data.

you are correct we would not necessarily want to host data of that size. If you would like the data to be listed in ExperimentHub but hosted on your own public server that is also feasible. The requirement to host outside the ExperimentHub is that it must be publicly accessible which is seems like it is. The metadata.csv file would have to be changed slightly for this to occur and if you would like that option I can offer advice and assist with this. If you would like to leave as is and have it strictly as reference mimicing the ExpeirmentHub requirements that is also fine but then we would want to make sure when the data at https://recount.bio/data/ is downloaded that it is being cached. Is that currently the case? From the looks of the download_db.R it doesn't appear so. Caching allows users to use a previously downloaded version of a file rather than redownloading. If you would like to take care of caching on your own we would highly recommend BiocFileCache which is the caching backend of ExperimentHub.

Cheers

metamaden commented 4 years ago

Hi @lshep. That's correct; thegetdb functions don't currently using caching, and I agree caching with BiocFileCache should improve their usability. I'm beginning to look into this, and will push the updates when ready.

Re data hosting, the data is completely open access/public from the GEO server, and the package helps access these data in more readily usable formats. It would be nice if the external link appeared on ExperimentHub, and I'm open to revising metadata.csv accordingly.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

3852f7e caching Adds support for BiocFileCache caching Ad... 536cf7d Update download_db.R name to namematch, all getdb... 94cbfb9 rmdl Adds gsub to fix path returned on windows 7ffacc1 updates doc 9da22d6 doc, fixes user prompt tidies user prompt, getdb ... 6c11014 doc, dldb h5 dldb functions for h5 objects return... 3d08150 get_rmdl for h5se files, fixes second dfp try con... 4aa257a bump version bump 58bb78b typo, rproj fixes typo rm .Rproj file 4bd10e9 Update recountmethylation_users_guide.Rmd move da... f962163 Merge pull request #17 from metamaden/caching Cac...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

metamaden commented 4 years ago

I've added BiocFileCache support for the getdb functions with a user prompt. If a preexisting database file is detected, the user is asked whether to use this; entering "Y" should return the file path, while "N" should prompt a fresh download.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

808f594 file overwrite check, bump adds overwrite check t... 12fe67d Merge pull request #18 from metamaden/caching fil...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

metamaden commented 4 years ago

I've added a file overwrite safety check. If an identical local file is detected at the download destination, the user is prompted whether to overwrite the existing file with the new download. If the caching and user prompts pass muster, I think the metadata.csv file modifications are next up.

metamaden commented 4 years ago

Hi @lshep ! Just wanted to check in about final revisions for this package. If caching and download management are now ok, I'll be happy to change the metadata.csv file so that the database link can appear on ExperimentHub. Thanks in advance for your guidance!

lshep commented 4 years ago

I will have a look this week and get back to you. Sorry for the delay

lshep commented 4 years ago

If you used ExperimentHub and utilized the ExperimentHub front end you didn't have to implement your own caching since the ExperimentHub does caching with BiocFileCache in the back-end already. But you can keep your own version if you like.

If you would like the data to be included and accessible via ExperimentHub, Please change the following in the metadata.csv file:

  1. BiocVersion should be 3.12
  2. Add a field Location_Prefix to indicate your server https://recount.bio/data/
  3. Update the RDataPath to be the resource name only (The Location_Prefix and RDataPath fields are concatenated internatlly
  4. Update the DispathchClass field. The DispatchClass must be from AnnotationHub::DispatchClassList(). I might suggest FilePath which will download the file to the local cache but not try to load the file/object in R. The result is the path to the file.
  5. Change capitalization of SourceURL to SourceUrl

And add the biocViews term ExperimentHub to the DESCRIPTION file. I will mention this will cause an ERROR or WARNING in BiocCheck that can be ignored. Normally we separate out the software package and the data or data access package so biocViews are from one field. In this case it is a software package that will also have a data biocView for ExperimentHub.

Its been awhile since someone hosted their own data so once the metadata.csv file is updated with the above, let me know and I can try to add it to the hubs.

Cheers,

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

181cff4 EH metadata updates metadata for ExperimentHub . ... f0fb53f Merge pull request #19 from metamaden/caching EH ...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/recountmethylation to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

metamaden commented 4 years ago

Thanks @lshep ! I've gone ahead and pushed the changes you mentioned, with the new metadata.csv. I also fixed a mislabeled file name and changed the SourceType to "HDF5". Sure enough, it looks like there's a new BiocCheck warning from the addition of ExperimentHub to biocViews.

lshep commented 4 years ago

Okay should be available through experimenthub now. You might want to add some documentation on how to use the files if accessed through experimenthub.

> eh = ExperimentHub()
 snapshotDate(): 2020-09-09
> query(eh, "recountmethylation")
ExperimentHub with 6 records
# snapshotDate(): 2020-09-09
# $dataprovider: GEO/GDS
# $species: Homo sapiens
# $rdataclass: HDF5-SummarizedExperiment, HDF5Database
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["EH3773"]]' 

           title                                  
  EH3773 | remethdb-h5se_gr_0-0-1_1590090412      
  EH3774 | remethdb-h5se_gm_0-0-1_1590090412      
  EH3775 | remethdb_h5se-rg_0-0-1_1590090412      
  EH3776 | remethdb-h5se_gr-test_0-0-1_159009041  
  EH3777 | remethdb-h5_rg_0-0-1_1590090412.h5     
  EH3778 | remethdb-h5_rg-test_0-0-1_1590090412.h5
> temp = query(eh, "recountmethylation")[[1]]
> temp
                                                 EH3773 
"/home/shepherd/.cache/ExperimentHub/2d0413d71d5a_3809" 

You can ignore the BiocCheck warning. thats expected and in this case understandable.

lshep commented 4 years ago

The package looks good. If you want to keep your own caching mechanism instead of using ExpeirmentHub as the front end than I'm okay with accepting as is. If you want to update to take advantage of ExperimentHub I would suggest keeping in the tracker for testing. Please let me know. As mentioned I would add a documentation about the availability and how to use the files if accessed through the hub. Cheers

metamaden commented 4 years ago

Hi @lshep . This sounds great! I think the current caching mechanism is ok for now -- would it be possible to update to use ExperimentHub caching down the line if it becomes clear that's needed? I'm happy to add documentation about how to use the files if accessed through the hub, but I wasn't sure where to find directions about this or what that should contain. Are there directions online somewhere that I've overlooked?

lshep commented 4 years ago

Yes that is fine. It seems like files might be relates so some instruction about using the ExperimentHub front end

hub = ExperimentHub()
query(hub, "recountmethylation")

file = hub[["EH3773"]]
#if certain files interact together or are used to create a larger object you could show which to download and how to construct. 
lshep commented 4 years ago

Please add the documentation. We recommend sooner than later. I don't consider this a holding point for accepting the package. Cheers.

bioc-issue-bot commented 4 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!