Closed metamaden closed 4 years ago
Hi @metamaden
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: recountmethylation
Version: 0.99.0
Title: Access and Analyze DNA Methylation Array Databases
Authors@R: c(person(c("Sean", "K"), "Maden", role = c("cre", "aut"),
email = "maden@ohsu.edu", comment = c(ORCID = "0000-0002-2212-4894")),
person("Reid F", "Thompson", role = "aut", comment = c(ORCID = "0000-0003-3661-5296")),
person("Kasper D", "Hansen", role = "aut", email = "hansen@jhsph.edu", comment = c(ORCID = "0000-0002-3303-4819")),
person("Abhinav", "Nellore", role = "aut", email = "nellore@ohsu.edu", comment = c(ORCID = "0000-0001-8145-1484")))
Description: Access cross-study compilations of DNA methylation array databases. Database files can be downloaded and accessed using provided
functions. Background about database file types (HDF5 and HDF5-SummarizedExperiment),
SummarizedExperiment classes, and examples for data handling, validation, and analyses,
can be found in the package vignettes. Note the disclaimer on package load, and consult
the main manuscript for further info.
License: Artistic-2.0
Encoding: UTF-8
URL: https://github.com/metamaden/recountmethylation
BugReports: https://github.com/metamaden/recountmethylation/issues
LazyData: FALSE
Depends: R (>= 4.0.0)
Imports:
minfi,
HDF5Array,
rhdf5,
S4Vectors,
utils,
methods,
RCurl
Suggests:
knitr,
testthat,
ggplot2,
gridExtra,
rmarkdown,
BiocStyle
VignetteBuilder:
knitr
biocViews: DNAMethylation, Epigenetics, Microarray, MethylationArray
RoxygenNote: 7.1.0
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
85a1c56 rm .rproj, bump version
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
cd491b2 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi @lshep, According to the latest build report, it looks like there may be an issue with the installed MiKTeX version in the tokay1 Windows server (error message: pdfcrop.exe: MiKTeX encountered an internal error
). I'm not sure if it's possible for me to fix by modifying the package vignettes, whose re-build seems to be where this error occurs. Thanks in advance for your help!
I'll investigate the MiKTeX on our end. In the meantime here is the initial review; I apologize for the delay.
README
INST
[ ] Was this meant to have the data added to the Biocondcutor ExperimentHub? In which case this never happened?
[ ] I noticed when loading the package you include a citation reference. Maybe it would also be beneficial to include this as a CITATION file so that it will be included on the package landing page.
TESTS
VIGNETTE
[ ] I might recommended adding the package name to the vignettes. users_guide especially is very generic and depending on a users loaded packages could potentially lead to name conflicts.
uses_guide
[ ] Why not show where you download the data from the url?
[ ] when running through the vignette section 3.1, h5se.test <- getdb_h5se_test()
downloads data into a directory downloads
written to the users working
directory. This should write to a temporary directory for the vignette, and
emphasize the argument to control where the data is downloaded. downloads is a
very generic folder name that has the potential to overwrite a users own
data if they are unaware. This should be applied throughout the vignette as it
looks like there might be other similar calls in other sections that create
directory idats
[ ] When I get to kable(head(h5se.bm), align = "c")
in section 3.2, I get an
error could not find function "kable". Either change to knitr::kable or show
the library(knitr)
call at the beginning of the document
[ ] When running > geo.rg <- gds_idat2rg(gsmv)
interactively in section 5.1
I did get the following: Could it be cleaned up?
Removing downloaded files...
Warning messages:
1: In readChar(con, nchars = n) : truncating string with embedded nuls
2: In readChar(con, nchars = n) : truncating string with embedded nuls
3: In readChar(con, nchars = n) : truncating string with embedded nuls
4: In readChar(con, nchars = n) : truncating string with embedded nuls
5: In file.remove(f) :
cannot remove file './idats/GSM1038308_5958154021_R01C01_Grn.idat.gz', reason 'No such file or directory'
6: In file.remove(f) :
cannot remove file './idats/GSM1038308_5958154021_R01C01_Red.idat.gz', reason 'No such file or directory'
7: In file.remove(f) :
cannot remove file './idats/GSM1038309_5958154021_R02C01_Grn.idat.gz', reason 'No such file or directory'
8: In file.remove(f) :
cannot remove file './idats/GSM1038309_5958154021_R02C01_Red.idat.gz', reason 'No such file or directory'
[ ] When running interactively your cleanup directories is hidden so again the
user could potentially lose their own data if they had already had a downloads
or idats directory in their working directory. This is dangerous and why we
recommend using a temporary directory for vignette and examples (showing the
argument to control where the data goes by using the tempdir()
)
data_analyses
[ ] Why not show the loading of your provided data?
[ ] Can you elaborate on section 1.1 where you say you need to do limited evaluation for building package successfully? If the process fails that isn't that an issue that needs to be address? or was it purely time constraints?
As this package was submitted before July 7, Please continue to make changes and push version bumps through the webhook onto your github package and ignore the message in future postings of build reports concerning pushing to git.bioconductor.org
When ready please trigger a new build and comment back here addressing the above points.
Cheers
Received a valid push; starting a build. Commits are:
3c8c2ea Update DESCRIPTION version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push; starting a build. Commits are:
db7ed38 rm untracked files rm .Rproj, .log files 76cc39d Merge branch 'master' of https://github.com/metama... 4de451d Update DESCRIPTION version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Thanks @lshep! I have pushed the requested updates and bumped the package version. Please find my replies below.
I'll investigate the MiKTeX on our end. In the meantime here is the initial review; I apologize for the delay.
Great, thank you!
README
Please also include Bioconductor installation instructions
I have added a corresponding code block to the README.
INST
Was this meant to have the data added to the Biocondcutor ExperimentHub? In which case this never happened?
I'll answer this to the best of my knowledge -- I'm new to Bioc package types and submissions, so please correct me if I'm misunderstanding things. I added a datasets metadata file and script with this package, per ExperimentHub package requirements, as this seems like good practice for reproducible research. I haven't submitted this package for inclusion on ExperimentHub because:
recountmethylation
after existing software packages like GEOmetadb
, which provide convenience functions for an externally-hosted database file not on ExperimentHub.I noticed when loading the package you include a citation reference. Maybe it would also be beneficial to include this as a CITATION file so that it will be included on the package landing page.
I added a CITATION file, and verified citation("recountmethylation")
works.
TESTS
It looks like all your tests are disabled. Why?
I have expanded and added to unit tests, and verified these run successfully with R CMD CHECK
.
VIGNETTE
I might recommended adding the package name to the vignettes. users_guide especially is very generic and depending on a users loaded packages could potentially lead to name conflicts.
I added recountmethylation
to the vignette filenames.
uses_guide
Why not show where you download the data from the url?
I added the server URL as a hyperlink.
when running through the vignette section 3.1, h5se.test <- getdb_h5se_test() downloads data into a directory downloads written to the users working directory. This should write to a temporary directory for the vignette, and emphasize the argument to control where the data is downloaded. downloads is a very generic folder name that has the potential to overwrite a users own data if they are unaware. This should be applied throughout the vignette as it looks like there might be other similar calls in other sections that create directory idats
Thanks for pointing this out. I updated various vignette and example file downloads so they now utilize temp dir paths with tempdir()
.
When I get to kable(head(h5se.bm), align = "c") in section 3.2, I get an error could not find function "kable". Either change to knitr::kable or show the library(knitr) call at the beginning of the document
I replaced kable()
with knitr::kable()
throughout the vignettes.
When running > geo.rg <- gds_idat2rg(gsmv) interactively in section 5.1 I did get the following: Could it be cleaned up?
I added an argument silent = TRUE
to gds_idat2rg()
, which suppresses warnings from removing downloaded files. The function also suppresses warnings for read.metharray()
.
data_analyses
Why not show the loading of your provided data?
This chunk is now visible.
Can you elaborate on section 1.1 where you say you need to do limited evaluation for building package successfully? If the process fails that isn't that an issue that needs to be address? or was it purely time constraints?
The latter statement is correct; the script "data_analyses.R" should complete without errors, but it takes a long time to run (approx. 1-2 hours, without downloading the large database files). I've revised/expanded this section of the vignette so hopefully this is clearer.
As this package was submitted before July 7, Please continue to make changes and push version bumps through the webhook onto your github package and ignore the message in future postings of build reports concerning pushing to git.bioconductor.org When ready please trigger a new build and comment back here addressing the above points.
Version has been bumped.
Cheers
Thanks again!
Sean
Received a valid push; starting a build. Commits are:
9405c9b test, vers revises get_rmdl test bumps version
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Please fix the ERROR's in your package before the review process will move forward
Received a valid push; starting a build. Commits are:
d5d4956 Update DESCRIPTION
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push; starting a build. Commits are:
3257e77 remove broken test
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push; starting a build. Commits are:
5710b9b guide, h5 test file, bump update user's guide . c...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Please investigate the windows ERROR; perhaps some character or specific os path information isn't correct? I will review the package in the meantime and should have a review in the next few days.
Received a valid push; starting a build. Commits are:
bcfb3d5 documentation, description, gds_idatquery(), tests...
Thanks @lshep . It looks like there were 2 outstanding issues for windows:
Malformed paths mixing separator symbols (e.g. file.path(tempdir(), "newdir")
results in "C:\\...AppData\\Local\\Temp\\RtmpaSGzCw/newdir
). I tried fixing this by simplifying paths using just "tempdir()", or replacing symbols with gsub.
The function gds_idatquery
calls gunzip
from system
, but the function isn't on Windows by default. I've replaced the system call with R.utils::gunzip
and import the new dependency in description.
There are likely other unresolved issues, e.g. with building the vignettes. We'll see with the next build attempt.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push; starting a build. Commits are:
ab07577 https downloads, tests . Adds method =
curl`` t...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Package is looking in really good shape now. I still have minor concerns regarding the larger download of data.
you are correct we would not necessarily want to host data of that size. If you would like the data to be listed in ExperimentHub but hosted on your own public server that is also feasible. The requirement to host outside the ExperimentHub is that it must be publicly accessible which is seems like it is. The metadata.csv file would have to be changed slightly for this to occur and if you would like that option I can offer advice and assist with this.
If you would like to leave as is and have it strictly as reference mimicing the ExpeirmentHub requirements that is also fine but then we would want to make sure when the data at https://recount.bio/data/
is downloaded that it is being cached. Is that
currently the case? From the looks of the download_db.R it doesn't appear so. Caching allows users to use a previously downloaded version of a file rather than redownloading. If you would like to take care of caching on your own we would highly recommend BiocFileCache
which is the caching backend of ExperimentHub.
Cheers
Hi @lshep. That's correct; thegetdb
functions don't currently using caching, and I agree caching with BiocFileCache
should improve their usability. I'm beginning to look into this, and will push the updates when ready.
Re data hosting, the data is completely open access/public from the GEO server, and the package helps access these data in more readily usable formats. It would be nice if the external link appeared on ExperimentHub, and I'm open to revising metadata.csv accordingly.
Received a valid push; starting a build. Commits are:
3852f7e caching Adds support for BiocFileCache caching Ad... 536cf7d Update download_db.R name to namematch, all getdb... 94cbfb9 rmdl Adds gsub to fix path returned on windows 7ffacc1 updates doc 9da22d6 doc, fixes user prompt tidies user prompt, getdb ... 6c11014 doc, dldb h5 dldb functions for h5 objects return... 3d08150 get_rmdl for h5se files, fixes second dfp try con... 4aa257a bump version bump 58bb78b typo, rproj fixes typo rm .Rproj file 4bd10e9 Update recountmethylation_users_guide.Rmd move da... f962163 Merge pull request #17 from metamaden/caching Cac...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
I've added BiocFileCache
support for the getdb
functions with a user prompt. If a preexisting database file is detected, the user is asked whether to use this; entering "Y" should return the file path, while "N" should prompt a fresh download.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
I've added a file overwrite safety check. If an identical local file is detected at the download destination, the user is prompted whether to overwrite the existing file with the new download. If the caching and user prompts pass muster, I think the metadata.csv file modifications are next up.
Hi @lshep ! Just wanted to check in about final revisions for this package. If caching and download management are now ok, I'll be happy to change the metadata.csv file so that the database link can appear on ExperimentHub. Thanks in advance for your guidance!
I will have a look this week and get back to you. Sorry for the delay
If you used ExperimentHub and utilized the ExperimentHub front end you didn't have to implement your own caching since the ExperimentHub does caching with BiocFileCache in the back-end already. But you can keep your own version if you like.
If you would like the data to be included and accessible via ExperimentHub, Please change the following in the metadata.csv file:
Location_Prefix
to indicate your server https://recount.bio/data/
AnnotationHub::DispatchClassList()
. I might suggest FilePath
which will download the file to the local cache but not try to load the file/object in R. The result is the path to the file. And add the biocViews term ExperimentHub to the DESCRIPTION file. I will mention this will cause an ERROR or WARNING in BiocCheck that can be ignored. Normally we separate out the software package and the data or data access package so biocViews are from one field. In this case it is a software package that will also have a data biocView for ExperimentHub.
Its been awhile since someone hosted their own data so once the metadata.csv file is updated with the above, let me know and I can try to add it to the hubs.
Cheers,
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/recountmethylation
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Thanks @lshep ! I've gone ahead and pushed the changes you mentioned, with the new metadata.csv. I also fixed a mislabeled file name and changed the SourceType to "HDF5". Sure enough, it looks like there's a new BiocCheck warning from the addition of ExperimentHub to biocViews.
Okay should be available through experimenthub now. You might want to add some documentation on how to use the files if accessed through experimenthub.
> eh = ExperimentHub()
snapshotDate(): 2020-09-09
> query(eh, "recountmethylation")
ExperimentHub with 6 records
# snapshotDate(): 2020-09-09
# $dataprovider: GEO/GDS
# $species: Homo sapiens
# $rdataclass: HDF5-SummarizedExperiment, HDF5Database
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH3773"]]'
title
EH3773 | remethdb-h5se_gr_0-0-1_1590090412
EH3774 | remethdb-h5se_gm_0-0-1_1590090412
EH3775 | remethdb_h5se-rg_0-0-1_1590090412
EH3776 | remethdb-h5se_gr-test_0-0-1_159009041
EH3777 | remethdb-h5_rg_0-0-1_1590090412.h5
EH3778 | remethdb-h5_rg-test_0-0-1_1590090412.h5
> temp = query(eh, "recountmethylation")[[1]]
> temp
EH3773
"/home/shepherd/.cache/ExperimentHub/2d0413d71d5a_3809"
You can ignore the BiocCheck warning. thats expected and in this case understandable.
The package looks good. If you want to keep your own caching mechanism instead of using ExpeirmentHub as the front end than I'm okay with accepting as is. If you want to update to take advantage of ExperimentHub I would suggest keeping in the tracker for testing. Please let me know. As mentioned I would add a documentation about the availability and how to use the files if accessed through the hub. Cheers
Hi @lshep . This sounds great! I think the current caching mechanism is ok for now -- would it be possible to update to use ExperimentHub caching down the line if it becomes clear that's needed? I'm happy to add documentation about how to use the files if accessed through the hub, but I wasn't sure where to find directions about this or what that should contain. Are there directions online somewhere that I've overlooked?
Yes that is fine. It seems like files might be relates so some instruction about using the ExperimentHub front end
hub = ExperimentHub()
query(hub, "recountmethylation")
file = hub[["EH3773"]]
#if certain files interact together or are used to create a larger object you could show which to download and how to construct.
Please add the documentation. We recommend sooner than later. I don't consider this a holding point for accepting the package. Cheers.
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.