Closed jonathangriffiths closed 5 years ago
Hi @jonathangriffiths
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: MouseGastrulationData
Title: Single-Cell Transcriptomics Data across Mouse Gastrulation and Early Organogenesis
Version: 0.99.0
Date: 2019-06-17
Authors@R: c(
person("Jonathan", "Griffiths", email = "jonathan.griffiths.94@gmail.com", role = c("aut", "cre")),
person("Aaron", "Lun", email = "infinite.monkeys.with.keyboards@gmail.com", role = "aut"))
Description:
Provides processed and raw count matrices for single-cell RNA sequencing data
from a timecourse of mouse gastrulation and early organogenesis.
Depends:
SingleCellExperiment
Imports:
methods,
ExperimentHub,
BiocGenerics,
S4Vectors
Suggests:
BiocStyle,
knitr,
rmarkdown
VignetteBuilder:
knitr
License: GPL-3
NeedsCompilation: no
Encoding: UTF-8
biocViews: ExperimentData, ExpressionData, SequencingData, RNASeqData, SingleCellData
RoxygenNote: 6.1.1
Add SSH keys to your GitHub account. SSH keys are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.
A note about the unchecked box: most functions in the package use dontrun{}, and the package therefore fails BiocCheck.
However, this is because the associated files are not yet available on ExperimentHub, so the examples cannot be run. The files that the package uses are uploaded, however, using information provided by @lshep.
Once the files are available we will update the package to remove the dontrun{} wrappers.
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
I see the data on S3. Unfortunately, I cannot add the data to the ExperimentHub database until the repository has the inst/extdata/metadata.csv files created. Could you please run the scripts in inst/scripts to create the three different metadata files? Once they are checked into the repository, please tag me in a comment here.
I have just pushed that now @lshep, thanks!
Please make sure to add ExperimentHub to the biocViews list in your DESCRIPTION. The data has been added to the hub and you should now be able to test and debug with your data.
library(ExperimentHub)
hub = ExperimentHub()
query(hub, "MouseGastrulationData")
ExperimentHub with 65 records
# snapshotDate(): 2019-06-18
# $dataprovider: Jonathan Griffiths
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH2583"]]'
title
EH2583 | Atlas processed counts
EH2584 | Atlas rowData
EH2585 | Atlas colData
EH2586 | Atlas size factors
EH2587 | Atlas reduced dimensions
... ...
EH2643 | WT chimera raw counts (sample 6)
EH2644 | WT chimera raw counts (sample 7)
EH2645 | WT chimera raw counts (sample 8)
EH2646 | WT chimera raw counts (sample 9)
EH2647 | WT chimera raw counts (sample 10)
Let me know if there are any other issues with accessing the data from the hub.
Received a valid push; starting a build. Commits are:
0806d0a Version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
0395532 Added subsampling for processed data; fixed vignet...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
6e98607 Push to trigger build
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
1b03561 Version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Okay, I believe this package is now in a ready-to-submit state.
The vignette build fails on the Windows machine due to the restricted amount of memory. Given that this is a package designed to facilitate the download and in-memory use of a large dataset, this does not seem to me to be a severe issue.
I have considered separating the largest objects into smaller files (split according to some sensible scheme like the 10x sample) - these could then be separately downloaded and loaded into R to reduce the memory burden. However, for the moment, I imagine it is better to get this package up and available for use rather than delay to edit the raw data files.
There is also a warning in R CMD CHECK asking for an R >= 2.1.0 dependency. Of course, this is implicitly satisfied if the user is able to use a version of Bioconductor that could contain this package. Do you know what the best thing to do is regarding this?
Thanks!
Hi @jonathangriffiths ,
For your last question about the R dependency, you can simply add R (>= 3.6.0) to the Depends field in the DESCRIPTION file.
I'll take a look at your other building issues and comment back later.
Best, Qian
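Since the DESCRIPTION shown at the top of this thread already has a Depends field listing SingleCellExperiment, the R version requirement would presumably be merged into that existing field rather than added as a second one. A sketch of the resulting fragment (the indentation is only a common convention, not a requirement stated in this thread):

```
Depends:
    R (>= 3.6.0),
    SingleCellExperiment
```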
Hi @jonathangriffiths,
Please see below for the review. Fixing the warnings and errors from the build report is also suggested. Let me know if you have any questions.
Cheers, Qian
- Date field is not actually needed.
- URL field, to direct users to source code repositories, e.g., the GitHub repository.
- BugReports: field: It is encouraged to include the relevant links to GitHub for reporting issues.
- @param documentation.
- EmbryoAtlasData.R, use
  atlas.data <- EmbryoAtlasData(subsample.frac = 0.1)
  atlas.data <- EmbryoAtlasData(type="processed", subsample.frac = 0.1)
  Examples and vignette are only used to show functionalities, no need to use/load all data.
- @examples data(embryo_celltype_colours) field in the documentation.
- "." for non-exported utility functions, e.g., use .getProcOrRaw for the function name.
- Installation, to show users how to install the package as if it has been included in Bioconductor using BiocManager. Include something like this:
  1. Download the package from Bioconductor.
  {r getPackage, eval=FALSE}
  if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
  BiocManager::install("pkgName")
  Or install the development version of the package from GitHub.
  {r, eval = FALSE}
  BiocManager::install("githubUserName/pkgname")
  2. Load the package into R session.
  {r Load, message=FALSE}
  library(pkgName)
- sessionInfo() in the end.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
87a7a71 Version bump for Bioc build
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
I have now addressed all of the review points, except for the memory limit one.
I think this is not addressable at the moment due to the way the data are currently stored on ExperimentHub - the raw count matrix, when loaded into R, is 7.4 GB in memory. Therefore, the memory threshold would be hit even before the downsampling can happen. There is also the point that Aaron has raised above, but I'm uncertain if that is playing a role in the problem here.
The "real" solution is probably to reorganise the files on ExperimentHub into smaller chunks (e.g., by developmental stage) so that we can reduce the amount of data that needs to be loaded at all. Indeed, I do plan to do this at some point, but I think it's probably better to get the package up and running for the much larger community that use high-memory 64-bit machines before doing this restructuring.
I made a tweak this morning to move a small data file from a data/.rda file to an R/*.R script, to allow easier reading from the code.
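The restructuring idea mentioned above (storing the data in smaller chunks so that only the requested pieces are ever loaded) can be illustrated with a minimal sketch. The helper names `load_samples`, `load_chunk` and `fake_chunk` are hypothetical and not part of the package; a real `load_chunk` would presumably wrap an ExperimentHub download for one chunk.

```r
# Hypothetical helper (not part of the package): load only the requested
# per-sample chunks and combine them column-wise into one matrix.
# `load_chunk` is an assumed function returning a genes-by-cells matrix
# for one sample, e.g. a wrapper around an ExperimentHub download.
load_samples <- function(samples, load_chunk) {
    chunks <- lapply(samples, load_chunk)
    do.call(cbind, chunks)
}

# Toy stand-in for a real download: 3 genes x 2 cells per sample.
fake_chunk <- function(i) {
    matrix(i, nrow = 3, ncol = 2,
           dimnames = list(paste0("gene", 1:3),
                           paste0("sample", i, "_cell", 1:2)))
}

# Only samples 1 and 5 are ever held in memory.
counts <- load_samples(c(1, 5), fake_chunk)
dim(counts)
```

With this layout the peak memory use scales with the number of chunks requested rather than with the size of the full dataset, which is the property the vignette build on low-memory machines would need.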
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi @jonathangriffiths ,
After inspecting the error a bit further, we see that the data in ExperimentHub are pretty big, and users on 32-bit Windows cannot download the data successfully and then construct the SingleCellExperiment object. Also, for documentation and vignette examples, we don't expect authors to use a very large data set for demonstration.
So I talked with Martin @mtmorgan, and he suggests adding another function like EmbryoAtlasDataExample(), which constructs the SCE in the same way but with data that includes only, e.g., 2 samples, for demonstration only. That way, the functionality of the real function EmbryoAtlasData() can be shown successfully on any operating system, and the @examples do not take much time given the build time limit.
Things to do include:
- EmbryoAtlasDataExample(), and do the EH downloading (small dataset), and SCE constructing, etc.
- @examples field under EmbryoAtlasData().
- EmbryoAtlasData (or in the @examples field), add some comments about the data size, and remind users to be cautious in calling this big function if only for testing purposes.
Let me know if you have any questions.
Best, Qian
Hi @Liubuntu,
If we're going to add some new data to ExperimentHub, I think it's probably best that I push ahead with a larger restructuring that allows the access of smaller subsets of data in a way that is biologically meaningful. This should be helpful not only for the vignette but also for researchers who want to access only a slice of the data. I'll have a go at this over the weekend.
Thanks, Jonny
Hi @lshep,
I've generated the new files for this package now - should I upload them using the same details as for the first file submission?
I think you can safely delete all of the files that had already been uploaded, too.
Thanks for the help, Jonny.
Sorry for the delayed response; I was away at a conference with limited internet connectivity. Yes, please upload as before. I will need the updated metadata file to add to the database.
Just to double-check: can all the original data uploaded be deleted from S3 and removed from the ExperimentHub database?
Cheers,
Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263
No problem!
Yes, everything else can be deleted. I will push the metadata csvs and upload the data files to AWS tomorrow.
Jonny.
Hi @lshep,
The files should all be uploaded now.
The metadata files are also updated at https://github.com/MarioniLab/MouseGastrulationData/tree/master/inst/extdata
Thanks for helping me sort this out,
Jonny
The new data has been added to the hub. Please update your package as necessary
> eh = ExperimentHub()
|======================================================================| 100%
snapshotDate(): 2019-08-01
> query(eh, "MouseGas")
ExperimentHub with 324 records
# snapshotDate(): 2019-08-01
# $dataprovider: Jonathan Griffiths
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH2701"]]'
title
EH2701 | Atlas processed counts (sample 1)
EH2702 | Atlas processed counts (sample 2)
EH2703 | Atlas processed counts (sample 3)
EH2704 | Atlas processed counts (sample 4)
EH2705 | Atlas processed counts (sample 5)
... ...
EH3020 | WT chimera raw counts (sample 6)
EH3021 | WT chimera raw counts (sample 7)
EH3022 | WT chimera raw counts (sample 8)
EH3023 | WT chimera raw counts (sample 9)
EH3024 | WT chimera raw counts (sample 10)
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
I think this package is now ready to go!
I have noticed this morning that I have accidentally duplicated the rowData on ExperimentHub - each sample has its own file, but the contents of the file are actually the same for every sample.
We could clean this up, but given that the files are so small (two columns of ~25k gene names) it might not even be worth the time to do this. I'll leave the decision up to you, as it wouldn't be much of a pain on my end.
Thanks, Jonny.
Hi @jonathangriffiths ,
It's nice that the whole data set on ExperimentHub has been restructured, so that each smaller data set is biologically meaningful and works on all platforms.
I would suggest keeping only one copy of the rowData on ExperimentHub and using that same rowData in the R scripts. I'll see what Lori @lshep says about it.
Other than that, the package looks good!
Best, Qian
@lshep - I will generate replacement row data files tomorrow morning. If we go ahead with deleting the duplicated data, please could you remove any files that contain the string "rowdata"? That should apply only to the files we want removed, and there should be 36 in the atlas folder, 10 in the wild-type chimera folder, and 4 in the Tal1 chimera folder.
If it's easier, we could delete all the files and replace them wholesale - the upload time isn't very long on my end.
Thanks, Jonny.
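For illustration, selecting files by the string "rowdata" as described above could look like the following sketch. The file paths are made up for the example; the real names and folder layout on S3 are not shown in this thread.

```r
# Hypothetical file listing; the real paths on S3 may differ.
files <- c(
    "atlas/counts-processed-sample1.rds",
    "atlas/rowdata-sample1.rds",
    "wt-chimera/rowdata-sample1.rds",
    "tal1-chimera/coldata-sample1.rds"
)

# fixed = TRUE matches the literal string rather than a regular expression.
# The match is case-sensitive, so "rowData" would not be selected.
to_remove <- files[grepl("rowdata", files, fixed = TRUE)]
to_keep <- setdiff(files, to_remove)

# Count the matches per folder to compare against the expected totals.
table(dirname(to_remove))
```

Comparing the per-folder counts with the expected numbers (36, 10 and 4 in the message above) is a cheap check that the pattern catches exactly the intended files before anything is deleted.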
I will remove these files from the database and S3. Please let me know when you have the replacement files uploaded to add back in.
Hi @lshep - these should be uploaded now (just the row data, in the right place in their appropriate file structure)
Jonny
So I should be readding three files - I assume these are corrected in the metadata files as well?
Lori Shepherd
The rowData has been added back into the hub.
> library(ExperimentHub)
> eh = ExperimentHub()
> query(eh, c("MouseGast", "rowData"))
ExperimentHub with 3 records
# snapshotDate(): 2019-08-06
# $dataprovider: Jonathan Griffiths
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH3065"]]'
title
EH3065 | Atlas rowData
EH3066 | Tal1 chimera rowData
EH3067 | WT chimera rowData
Thanks for getting these up! However, I'm getting an error when I try to download the new rowData:
> library(ExperimentHub)
> eh = ExperimentHub()
snapshotDate(): 2019-08-06
> head(eh[["EH3065"]])
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
downloading 1 resources
retrieving 1 resource
Downloading: 240 B
Error: failed to load resource
name: EH3065
title: Atlas rowData
reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
web resource path: ‘https://experimenthub.bioconductor.org/fetch/3081’
local file path: ‘/Users/griffi01/Library/Caches/ExperimentHub/59211b2f6988_3081’
reason: Forbidden (HTTP 403).
2: bfcadd() failed; resource removed
rid: BFC5
fpath: ‘https://experimenthub.bioconductor.org/fetch/3081’
reason: download failed
3: download failed
hub path: ‘https://experimenthub.bioconductor.org/fetch/3081’
cache resource: ‘EH3065 : 3081’
reason: bfcadd() failed; see warnings()
This seems to apply to all three new files. The old ones work fine, though. Do you know what seems to have upset things, @lshep?
Thanks, Jonny
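When several new records may be affected, as here, a small loop can report which IDs fail instead of stopping at the first error. This is only a sketch: `check_records` and `fake_fetch` are hypothetical names, and `fetch` is an assumed one-argument function; with a real hub object `eh` it could be `function(id) eh[[id]]`.

```r
# Try to fetch each ID and record either "ok" or the error message.
# `fetch` is an assumed function of one ID; with a real hub object `eh`
# it could be function(id) eh[[id]].
check_records <- function(ids, fetch) {
    vapply(ids, function(id) {
        tryCatch({
            fetch(id)
            "ok"
        }, error = function(e) conditionMessage(e))
    }, character(1))
}

# Toy stand-in that fails for one ID, mimicking the failure reported above.
fake_fetch <- function(id) {
    if (id == "EH3065") stop("failed to load resource")
    invisible(NULL)
}

status <- check_records(c("EH2701", "EH3065"), fake_fetch)
status
```

The result is a named character vector keyed by record ID, so the failing records can be listed with `names(status)[status != "ok"]`.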
Apologies, I forgot to make the resources public. They should be accessible now.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Thanks @lshep!
I think everything is truly ready to go now, @Liubuntu! The only change I have made since your previous review was to add one further exported object, which summarizes some metadata for each 10x sample in the atlas (AtlasSampleMetadata).
Thanks, Jonny
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.