Bioconductor / Contributions

Contribute Packages to Bioconductor

MouseGastrulationData #1150

Closed jonathangriffiths closed 5 years ago

jonathangriffiths commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @jonathangriffiths

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: MouseGastrulationData
Title: Single-Cell Transcriptomics Data across Mouse Gastrulation and Early Organogenesis
Version: 0.99.0
Date: 2019-06-17
Authors@R: c(
    person("Jonathan", "Griffiths", email = "jonathan.griffiths.94@gmail.com", role = c("aut", "cre")),
    person("Aaron", "Lun", email = "infinite.monkeys.with.keyboards@gmail.com", role = "aut"))
Description: 
    Provides processed and raw count matrices for single-cell RNA sequencing data 
    from a timecourse of mouse gastrulation and early organogenesis.
Depends: 
    SingleCellExperiment
Imports:
    methods,
    ExperimentHub,
    BiocGenerics,
    S4Vectors
Suggests: 
    BiocStyle, 
    knitr, 
    rmarkdown
VignetteBuilder: 
    knitr
License: GPL-3
NeedsCompilation: no
Encoding: UTF-8
biocViews: ExperimentData, ExpressionData, SequencingData, RNASeqData, SingleCellData
RoxygenNote: 6.1.1

Add SSH keys to your GitHub account. SSH keys are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.

jonathangriffiths commented 5 years ago

A note about the unchecked box: most functions in the package wrap their examples in \dontrun{}, so the package currently fails BiocCheck.

However, this is because the associated files are not yet available on ExperimentHub, so the examples cannot be run. The files that the package uses have already been uploaded, using information provided by @lshep.

Once the files are available, we will update the package to remove the \dontrun{} wrappers.

bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 5 years ago

I see the data on S3. I unfortunately cannot add the data into the database for the experimenthub until the repository has the inst/extdata/metadata.csv files created. Could you please run the scripts in inst/scripts to create the three different metadata files. Once they are checked into the repository please tag me in a comment here.
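For context, the metadata file requested here is a CSV with one row per hub resource. The sketch below builds such a file with base R. The column names follow the ExperimentHub metadata convention as we understand it (validate the result with ExperimentHubData::makeExperimentHubMetadata() before relying on it); the helper name make_metadata(), the RDataPath layout, the URL and the SourceType, RDataClass and version values are hypothetical placeholders, not the package's actual inst/scripts code.

```r
# Hypothetical sketch of an inst/scripts/ helper that builds the rows of
# inst/extdata/metadata.csv for ExperimentHub. Column names follow the
# ExperimentHub metadata convention; the values are placeholders.
make_metadata <- function(samples) {
  data.frame(
    Title = sprintf("Atlas processed counts (sample %d)", samples),
    Description = sprintf("Processed counts for atlas sample %d", samples),
    BiocVersion = "3.10",                          # placeholder
    Genome = NA_character_,
    SourceType = "tab",                            # placeholder; check valid types
    SourceUrl = "https://example.org/placeholder", # hypothetical URL
    SourceVersion = "1.0.0",                       # placeholder
    Species = "Mus musculus",
    TaxonomyId = 10090L,
    Coordinate_1_based = NA,
    DataProvider = "Jonathan Griffiths",
    Maintainer = "Jonathan Griffiths <jonathan.griffiths.94@gmail.com>",
    RDataClass = "dgCMatrix",                      # placeholder
    DispatchClass = "Rds",
    RDataPath = sprintf("MouseGastrulationData/atlas/counts-processed-sample%d.rds",
                        samples),                  # hypothetical layout
    stringsAsFactors = FALSE
  )
}

meta <- make_metadata(1:2)
# In a real package this would be written to inst/extdata/metadata.csv.
out <- file.path(tempdir(), "metadata.csv")
write.csv(meta, out, row.names = FALSE)
```

Generating the rows from a function of the sample numbers keeps titles and paths consistent across many near-identical resources.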

jonathangriffiths commented 5 years ago

I have just pushed that now @lshep, thanks!

lshep commented 5 years ago

Please make sure to add ExperimentHub to the biocViews list in your DESCRIPTION. The data has been added to the hub and you should now be able to test and debug with your data.

library(ExperimentHub)
hub = ExperimentHub()
query(hub, "MouseGastrulationData")
ExperimentHub with 65 records
# snapshotDate(): 2019-06-18 
# $dataprovider: Jonathan Griffiths
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["EH2583"]]' 

           title                            
  EH2583 | Atlas processed counts           
  EH2584 | Atlas rowData                    
  EH2585 | Atlas colData                    
  EH2586 | Atlas size factors               
  EH2587 | Atlas reduced dimensions         
  ...      ...                              
  EH2643 | WT chimera raw counts (sample 6) 
  EH2644 | WT chimera raw counts (sample 7) 
  EH2645 | WT chimera raw counts (sample 8) 
  EH2646 | WT chimera raw counts (sample 9) 
  EH2647 | WT chimera raw counts (sample 10)

Let me know if there are any other issues with accessing the data from the hub.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

0806d0a Version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

0395532 Added subsampling for processed data; fixed vignet...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

6e98607 Push to trigger build

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

d4f8b5c Removed seed setting from functions
316437a Added R >=2.10 dependency as per R CMD CHECK

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

1b03561 Version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

jonathangriffiths commented 5 years ago

Okay, I believe this package is now in a ready-to-submit state.

The vignette build fails on the Windows machine due to the restricted amount of memory. Given that this is a package designed to facilitate the download and in-memory use of a large dataset, this does not seem to me to be a severe issue.

I have considered separating the largest objects into smaller files (split according to some sensible scheme like the 10x sample) - these could then be separately downloaded and loaded into R to reduce the memory burden. However, for the moment, I imagine it is better to get this package up and available for use rather than delay to edit the raw data files.

There is also a warning in R CMD check asking for an R (>= 2.10) dependency. Of course, this is implicitly satisfied if the user is able to use a version of Bioconductor that could contain this package. Do you know what the best thing to do is regarding this?

Thanks!

Liubuntu commented 5 years ago

Hi @jonathangriffiths ,

Regarding your last question about the R dependency, you can simply add R (>= 3.6.0) to the Depends field of the DESCRIPTION file.
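Both DESCRIPTION changes mentioned in this thread (the R version requirement here, and the ExperimentHub biocViews term requested earlier) are easy to make by hand. As a hedged sketch, the same edits can also be scripted with base R's read.dcf() and write.dcf(); the helper names entry_name() and add_entry() are hypothetical, and the toy DESCRIPTION only stands in for the real file.

```r
# Minimal sketch: edit comma-separated DESCRIPTION fields with base R.
# Name of a DESCRIPTION entry, e.g. "R (>= 3.6.0)" -> "R".
entry_name <- function(x) sub("\\s*\\(.*$", "", x)

# Add `entry` to a comma-separated field unless an entry with the same
# name is already present; `first = TRUE` puts it at the front.
add_entry <- function(desc, field, entry, first = FALSE) {
  items <- trimws(strsplit(desc[1, field], ",")[[1]])
  items <- items[nzchar(items)]
  if (!entry_name(entry) %in% entry_name(items)) {
    items <- if (first) c(entry, items) else c(items, entry)
  }
  desc[1, field] <- paste(items, collapse = ", ")
  desc
}

# Toy DESCRIPTION standing in for the package's real file.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c("Package: MouseGastrulationData",
             "Version: 0.99.0",
             "Depends: SingleCellExperiment",
             "biocViews: ExperimentData, SingleCellData"), desc_path)

desc <- read.dcf(desc_path)
desc <- add_entry(desc, "Depends", "R (>= 3.6.0)", first = TRUE)
desc <- add_entry(desc, "biocViews", "ExperimentHub")
write.dcf(desc, desc_path)
```

Note that write.dcf() may re-wrap long fields, so hand-formatted DESCRIPTION files can change layout when rewritten this way.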

I'll take a look at your other building issues and comment back later.

Best, Qian

Liubuntu commented 5 years ago

Hi @jonathangriffiths,

Please see below for the review. I also suggest fixing the warnings and errors in the build report. Let me know if you have any questions.

Cheers, Qian

DESCRIPTION

NAMESPACE

R/

vignette

1. Install the package from Bioconductor.
{r getPackage, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("pkgName")

Or install the development version of the package from GitHub.
{r, eval = FALSE}
BiocManager::install("githubUserName/pkgName")

2. Load the package into the R session.
{r Load, message=FALSE}
library(pkgName)

inst/scripts

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

7aeac27 Added documentation changes suggested by Bioconduc...
036267e Changes to code and function documentation for Bio...
44680d0 R version dependency change

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

87a7a71 Version bump for Bioc build

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

jonathangriffiths commented 5 years ago

I have now addressed all of the review points, except for the memory limit one.

I think this is not addressable at the moment due to the way the data are currently stored on ExperimentHub: the raw count matrix, when loaded into R, is 7.4 GB in memory. Therefore, the memory limit would be hit even before the downsampling could happen. There is also the point that Aaron raised above, but I'm uncertain whether that is playing a role in the problem here.

The "real" solution is probably to reorganise the files on ExperimentHub into smaller chunks (e.g., by developmental stage) so that we can reduce the amount of data that needs to be loaded at all. Indeed, I do plan to do this at some point, but I think it's probably better to get the package up and running for the much larger community that uses high-memory 64-bit machines before doing this restructuring.
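A minimal sketch of the kind of restructuring described here, assuming a genes-by-cells count matrix and a per-cell sample label: split the columns by sample and save one RDS file per sample, so that each file can be downloaded and loaded on its own. The function name split_by_sample() and the file naming scheme are hypothetical; a sparse matrix from the Matrix package could be subset in the same way.

```r
# Hypothetical sketch: split a genes-by-cells count matrix column-wise by
# sample and save one RDS file per sample, so each file can be fetched and
# loaded on its own instead of loading the full matrix.
split_by_sample <- function(counts, sample_id, outdir) {
  stopifnot(ncol(counts) == length(sample_id))
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(0)
  for (s in sort(unique(sample_id))) {
    path <- file.path(outdir, sprintf("counts-sample%s.rds", s))
    saveRDS(counts[, sample_id == s, drop = FALSE], path)
    paths[as.character(s)] <- path
  }
  paths
}

# Toy data: 3 genes x 5 cells from 2 samples.
counts <- matrix(0:14, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))
sample_id <- c(1, 1, 2, 2, 2)
paths <- split_by_sample(counts, sample_id, file.path(tempdir(), "by-sample"))

# Loading a single sample reads only that sample's columns.
sample2 <- readRDS(paths[["2"]])
dim(sample2)
# [1] 3 3
```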

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

968c847 Switched celltype colours to exported object
0f6597e version bump
d6a5f10 Removed old colour rda file
970e525 Removed rogue data() calls
c0333dd Typo fix

jonathangriffiths commented 5 years ago

I made a tweak this morning to move a small data file from a data/.rda file to an R/*.R script, to allow easier reading from the code.
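As a hedged illustration of what such an R/*.R definition might look like, the sketch below defines a small named vector directly in code, where it is readable in the source and can be documented and exported with roxygen. The file name, object name and colour values are placeholders, not the package's actual cell type palette.

```r
# R/colours.R (hypothetical file name)

#' Example cell type colours (placeholder values)
#'
#' A named character vector mapping cell type labels to hex colours.
#' @export
example_celltype_colours <- c(
  Epiblast = "#1B9E77",
  Mesoderm = "#D95F02",
  Endoderm = "#7570B3"
)

# Sanity check: every entry is named and parses as a colour.
rgb_values <- grDevices::col2rgb(example_celltype_colours)
stopifnot(all(nzchar(names(example_celltype_colours))),
          ncol(rgb_values) == length(example_celltype_colours))
```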

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

Liubuntu commented 5 years ago

Hi @jonathangriffiths ,

After inspecting the error a bit further, we see that the data on ExperimentHub are quite large, and users on 32-bit Windows cannot download the data and construct the SingleCellExperiment object. Also, for documentation and vignette examples, we don't expect authors to use a very large data set for demonstration.

So I talked with Martin (@mtmorgan), and he suggests adding another function, such as EmbryoAtlasDataExample(), which constructs the SCE in the same way but uses only a small subset of the data (e.g., 2 samples) for demonstration. That way, the functionality of the real EmbryoAtlasData() function can be demonstrated successfully on any operating system, and the @examples will not take too long given the build time limit.

Things to do include:

  1. Generate the small data set and submit it to ExperimentHub.
  2. Add a new function, e.g., EmbryoAtlasDataExample(), that downloads the small data set from ExperimentHub (EH) and constructs the SCE.
    1. Export this function, and use it in the @examples field of EmbryoAtlasData().
    2. In the documentation of EmbryoAtlasData() (or in its @examples field), add some comments about the data size, and remind users to be cautious about calling this function on the full data if they only want to test it.
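One hedged way to keep such an example function cheap is to have the full loader accept a set of sample numbers and let the example function request only a couple of them. The base R helper below, select_sample_records() (a hypothetical name), picks records by sample number from titles of the form shown in the hub listings in this thread. In practice the titles would come from the result of query() on the hub (for example its $title column); this sketch does not contact ExperimentHub.

```r
# Hypothetical helper: given hub record titles such as
# "Atlas processed counts (sample 1)", return the positions of the
# records belonging to the requested sample numbers.
# Note: `prefix` is used inside a regular expression, so it should not
# contain regex metacharacters.
select_sample_records <- function(titles, prefix, samples) {
  pattern <- paste0("^", prefix, " \\(sample ([0-9]+)\\)$")
  hit <- which(grepl(pattern, titles))
  ids <- as.integer(sub(pattern, "\\1", titles[hit]))
  hit[ids %in% samples]
}

# Titles in the style of the hub listing above.
titles <- c("Atlas processed counts (sample 1)",
            "Atlas processed counts (sample 2)",
            "Atlas processed counts (sample 3)",
            "WT chimera raw counts (sample 6)")

# An "example" loader would request only a couple of samples:
select_sample_records(titles, "Atlas processed counts", samples = 1:2)
# [1] 1 2
```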

Let me know if you have any questions.

Best, Qian

jonathangriffiths commented 5 years ago

Hi @Liubuntu,

If we're going to add some new data to ExperimentHub, I think it's probably best that I push ahead with a larger restructuring that allows the access of smaller subsets of data in a way that is biologically meaningful. This should be helpful not only for the vignette but also for researchers who want to access only a slice of the data. I'll have a go at this over the weekend.

Thanks, Jonny

jonathangriffiths commented 5 years ago

Hi @lshep,

I've generated the new files for this package now - should I upload them using the same details as for the first file submission?

I think you can safely delete all of the files that had already been uploaded, too.

Thanks for the help, Jonny.

lshep commented 5 years ago

Sorry for the delayed response; I was away at a conference with limited internet connectivity. Yes please upload as before. I will need the updated metadata file to add to the database.

Just to double-check: can all the original data uploaded be deleted from S3 and removed from the ExperimentHub database?

Cheers,

Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Jonathan Griffiths notifications@github.com Sent: Sunday, July 21, 2019 4:57:41 PM To: Bioconductor/Contributions Contributions@noreply.github.com Cc: Shepherd, Lori Lori.Shepherd@RoswellPark.org; Mention mention@noreply.github.com Subject: Re: [Bioconductor/Contributions] MouseGastrulationData (#1150)


jonathangriffiths commented 5 years ago

No problem!

Yes, everything else can be deleted. I will push the metadata csvs and upload the data files to AWS tomorrow.

Jonny.

jonathangriffiths commented 5 years ago

Hi @lshep,

The files should all be uploaded now.

The metadata files are also updated at https://github.com/MarioniLab/MouseGastrulationData/tree/master/inst/extdata

Thanks for helping me sort this out,

Jonny

lshep commented 5 years ago

The new data has been added to the hub. Please update your package as necessary.

> eh = ExperimentHub()
  |======================================================================| 100%

snapshotDate(): 2019-08-01
> query(eh, "MouseGas")
ExperimentHub with 324 records
# snapshotDate(): 2019-08-01 
# $dataprovider: Jonathan Griffiths
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["EH2701"]]' 

           title                            
  EH2701 | Atlas processed counts (sample 1)
  EH2702 | Atlas processed counts (sample 2)
  EH2703 | Atlas processed counts (sample 3)
  EH2704 | Atlas processed counts (sample 4)
  EH2705 | Atlas processed counts (sample 5)
  ...      ...                              
  EH3020 | WT chimera raw counts (sample 6) 
  EH3021 | WT chimera raw counts (sample 7) 
  EH3022 | WT chimera raw counts (sample 8) 
  EH3023 | WT chimera raw counts (sample 9) 
  EH3024 | WT chimera raw counts (sample 10)

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2d834b2 Removed subsample.frac
d8d6399 Removed some hiding subsample.fracs
3342045 Moved code to use sample indices for the processed...
3341afd Version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

jonathangriffiths commented 5 years ago

I think this package is now ready to go!

I noticed this morning that I accidentally duplicated the rowData on ExperimentHub: each sample has its own file, but the contents of the file are the same for every sample.

We could clean this up, but given that the files are so small (two columns of ~25k gene names), it might not even be worth the time. I'll leave the decision up to you, as it wouldn't be much of a pain on my end.
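A minimal sketch of how the duplication could be confirmed before keeping a single shared copy: compare every per-sample table with the first one. The helper name all_identical() is hypothetical, and the toy tables (including the ENSEMBL and SYMBOL column names) only stand in for the real two-column rowData.

```r
# Minimal sketch: TRUE if every per-sample rowData table is identical to
# the first one, in which case a single shared copy would be enough.
all_identical <- function(tables) {
  all(vapply(tables[-1], identical, logical(1), tables[[1]]))
}

# Toy stand-in for the per-sample rowData (two columns of gene
# identifiers; the column names are hypothetical).
genes <- data.frame(ENSEMBL = c("gene_id_1", "gene_id_2"),
                    SYMBOL = c("GeneA", "GeneB"),
                    stringsAsFactors = FALSE)
per_sample <- rep(list(genes), 3)

all_identical(per_sample)
# [1] TRUE
```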

Thanks, Jonny.

Liubuntu commented 5 years ago

Hi @jonathangriffiths ,

It's great that you restructured the whole data set on ExperimentHub, so that each smaller data set is biologically meaningful and usable on all platforms.

I would suggest keeping only one copy of the rowData on ExperimentHub and loading that same rowData in the R scripts. Let's see what Lori (@lshep) says about it.

Other than that, the package looks good!

Best, Qian

jonathangriffiths commented 5 years ago

@lshep - I will generate replacement row data files tomorrow morning. If we go ahead with deleting the duplicated data, please could you remove any files that contain the string "rowdata"? That should apply only to the files we want removed, and there should be 36 in the atlas folder, 10 in the wild-type chimera folder, and 4 in the Tal1 chimera folder.
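Before deleting, the expected counts quoted here (36 + 10 + 4 = 50 files) could be checked with a short base R listing. A minimal sketch, assuming a local copy of the upload directory with one sub-folder per data set; the function name find_rowdata_files() and the folder and file names in the toy fixture are hypothetical.

```r
# Count files whose names contain "rowdata", per top-level folder.
find_rowdata_files <- function(root) {
  files <- list.files(root, pattern = "rowdata", recursive = TRUE)
  table(sub("/.*$", "", files))
}

# Toy fixture standing in for the local upload directory.
root <- file.path(tempdir(), "upload-check")
for (d in c("atlas", "chimera-wt", "chimera-tal1")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
invisible(file.create(file.path(root, "atlas",
  c("rowdata-sample1.rds", "rowdata-sample2.rds", "counts-sample1.rds"))))
invisible(file.create(file.path(root, "chimera-wt", "rowdata-sample1.rds")))
invisible(file.create(file.path(root, "chimera-tal1", "rowdata-sample1.rds")))

n_by_folder <- find_rowdata_files(root)
n_by_folder
sum(n_by_folder)  # total number of files that would be removed
# [1] 4
```

The pattern match is case-sensitive by default, which matches the lower-case "rowdata" naming described above.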

If it's easier, we could delete all the files and replace them wholesale - the upload time isn't very long on my end.

Thanks, Jonny.

lshep commented 5 years ago

I will remove these files from the database and S3. Please let me know when you have the replacement files uploaded to add back in.

jonathangriffiths commented 5 years ago

Hi @lshep - these should be uploaded now (just the row data, in the right place in their appropriate file structure)

Jonny

lshep commented 5 years ago

So I should be re-adding three files. I assume these are corrected in the metadata files as well?



From: Jonathan Griffiths notifications@github.com Sent: Tuesday, August 6, 2019 12:16:36 PM To: Bioconductor/Contributions Contributions@noreply.github.com Cc: Shepherd, Lori Lori.Shepherd@RoswellPark.org; Mention mention@noreply.github.com Subject: Re: [Bioconductor/Contributions] MouseGastrulationData (#1150)


lshep commented 5 years ago

The rowData has been added back into the hub.

> library(ExperimentHub)
> eh = ExperimentHub()
> query(eh, c("MouseGast", "rowData"))
ExperimentHub with 3 records
# snapshotDate(): 2019-08-06 
# $dataprovider: Jonathan Griffiths
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["EH3065"]]' 

           title               
  EH3065 | Atlas rowData       
  EH3066 | Tal1 chimera rowData
  EH3067 | WT chimera rowData  

jonathangriffiths commented 5 years ago

Thanks for getting these up! However, I'm getting an error when I try to download the new rowData:

> library(ExperimentHub)
> eh = ExperimentHub()
snapshotDate(): 2019-08-06
> head(eh[["EH3065"]])
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
downloading 1 resources
retrieving 1 resource
Downloading: 240 B
Error: failed to load resource
  name: EH3065
  title: Atlas rowData
  reason: 1 resources failed to download
In addition: Warning messages:
1: download failed
  web resource path: ‘https://experimenthub.bioconductor.org/fetch/3081’
  local file path: ‘/Users/griffi01/Library/Caches/ExperimentHub/59211b2f6988_3081’
  reason: Forbidden (HTTP 403).
2: bfcadd() failed; resource removed
  rid: BFC5
  fpath: ‘https://experimenthub.bioconductor.org/fetch/3081’
  reason: download failed
3: download failed
  hub path: ‘https://experimenthub.bioconductor.org/fetch/3081’
  cache resource: ‘EH3065 : 3081’
  reason: bfcadd() failed; see warnings()

This seems to apply to all three new files. The old ones work fine, though. Do you know what seems to have upset things, @lshep?

Thanks, Jonny

lshep commented 5 years ago

Apologies I forgot to make the resources public. They should be accessible now.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2fe2d05 Whitespace improvements
6536a0a Version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

jonathangriffiths commented 5 years ago

Thanks @lshep!

I think everything is truly ready to go now, @Liubuntu! The only change I have made since your previous review was to add one further exported object, which summarizes some metadata for each 10x sample in the atlas (AtlasSampleMetadata).

Thanks, Jonny