Real-world Single Cell Methylation Data Download

SCREW: A Reproducible Workflow for Single-Cell Epigenomics

Real-world Single Cell Methylation Data Download #10

Closed by bdecato 7 years ago

bdecato commented 7 years ago

Description

As a bioinformatician, I would like to download the single-cell methylation data produced in Farlik et al. (2016) from the Gene Expression Omnibus, so that we can apply our pipeline to real-world data downstream.

ESTIMATE:

3 hours.

PRIORITY:

SHOULD

NOTES:

I've removed the Smallwood et al. data download story and updated the Farlik et al. story in this issue to cover the 2016 dataset, rather than the 2015 dataset. I think this is the one we should focus on for the time being. -Ben

oneillkza commented 7 years ago

So one thing that's occurring to me is that we should probably make the output directory an explicit (rather than implicit) input variable to all our tools (especially the gigantic download ones).
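To make the suggestion concrete, here is a minimal sketch of a tool that takes its output directory as an explicit, required argument. It is written in Python purely for illustration; the names `build_parser`, `prepare_output_dir`, `main`, and the `--output-dir` flag are hypothetical and not taken from any existing Screw tool.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: the output directory is required, never implied
    # by the current working directory.
    parser = argparse.ArgumentParser(description="Hypothetical download tool")
    parser.add_argument(
        "--output-dir",
        type=Path,
        required=True,
        help="Directory that will receive all downloaded files",
    )
    return parser


def prepare_output_dir(output_dir: Path) -> Path:
    # Create the directory (and any missing parents), then return its
    # absolute path so later steps never depend on the working directory.
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir.resolve()


def main(argv=None) -> Path:
    args = build_parser().parse_args(argv)
    return prepare_output_dir(args.output_dir)
```

Calling `main(["--output-dir", "data/farlik_2016"])` would create that directory if needed and return its absolute path; leaving the flag out makes `argparse` exit with a usage error instead of silently writing somewhere unexpected.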

oneillkza commented 7 years ago

I've uploaded this to a separate repo using LFS, so I'm marking this done. What remains is to write the code that actually uses it.

https://github.com/Epigenomics-Screw/Farlik_2016_Example

neksa commented 7 years ago

Thanks! What's the total data size? I can perhaps pack it zipped into a data container and mount it?

oneillkza commented 7 years ago

About 500MB unzipped, 110MB zipped (IIRC). I'm not sure which will be the better way to distribute it, GitHub LFS or a Docker data container, but let's give the data container a try.

I've also been equivocating about zipping or not, but I guess step one of the example workflow could be a simple CWL wrapper around tar.
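As a rough picture of what that first unpacking step would do, here is a sketch in Python using the standard-library `tarfile` module. It is not a CWL tool description, only a stand-in for the behaviour such a wrapper around tar would have; the function name `unpack_archive` and its parameters are hypothetical.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive: Path, output_dir: Path) -> list[str]:
    """Extract a (possibly compressed) tar archive into output_dir.

    Returns the names of the archive members that were extracted.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    root = output_dir.resolve()
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse members that would land outside the output directory.
            target = (root / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(root, members=members)
    return [member.name for member in members]
```

Keeping the output directory as an explicit parameter here means the step behaves the same whether the archive comes from the LFS repo or from a mounted data container.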