drisso / archive-scRNAseq

Archived version of the scRNAseq Bioconductor package. All future development will be done at https://github.com/LTLA/scRNAseq

Opening it up as an EHub data package #10

Closed by LTLA 5 years ago

LTLA commented 5 years ago

simpleSingleCell uses quite a few data sets (Zeisel brain, four pancreas, mammary gland stuff), many of which are probably generally useful to other people. I can also imagine other situations where people have un-mangled public data and want a nice standard place to store it - for their own reference, and for other people to easily use without having to go through the same pain.

scRNAseq seems like it could be such a resource, if you would be inclined to make it an ExperimentHub-dependent package and to open it up to accept contributions from the community. I, for one, would be more than pleased to migrate code from simpleSingleCell into scRNAseq to centralize SC-data management operations here. If not, I'll just start my own miscRNAseq package (for miscellaneous single-cell RNA-seq, geddit)... but I would rather not make yet another package.

drisso commented 5 years ago

Hi Aaron,

This is a great idea! I’m more than happy to do both things. I’ll look into how to turn it into an ExperimentHub package, because I’m not too familiar with the process.

LTLA commented 5 years ago

Great. How about I do one to get started (probably Zeisel brain), and you can then follow my lead in converting the data sets you already have? We might have to talk to @lshep about the process of migrating resources out of data/ and onto EHub.

drisso commented 5 years ago

Awesome! Thanks Aaron!

LTLA commented 5 years ago

Got the party started with #11.

It's probably worth thinking about the scope of the data sets we want to put in at the start. I would suggest that the greatest value for money is obtained by focusing on data sets that are:

The full Zeisel dataset is a good example of the first point, while both the Zeisel and Segerstolpe datasets are good examples of the second point.

drisso commented 5 years ago

Thanks Aaron!

I forgot, is the plan to move all the existing datasets to the hub or should they co-exist as rda files?

If the former, should I get started in porting those?

LTLA commented 5 years ago

I think we'd want them to eventually live in the Hub, so yeah, you should start to port them. However, this will involve a period of deprecation where both options are provided. I don't know how we'll kill the Rdata files; it's hard to put in deprecation warnings for data().

LTLA commented 5 years ago

Deprecation is easily achieved. To illustrate with allen:

  1. Move data/allen.rda somewhere else, e.g., inst/deprecated.
  2. Make data/allen.R, containing:
    .Deprecated(msg="'data(allen)' is deprecated.\nUse AllenBrainData() instead.")
    load(system.file("deprecated", "allen.rda", package="scRNAseq"))
  3. Reinstall the package and use data(allen) as usual.

Obviously, this assumes that you have already written the AllenBrainData() function...
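The warning mechanism from step 2 can be checked in isolation. Below is a minimal sketch in R, assuming a hypothetical wrapper `load_allen_deprecated()` that stands in for the body of data/allen.R. The load() call is left out so that the snippet runs without the package installed. In recent versions of R, `.Deprecated()` signals a warning of class `deprecatedWarning`, which a caller or a test can catch:

```r
# Hypothetical stand-in for the body of data/allen.R; the load() call is
# omitted so that this runs without scRNAseq installed.
load_allen_deprecated <- function() {
    .Deprecated(msg="'data(allen)' is deprecated.\nUse AllenBrainData() instead.")
    invisible(TRUE)
}

# Catch the condition object to inspect its class and message.
caught <- tryCatch(
    load_allen_deprecated(),
    deprecatedWarning = function(w) w
)

is_deprecated <- inherits(caught, "deprecatedWarning")
msg <- conditionMessage(caught)
```

In a package, a similar check could presumably live in a unit test (for example with testthat's expect_warning()), so that the deprecation message keeps pointing to the replacement function.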

LTLA commented 5 years ago

Development has moved to https://github.com/LTLA/scRNAseq, which uses the Bioconductor Git history.