There are many single-cell datasets that we want to load into Gemma but that are not available in GEO. These typically come from arbitrary web sites rather than a particular repository.
We already have some outdated support for this, via both the CLI and the GUI, but it needs to be revisited and probably updated.
Both were designed with microarrays in mind, for data that comes as a single tab-delimited file.
We also have methods for loading experimental design information from files (ExperimentalDesignImporter), but they are limited too. For uploading sample metadata we'll need something similar.
We'll need to adapt these tools to support loading single-cell data.
In general, there are three steps, after which the dataset can be processed "as usual":
1. Definition of the basic dataset information (name, description, etc.). The upload form is not a bad way to do this, but it will need some updates. The upload of the data itself should probably be separated from this step completely.
2. Loading of the data files, probably with support for some format besides TSV (we need to see what makes sense). Since we already support this for single-cell data, this part should be easy.
3. Uploading of sample metadata, if available, to save data entry time.
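The three steps can be kept independent of each other, so that the dataset description, the data files and the sample metadata are each supplied separately. The following is a minimal sketch of that separation. Every name in it (`DatasetInfo`, `DataFormat`, `detectFormat`, `parseSampleMetadata`, `checkSamples`) is hypothetical and not part of Gemma. The choice of 10x-style MEX and AnnData (`.h5ad`) as alternatives to TSV is an assumption. The metadata layout is also assumed: a TSV with a header row and one row per sample, where the first column is the sample name and the remaining columns are factor values. This is not the format ExperimentalDesignImporter reads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the three upload steps; none of these names exist in Gemma.
public class SingleCellUploadSketch {

    // Step 1: basic dataset information, entered independently of any data file.
    record DatasetInfo(String shortName, String name, String description) {}

    // Step 2: candidate input formats (an assumed list, not a decided one).
    enum DataFormat { TSV, MEX, ANNDATA }

    // Guess the format from the file name alone.
    static DataFormat detectFormat(String fileName) {
        String f = fileName.toLowerCase();
        if (f.endsWith(".h5ad")) return DataFormat.ANNDATA;
        if (f.endsWith(".mtx") || f.endsWith(".mtx.gz")) return DataFormat.MEX;
        if (f.endsWith(".tsv") || f.endsWith(".tsv.gz") || f.endsWith(".txt")) return DataFormat.TSV;
        throw new IllegalArgumentException("Unsupported data file: " + fileName);
    }

    // Step 3: parse a sample metadata TSV into sample name -> (column name -> value).
    static Map<String, Map<String, String>> parseSampleMetadata(BufferedReader in) throws IOException {
        String headerLine = in.readLine();
        if (headerLine == null) throw new IOException("Empty metadata file");
        String[] header = headerLine.split("\t", -1);
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isBlank()) continue; // tolerate trailing empty lines
            String[] fields = line.split("\t", -1);
            if (fields.length != header.length)
                throw new IOException("Wrong number of columns for sample " + fields[0]);
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 1; i < header.length; i++) values.put(header[i], fields[i]);
            if (result.put(fields[0], values) != null)
                throw new IOException("Duplicate sample: " + fields[0]);
        }
        return result;
    }

    // Every sample found in the data files must have a metadata row.
    static void checkSamples(Set<String> dataSamples, Map<String, ?> metadata) {
        for (String s : dataSamples)
            if (!metadata.containsKey(s))
                throw new IllegalStateException("No metadata for sample " + s);
    }

    public static void main(String[] args) throws IOException {
        DatasetInfo info = new DatasetInfo("sc-example", "Example single-cell study", "Loaded from a lab web site");
        String tsv = "sample\tgenotype\tsex\nS1\tWT\tF\nS2\tKO\tM\n";
        Map<String, Map<String, String>> meta = parseSampleMetadata(new BufferedReader(new StringReader(tsv)));
        checkSamples(Set.of("S1", "S2"), meta);
        // prints "sc-example MEX KO"
        System.out.println(info.shortName() + " " + detectFormat("matrix.mtx.gz") + " " + meta.get("S2").get("genotype"));
    }
}
```

Detecting the format by file name is the simplest option, but a real MEX upload is a set of files (matrix, barcodes, features). In that case detection would need to inspect the whole set rather than each file on its own. Checking sample names against the data files at upload time catches mismatches before any processing runs.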
We'll flesh this out with some particular examples.
For reference, the existing entry points are the upload page at https://gemma.msl.ubc.ca/expressionExperiment/upload.html (ExpressionDataFileUploadController) and LoadSimpleExpressionDataCli.