PavlidisLab / Gemma

Genomics data re-analysis
Apache License 2.0

Matrix writer for single cell data #1247

Open arteymix opened 3 days ago

arteymix commented 3 days ago

We need to have tools and writers for single cell data.

In particular, this is important for running analyses outside of Gemma, such as cell type inference.

Writing MEX as a TAR archive is inefficient if we break it down by sample: TAR entries are written one after another, so we have to keep the whole matrix in memory and revisit it for every sample. If we write to disk instead, we can keep one MTX file open per sample and write all of them at once.
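
To illustrate the single-pass idea, here is a minimal sketch in Python (not Gemma's actual code). The function name `write_mtx_per_sample`, the `(gene_index, [(cell_index, value), ...])` vector layout, and the contiguous per-sample cell ranges are all assumptions made for the example. It only writes `matrix.mtx`; the feature and barcode files that normally accompany MEX are left out. Because the entry count is not known until the end, the size line is written as a space-padded placeholder and patched afterwards, which is a shortcut that a reader strict about whitespace might reject.

```python
import bisect
import os

HEADER = b"%%MatrixMarket matrix coordinate real general\n"
SIZE_WIDTH = 40  # room for "rows cols nnz", padded with spaces


def write_mtx_per_sample(vectors, n_genes, samples, out_dir):
    """Write one matrix.mtx per sample in a single pass over the vectors.

    vectors: iterable of (gene_index, [(cell_index, value), ...]), 0-based
             (hypothetical layout for this sketch).
    samples: list of (name, first_cell, end_cell), sorted by first_cell,
             with half-open cell ranges.
    Returns a dict mapping each sample name to its number of entries.
    """
    starts = [first for _, first, _ in samples]
    files, size_offsets, nnz = [], [], [0] * len(samples)
    try:
        # Open every per-sample file up front and reserve the size line.
        for name, _, _ in samples:
            sample_dir = os.path.join(out_dir, name)
            os.makedirs(sample_dir, exist_ok=True)
            f = open(os.path.join(sample_dir, "matrix.mtx"), "wb")
            files.append(f)
            f.write(HEADER)
            size_offsets.append(f.tell())
            f.write(b" " * SIZE_WIDTH + b"\n")

        # Single pass: route each entry to the file of the sample owning the cell.
        for gene, entries in vectors:
            for cell, value in entries:
                i = bisect.bisect_right(starts, cell) - 1
                if i < 0 or cell >= samples[i][2]:
                    raise ValueError(f"cell {cell} does not belong to any sample")
                first = samples[i][1]
                # MTX coordinates are 1-based; columns are relative to the sample.
                files[i].write(f"{gene + 1} {cell - first + 1} {value}\n".encode("ascii"))
                nnz[i] += 1

        # Patch the reserved size line now that the entry counts are known.
        for i, (_, first, end) in enumerate(samples):
            size = f"{n_genes} {end - first} {nnz[i]}".ljust(SIZE_WIDTH)
            files[i].seek(size_offsets[i])
            files[i].write(size.encode("ascii"))
    finally:
        for f in files:
            f.close()
    return {name: n for (name, _, _), n in zip(samples, nnz)}
```

Each vector is visited once and nothing beyond the current vector has to stay in memory, at the cost of one open file handle per sample.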

I don't believe it is worth writing the HDF5 formats to disk at this time.

arteymix commented 1 day ago

I've added a location for SC vectors alongside the other files we write to disk and cleaned up the service that does that.

I think it would be more efficient to write the vectors to disk at the moment we load them, while they are still in memory.