digitalcytometry / cytospace

CytoSPACE: Optimal mapping of scRNA-seq data to spatial transcriptomics data

possibility of sparse matrix support? #74

Closed mcaponegro closed 1 year ago

mcaponegro commented 1 year ago

Hi, I really like CytoSPACE for cell type label transfer from single-cell to spatial data. Thank you for the package. One question: is there planned support for reading/writing sparse matrices instead of writing a dense matrix with write.table()? Some single-cell and ST datasets can get very large, and writing out these files can take a lot of time and memory. I have applied sketch sampling to the single-cell reference data, but that workaround does not work when processing new ST data.
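To make the size concern concrete, here is a minimal sketch (not taken from CytoSPACE) that compares the text size of a dense tab-separated dump with a non-zero-only triplet listing. The toy matrix, its roughly 5% density, and the helper names `dense_tsv` and `triplet_tsv` are all assumptions made for illustration; real savings depend on the dataset.

```python
import random

def dense_tsv(matrix):
    """Serialize every entry, zeros included, as tab-separated text."""
    return "\n".join("\t".join(str(v) for v in row) for row in matrix)

def triplet_tsv(matrix):
    """Serialize only non-zero entries as 1-based (row, column, value) lines."""
    lines = []
    for i, row in enumerate(matrix, start=1):
        for j, v in enumerate(row, start=1):
            if v != 0:
                lines.append(f"{i}\t{j}\t{v}")
    return "\n".join(lines)

random.seed(0)
# Hypothetical toy count matrix: 200 genes x 300 cells, about 5% non-zero.
counts = [
    [random.randint(1, 9) if random.random() < 0.05 else 0 for _ in range(300)]
    for _ in range(200)
]

dense_bytes = len(dense_tsv(counts).encode())
sparse_bytes = len(triplet_tsv(counts).encode())
print(f"dense: {dense_bytes} bytes, non-zero triplets: {sparse_bytes} bytes")
```

The dense dump grows with genes × cells regardless of content, while the triplet listing grows only with the number of non-zero counts, which is why sparse output tends to help most on large, mostly-zero expression matrices.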

I tried replacing write.table() with fwrite() in generate_cytospace_from_scRNA_seurat_object() and generate_cytospace_from_ST_seurat_object(); however, this threw an error in the read_file() function in cytospace.py (even when setting delim = manually or to "auto").

It seems that all the pieces are there and I was curious if you have thoughts about this.

I also appreciate the simplicity of the code and availability of the functions within the package.

Thanks, m

hsjeon-k commented 1 year ago

Hi, thank you for the suggestion!

I think this is a great idea considering the size of some single-cell/ST datasets, as you mentioned, and it's also something I have been thinking about over the last few updates. While we don't have a specific timeline in mind, I can look into this over the next couple of weeks to see whether it would be possible. I think the package that we are currently using for reading in files might not support sparse matrices (which is probably why you ran into the error), so we might need to make some additional changes to implement this feature.

We will certainly let you know if we start supporting sparse matrices as input files! Thank you once again for your input, and please feel free to let us know if you have further thoughts.

hsjeon-k commented 1 year ago

Hello!

Sparse matrix support was added starting with CytoSPACE v1.0.5. We added a write_sparse boolean parameter to the R helper functions, which generates the inputs in sparse matrix format when set to TRUE. If you would prefer to supply your own sparse matrix files, we have also outlined the format in the documentation under Input files - Using sparse matrices for gene expression.
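For readers unfamiliar with sparse text formats, the following is a generic sketch of a Matrix Market coordinate file round trip using only the Python standard library. It is an illustration of one common sparse layout and is not claimed to match the files CytoSPACE expects; the documentation section mentioned above is the authority on that. The helper names `write_mtx` and `read_mtx` and the toy matrix are hypothetical.

```python
import os
import tempfile

def write_mtx(path, matrix):
    """Write an integer list-of-lists matrix in Matrix Market coordinate format."""
    entries = [
        (i, j, v)
        for i, row in enumerate(matrix, start=1)
        for j, v in enumerate(row, start=1)
        if v != 0
    ]
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    with open(path, "w") as fh:
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{n_rows} {n_cols} {len(entries)}\n")
        for i, j, v in entries:
            fh.write(f"{i} {j} {v}\n")

def read_mtx(path):
    """Read a coordinate-format file back into a dense list of lists."""
    with open(path) as fh:
        header = fh.readline()
        if not header.startswith("%%MatrixMarket matrix coordinate"):
            raise ValueError("not a Matrix Market coordinate file")
        line = fh.readline()
        while line.startswith("%"):  # skip optional comment lines
            line = fh.readline()
        n_rows, n_cols, n_entries = (int(x) for x in line.split())
        dense = [[0] * n_cols for _ in range(n_rows)]
        for _ in range(n_entries):
            i, j, v = fh.readline().split()
            dense[int(i) - 1][int(j) - 1] = int(v)  # indices are 1-based on disk
    return dense

# Hypothetical 3 x 3 toy matrix with two non-zero counts.
toy = [[0, 0, 3], [1, 0, 0], [0, 0, 0]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.mtx")
    write_mtx(path, toy)
    restored = read_mtx(path)

print(restored == toy)
```

Only the two non-zero entries are stored on disk, plus a size line, so the file stays small no matter how many zeros the matrix contains.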

Thank you for the suggestion once again!

mcaponegro commented 1 year ago

Excellent! Thank you for your support. I am looking forward to trying it.