chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Remove dependency on sceasy for AnnData->Seurat conversion #6672

Closed: metakuni closed this issue 2 months ago

metakuni commented 7 months ago

Motivation

Also review the potential of schard as a replacement for Seurat conversions. It is reticulate-free and builds on the Bioconductor package rhdf5.


Currently, we depend on sceasy for AnnData -> Seurat conversion:

https://github.com/chanzuckerberg/single-cell-data-portal/blob/main/backend/layers/processing/make_seurat.R#L16

Since sceasy development appears to have been inactive for ~1.5 years, the conversion code should be imported into our repository and unit tests should be added.

Seurat conversion bugs could have been avoided with basic unit tests. Examples:

Upcoming dataset schema changes (e.g. spatial data) will also include changes that should be covered by Seurat conversion unit tests.

Definition of Done

Bento007 commented 6 months ago

@metakuni since datasets with spatial data will not use Seurat conversion, is this still a P0?

Bento007 commented 5 months ago

Suggested by @danieljhegeman:

Are we considering allocating time to overhaul our Seurat pipeline, perhaps by moving away from R and writing Seurat files using something like this?

This library only supports pandas data frames so far.

Bento007 commented 5 months ago

Another reticulate-based R library, named anndata, can be used for reading AnnData files.

Bento007 commented 5 months ago

schard is still in early-stage development and doesn't have a released version yet that we can lock to, so we'd need to pin to a commit. schard also doesn't support the uns field. Here is an example of how we can convert an h5ad file to Seurat using schard:

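# Install dependencies: devtools (for GitHub installs) and rhdf5 (from Bioconductor)
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rhdf5")
# schard has no tagged release yet; append "@<commit-sha>" to the repo to pin a commit
devtools::install_github("cellgeni/schard")
# Load the h5ad file as a Seurat object and save it as .rds
converted <- schard::h5ad2seurat("./example_valid.h5ad")
saveRDS(converted, "./example_valid.rds")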
Bento007 commented 5 months ago

testthat can be used for the unit tests.
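A minimal sketch of what such a testthat unit test might look like, assuming conversion goes through schard::h5ad2seurat as in the example above, that a small fixture exists at a hypothetical path (tests/fixtures/example_valid.h5ad), and that the fixture uses the modern h5ad encoding where the obs group carries an "_index" attribute naming the cell-name dataset. The helper count_h5ad_cells is hypothetical, for illustration only.

```r
# Sketch of a testthat unit test for h5ad -> Seurat conversion.
# Assumptions: schard and rhdf5 are installed, and a small fixture h5ad
# exists at the hypothetical path below (modern h5ad encoding).
library(testthat)

fixture_path <- "tests/fixtures/example_valid.h5ad"  # hypothetical fixture location

# Hypothetical helper: count cells directly from the h5ad file. In the modern
# h5ad layout, the obs group's "_index" attribute names the dataset that
# holds the cell (observation) names.
count_h5ad_cells <- function(path) {
  index_name <- rhdf5::h5readAttributes(path, "obs")[["_index"]]
  length(rhdf5::h5read(path, paste0("obs/", index_name)))
}

test_that("h5ad to Seurat conversion preserves the number of cells", {
  # Skip rather than fail when the fixture is not present
  skip_if_not(file.exists(fixture_path), "fixture h5ad not available")
  converted <- schard::h5ad2seurat(fixture_path)
  expect_s4_class(converted, "Seurat")
  # In a Seurat object, columns are cells
  expect_equal(ncol(converted), count_h5ad_cells(fixture_path))
})
```

Comparing against values read straight from the HDF5 file with rhdf5 keeps the expected value independent of the converter under test. Since schard does not read uns, a test like this would only cover content derived from X and obs.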

Bento007 commented 5 months ago

We are deferring this work for H1.

metakuni commented 3 months ago

Deprioritizing this until further notice.