chanzuckerberg / single-cell-data-portal

The data portal supporting the submission, exploration, and management of projects and datasets to cellxgene.
MIT License

Investigate missing data@assays$RNA@key in Seurat conversion #2775

Closed: brianraymor closed this issue 2 years ago

brianraymor commented 2 years ago

@brianraymor commented on Thu Jun 02 2022

On wrangling, Michael Czerwinski reported:

There seems to be an issue with Seurat objects downloaded as .rds files from cellxgene. At least some of them are missing the key for the RNA assay, which lets Seurat know there is RNA data in the object. Much of the time this is not a problem, but if you try to subset a freshly loaded object, it tells you there is nothing in the subset until you manually add the key back in. Datasets that I had this issue with: Krasnow Lab Human Lung Cell Atlas, 10X; Spatiotemporal analysis of human intestinal development at single-cell resolution: Pericytes; Healthy human liver: hepatic stellate cells.

Workaround is to add the key back in manually: data1@assays$RNA@key <- "rna_"

I figured out the workaround based on this issue: https://github.com/satijalab/seurat/issues/5676#issuecomment-1111427376
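The manual workaround can be wrapped in a small helper that only touches assays whose key is empty. This is a minimal sketch, assuming a Seurat v3/v4 style object whose assays are S4 objects with a `key` slot; `expected_key` and `restore_missing_keys` are hypothetical helper names, and the file name in the usage comment is a placeholder.

```r
# Key conventionally used for an assay: lower-cased assay name plus "_".
expected_key <- function(assay_name) {
  paste0(tolower(assay_name), "_")
}

# Fill in the key slot of every assay whose key is missing or empty.
# Assumes `obj@assays` is a named list of S4 assays that each have a `key` slot.
restore_missing_keys <- function(obj) {
  for (assay_name in names(obj@assays)) {
    key <- obj@assays[[assay_name]]@key
    if (length(key) == 0 || is.na(key) || !nzchar(key)) {
      obj@assays[[assay_name]]@key <- expected_key(assay_name)
    }
  }
  obj
}

# Usage (placeholder file name):
# data1 <- readRDS("local.rds")
# data1 <- restore_missing_keys(data1)
```

Assays that already carry a key are left untouched, so the helper should be safe to run on objects that are not affected.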


@brianraymor commented on Thu Jun 02 2022

@pablo-gar commented:

sceasy is supposed to add it with this assignment:

Seurat::Key(assays[[assay]]) <- paste0(tolower(assay), "_")

But the assignment is not happening for some reason.


@brianraymor commented on Thu Jun 02 2022

I wonder if Seurat versions are playing a role here, per the comments in the issue that Michael referenced: "Downgrading to 4.0.2 solves this problem, as described in timoast/signac#872 (comment)"
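Since the package version may matter, it can help to record the versions in use when reproducing the problem. The sketch below is one way to do that in base R; `installed_version` is a hypothetical helper name, and including SeuratObject in the list is an assumption (it is the package that Seurat v4 builds its object classes on).

```r
# Return the installed version of a package as a string,
# or NA if the package is not installed (no error is raised).
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Collect the R version and the versions of the packages involved.
versions <- c(
  R = as.character(getRversion()),
  vapply(c("Seurat", "SeuratObject", "sceasy"), installed_version, character(1))
)
print(versions)
```

Running this in both the environment that wrote the .rds file and the one that reads it makes version differences easy to compare.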


@brianraymor commented on Fri Jun 24 2022

Moving to single-cell-data-portal. The sceasy conversion is part of the ingestion pipeline.

stephanmg commented 2 years ago

Hello @brianraymor,

I can confirm the same problem with R 4.2.1 "Funny-Looking Kid" on OSX.

Seurat version: 4.1.1
Problematic dataset: Pathogenic variants damage cell composition and single cell transcription in cardiomyopathies
File type: .rds (Seurat v3 objects) from cellxgene

brianraymor commented 2 years ago

Thanks for the report @stephanmg.

Per conversation with @metakuni, adding investigation/resolution to the pending updates to the ingestion pipeline in Q3.

ebezzi commented 2 years ago

I tested an AnnData -> RDS conversion using the latest Seurat (4.1.1) and the field is populated correctly. We're going to try upgrading the dependencies in our conversion script, which should fix this.
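A check of whether the field is populated can be scripted so it runs on any converted file. This is a sketch only, not the portal's actual test: `assays_missing_key` is a hypothetical helper, the file name is a placeholder, and it assumes a Seurat v3/v4 style object whose assays each have a `key` slot.

```r
# Return the names of assays whose key slot is missing or empty.
# Assumes `obj@assays` is a named list of S4 assays with a `key` slot.
assays_missing_key <- function(obj) {
  has_key <- vapply(
    obj@assays,
    function(assay) {
      length(assay@key) == 1 && !is.na(assay@key) && nzchar(assay@key)
    },
    logical(1)
  )
  names(has_key)[!has_key]
}

# Usage (placeholder file name):
# converted <- readRDS("converted.rds")
# missing <- assays_missing_key(converted)
# if (length(missing) > 0) stop("Assays without a key: ", toString(missing))
```

Reading the slot directly avoids depending on any particular Seurat accessor, which may behave differently across versions.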