BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Inconsistent cell type proportions between deconvoluted Visium and paired 10X Chromium #352

Open kuang-da opened 8 months ago

kuang-da commented 8 months ago

Hi,

Thank you for the great software!

I wonder whether the consistency of cell-type proportions can be used to evaluate the success of cell-type deconvolution in my study. I observe the following result in my experiment. Because the samples were collected from the same donors' organs, I expect the estimated cell-type proportions to be close to those of the paired reference. However, Smooth muscle and Stromal fibroblast are extremely overestimated.

| Cell type | scRNA-seq | Cell2location (Visium) |
|---|---|---|
| Glandular epithelium | 0.6430 | 0.2757 |
| Ciliated epithelium | 0.1548 | 0.0327 |
| Stromal fibroblast | 0.0600 | 0.1677 |
| Macrophage | 0.0409 | 0.0214 |
| T/NK cell | 0.0337 | 0.0128 |
| Lymphatic endothelium | 0.0237 | 0.0109 |
| Smooth muscle | 0.0210 | 0.3533 |
| Blood endothelium | 0.0174 | 0.0560 |
| Pericyte | 0.0031 | 0.0625 |
| B cell | 0.0023 | 0.0070 |
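To see which cell types drive the disagreement, the two columns of the table above can be compared as log2 ratios (Visium estimate over scRNA-seq reference). This is a minimal sketch using only the numbers in the table; the helper name `log2_ratio` is hypothetical, and a ratio far from 0 flags a discrepancy without saying which measurement is closer to the true tissue composition.

```python
import math

# Proportions copied from the table: (scRNA-seq reference, cell2location on Visium)
proportions = {
    "Glandular epithelium": (0.6430, 0.2757),
    "Ciliated epithelium": (0.1548, 0.0327),
    "Stromal fibroblast": (0.0600, 0.1677),
    "Macrophage": (0.0409, 0.0214),
    "T/NK cell": (0.0337, 0.0128),
    "Lymphatic endothelium": (0.0237, 0.0109),
    "Smooth muscle": (0.0210, 0.3533),
    "Blood endothelium": (0.0174, 0.0560),
    "Pericyte": (0.0031, 0.0625),
    "B cell": (0.0023, 0.0070),
}


def log2_ratio(reference: float, estimate: float) -> float:
    """log2(estimate / reference): > 0 means the Visium estimate is higher."""
    return math.log2(estimate / reference)


ratios = {ct: log2_ratio(ref, est) for ct, (ref, est) in proportions.items()}

# Print cell types from the largest over-estimate to the largest under-estimate
for ct, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ct:22s} {r:+.2f}")
```

On these numbers Pericyte has the largest relative inflation (about 20-fold), slightly ahead of Smooth muscle (about 17-fold), because its reference proportion is so small.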

There are ~37k cells in the scRNA-seq dataset (10X Chromium), and the cell types were defined by experts based on the markers of the 10X Chromium clusters. So I think we have enough cells of each type for cell2location to perform the deconvolution. In addition, there are 16 Visium slides with ~29k spots measured in total. When running cell2location, I used n=20 and alpha=20 (i.e. N_cells_per_location=20 and detection_alpha=20).
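A rough check of the "enough cells" assumption: multiplying the approximate total (~37k) by each reference proportion from the table gives the approximate number of cells behind each reference signature. This is back-of-the-envelope arithmetic only; the exact counts come from the annotation itself, not from this sketch.

```python
# Approximate total from the issue; reference proportions from the table above
N_CELLS = 37_000
reference = {
    "Glandular epithelium": 0.6430,
    "Ciliated epithelium": 0.1548,
    "Stromal fibroblast": 0.0600,
    "Macrophage": 0.0409,
    "T/NK cell": 0.0337,
    "Lymphatic endothelium": 0.0237,
    "Smooth muscle": 0.0210,
    "Blood endothelium": 0.0174,
    "Pericyte": 0.0031,
    "B cell": 0.0023,
}

# Approximate number of reference cells per annotated type
approx_counts = {ct: round(N_CELLS * p) for ct, p in reference.items()}

# Smallest groups first, since those signatures are the least well supported
for ct, n in sorted(approx_counts.items(), key=lambda kv: kv[1]):
    print(f"{ct:22s} ~{n}")
```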

The cell-type proportions for cell2location are calculated from q95_cell_abundance_w_sf:

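# per-spot estimated cell abundances (spots x cell types)
df = adata_vis_pos.obsm['q95_cell_abundance_w_sf']
# total abundance of each cell type across all spots
df_sum = df.sum(axis=0)
# normalise the totals into proportions
df_sum / df_sum.sum()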
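The calculation above can be wrapped in a small helper and checked on a toy table. This is a sketch assuming, as in the snippet, that the abundance matrix is a pandas DataFrame with spots as rows and cell types as columns; the helper name `cell_type_proportions` and the toy values are hypothetical. Running the same helper on the posterior means or 5% quantile that cell2location also exports (`means_cell_abundance_w_sf`, `q05_cell_abundance_w_sf`) may be a useful sensitivity check, since summing an upper quantile over thousands of spots can give extra weight to cell types whose per-spot estimates are uncertain.

```python
import pandas as pd


def cell_type_proportions(abundance: pd.DataFrame) -> pd.Series:
    """Pool per-spot abundances (spots x cell types) into overall cell-type proportions."""
    totals = abundance.sum(axis=0)  # total estimated cells of each type across all spots
    return totals / totals.sum()    # normalise so the proportions sum to 1


# Toy example: 3 spots, 2 cell types (hypothetical numbers)
toy = pd.DataFrame(
    {"Smooth muscle": [4.0, 0.0, 2.0], "Epithelium": [1.0, 3.0, 0.0]},
    index=["spot1", "spot2", "spot3"],
)
print(cell_type_proportions(toy))
```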

I wonder whether there is a way to adjust the pipeline so that the estimated proportions are more reliable, i.e. closer to the reference dataset, if that makes sense.

Thank you for your time!

vitkl commented 8 months ago

Hi @kuang-da

Visium samples tissue in a biased way because the hexagonal grid of spots doesn't perfectly overlap with the anatomy. In addition, Visium uses thin 5-10 µm sections; in contrast, extracting cells from tissue blocks, or nuclei from much thicker tissue sections (100-200 µm), samples the anatomical structures much more robustly. We think this is why there are discrepancies between the proportions of nuclei isolated from frozen sections and the proportions of cells across Visium sections.
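A toy simulation can illustrate the sampling argument: if a tissue compartment changes thickness through the depth of a block, a single thin section sees a different share of it than pooling the whole block does. The geometry below is entirely hypothetical (a smooth-muscle-like layer whose thickness grows with depth) and only shows the direction of the effect; it does not model any real tissue.

```python
import numpy as np

# Hypothetical block: Z thin sections, each a Y x X grid of positions
Z, Y, X = 100, 50, 50
z = np.arange(Z)
y = np.arange(Y)

# Layer thickness (in grid rows) grows with depth: 5 rows at the top, 29 at the bottom
thickness = 5 + z // 4

# Boolean volume: True where a position belongs to the layer
layer = np.broadcast_to(y[None, :, None] < thickness[:, None, None], (Z, Y, X))

# Share of the layer in each thin section (what a single slide would see)
per_section = layer.mean(axis=(1, 2))
# Share of the layer in the whole block (what dissociating the block would see)
whole_block = layer.mean()

print(f"whole block: {whole_block:.2f}")
print(f"top section: {per_section[0]:.2f}, bottom section: {per_section[-1]:.2f}")
```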

kuang-da commented 8 months ago

Thank you for the reply and the great insight. I agree that different processing procedures can lead to inconsistent proportions between scRNA-seq and Visium.

I have tried rerunning cell2location with n=20, alpha=200, and also ran TACCO. The results are as follows. The discrepancy between cell2location and TACCO is interesting.

| Cell type | scRNA-seq | Visium-Cell2location-20 | Visium-Cell2location-200 | Visium-TACCO |
|---|---|---|---|---|
| Glandular epithelium | 0.6430 | 0.2757 | 0.2534 | 0.4319 |
| Ciliated epithelium | 0.1548 | 0.0327 | 0.0362 | 0.2834 |
| Stromal fibroblast | 0.0600 | 0.1677 | 0.1553 | 0.0921 |
| Macrophage | 0.0409 | 0.0214 | 0.0315 | 0.0477 |
| T/NK cell | 0.0337 | 0.0128 | 0.0368 | 0.0236 |
| Lymphatic endothelium | 0.0237 | 0.0109 | 0.0255 | 0.0418 |
| Smooth muscle | 0.0210 | 0.3533 | 0.3046 | 0.0284 |
| Blood endothelium | 0.0174 | 0.0560 | 0.0621 | 0.0391 |
| Pericyte | 0.0031 | 0.0625 | 0.0763 | 0.0079 |
| B cell | 0.0023 | 0.0070 | 0.0183 | 0.0041 |

vitkl commented 7 months ago
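One way to summarise the table is the total variation distance between each Visium estimate and the scRNA-seq reference: half the sum of absolute differences, where 0 means identical compositions and 1 means disjoint ones. This is a minimal sketch using only the numbers in the table; the helper name `total_variation` is hypothetical. A smaller distance measures agreement with the reference composition, not correctness, and a method whose formulation pulls the totals toward the reference would score well by construction.

```python
import numpy as np

# Columns of the table, in row order (Glandular epithelium ... B cell)
reference = np.array([0.6430, 0.1548, 0.0600, 0.0409, 0.0337, 0.0237, 0.0210, 0.0174, 0.0031, 0.0023])
estimates = {
    "Visium-Cell2location-20": np.array([0.2757, 0.0327, 0.1677, 0.0214, 0.0128, 0.0109, 0.3533, 0.0560, 0.0625, 0.0070]),
    "Visium-Cell2location-200": np.array([0.2534, 0.0362, 0.1553, 0.0315, 0.0368, 0.0255, 0.3046, 0.0621, 0.0763, 0.0183]),
    "Visium-TACCO": np.array([0.4319, 0.2834, 0.0921, 0.0477, 0.0236, 0.0418, 0.0284, 0.0391, 0.0079, 0.0041]),
}


def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """Total variation distance between two compositions (renormalised to sum to 1)."""
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * float(np.abs(p - q).sum())


for name, est in estimates.items():
    print(f"{name:26s} TV = {total_variation(reference, est):.3f}")
```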

Is your scRNA-seq done by enzymatic tissue dissection or by nuclei isolation from frozen tissue?

Differences in cell abundance could be hard to distinguish from technical differences in RNA detection across genes.
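A stripped-down illustration of this confounding: if a gene's expected count is abundance × signature × a gene-specific detection factor, then for markers expressed by only one cell type, doubling that type's abundance and halving its markers' detection factors gives exactly the same expected counts. The model below is a deliberately simplified sketch loosely following the structure of models with a per-gene technical sensitivity term (cell2location's model includes one); it is not cell2location's actual likelihood, and all numbers and names are hypothetical.

```python
import numpy as np

# Hypothetical reference signatures: rows = cell types, columns = genes.
# Genes 0-1 are expressed only by type A, genes 2-3 only by type B.
signatures = np.array([
    [10.0, 4.0, 0.0, 0.0],  # type A (e.g. a smooth-muscle-like type)
    [0.0, 0.0, 8.0, 6.0],   # type B
])


def expected_counts(abundance: np.ndarray, detection: np.ndarray) -> np.ndarray:
    """Expected counts per gene in one spot: (abundance @ signatures) * per-gene detection."""
    return (abundance @ signatures) * detection


# Scenario 1: 3 cells of A, 5 of B, all genes detected equally well
mu1 = expected_counts(np.array([3.0, 5.0]), np.array([1.0, 1.0, 1.0, 1.0]))
# Scenario 2: twice as many A cells, but A's markers detected half as efficiently
mu2 = expected_counts(np.array([6.0, 5.0]), np.array([0.5, 0.5, 1.0, 1.0]))

# Identical expected counts: the counts alone cannot separate the two scenarios
print(mu1, mu2, np.allclose(mu1, mu2))
```

Here each marker is exclusive to one type, which makes the trade-off exact; with genes shared across types it becomes partial.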

It is hard to comment on TACCO: I see that it mentions Visium mostly in passing, and in many tissues Visium spots contain complex cell-type mixtures. I have not read TACCO in depth, but the use of optimal transport can, by design, encourage cell abundances that correspond to the reference. If that is indeed an assumption of TACCO, you need to assess whether it is reasonable for your data.
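To illustrate how optimal transport can encourage corresponding cell abundances by design: in balanced entropic OT, the transport plan between spots and cell types is constrained so that its column sums equal a prescribed cell-type distribution. If that distribution comes from the reference, the pooled Visium proportions reproduce it whatever the costs are. The sketch below is a plain Sinkhorn iteration in NumPy with random costs and hypothetical reference proportions; TACCO's actual formulation (for example, how strictly it enforces the marginals) may differ and should be checked in its documentation.

```python
import numpy as np


def sinkhorn(a, b, cost, eps=0.5, n_iter=1000):
    """Balanced entropic OT: returns a plan P with row sums ~a and column sums ~b."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale rows toward a
        v = b / (K.T @ u)    # rescale columns to match b
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
n_spots, n_types = 50, 3
a = np.full(n_spots, 1.0 / n_spots)    # each spot carries equal mass
b = np.array([0.64, 0.16, 0.20])       # hypothetical reference proportions
cost = rng.random((n_spots, n_types))  # arbitrary spot-to-type costs

plan = sinkhorn(a, b, cost)
# Pooled cell-type proportions across all spots equal b, regardless of `cost`
print(plan.sum(axis=0))
```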

Also, do the conclusions of your paper depend on recovering the absolute differences in abundance between cell types?

kuang-da commented 7 months ago

I agree that OT would encourage corresponding cell abundances by design.

> Is your scRNA-seq done by enzymatic tissue dissection or by nuclei isolation from frozen tissue?

Our data were generated by nuclei isolation from frozen tissue, so it is snRNA-seq.