BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Re-scale of cell type abundances #311

Open scg-dgist opened 1 year ago

scg-dgist commented 1 year ago

Hello,

I want to express my appreciation for the development of such an outstanding tool for deconvoluting spatial transcriptomics data. I have successfully calculated the abundance of cell types in each spot of the Visium data by utilizing a scRNA-seq reference. However, while reviewing the deconvoluted output, I observed that the numeric ranges of cell type abundance differ across various cell types. Is it possible to rescale the abundance of each cell type within each spot to a standardized range of 0 to 1? To clarify, if a spot contains three different cell types, I aim to obtain results similar to this: Cell Type 1: 0.5, Cell Type 2: 0.3, Cell Type 3: 0.2 (the sum of cell type abundances across all cell types == 1).

Thank you for your assistance.

vitkl commented 1 year ago

Cell2location estimates cell abundance on a positive, absolute scale by leveraging the reference expression signatures, the cell number prior, and the absolute scale of RNA counts in the data. The estimated cell abundance should therefore be relatively similar to the number of cells under each Visium location. Normalising all abundance values to sum to one per location is not necessary and can be misleading when the tissue has large variability in the number of cells (e.g. the epithelial layer in skin vs. the dermis). Still, if you really need this, you can simply divide the cell abundance by the total cell abundance per location.
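For readers who want to see the division spelled out, here is a minimal sketch using pandas. The table, its values, and the names `abundance`, `total` and `proportions` are made up for illustration; it assumes only that the abundances are available as a data frame with one row per location and one column per cell type.

```python
import pandas as pd

# Hypothetical abundance table (illustrative values, not real output):
# rows are Visium locations, columns are cell types.
abundance = pd.DataFrame(
    {"type_1": [5.0, 0.5], "type_2": [3.0, 0.3], "type_3": [2.0, 0.2]},
    index=["spot_A", "spot_B"],
)

# Total estimated abundance per location (sum across cell types).
total = abundance.sum(axis=1)

# Divide each row by its own total so every location sums to 1.
# A location with a total of 0 would produce NaN here.
proportions = abundance.div(total, axis=0)

print(proportions)
print(total)
```

In this toy table both locations end up with the same proportions even though one holds ten times more cells than the other, which is the kind of information the normalisation discards. Keeping `total` alongside `proportions` preserves it.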

scg-dgist commented 1 year ago

Thank you for your explanation. Based on your clarification, to map each spot in the 10x Visium data to one of the cell types in the scRNA-seq reference used to extract the expression signatures, should I simply select the maximum value in the 'q05_cell_abundance_w_sf' data frame for each spot, or is additional processing required? I understand that Visium spots can contain a mixture of multiple cell types, but I need to define each spot as the most predominant cell type within it for further analysis of the spatial transcriptome data.

vitkl commented 1 year ago

> but I need to define each spot as the most predominant cell type within it for further analysis of the spatial transcriptome data.

This is scientifically incorrect. You need to rethink how you do "further analysis of the spatial transcriptome data". What are your research goals? Start from there and ask how to address these goals while accounting for the fact that each Visium spot contains a mixture of multiple cell types.