Cell type proportion per spot overrepresenting transcriptionally active cell types?

Problem

I would like to estimate the proportion of cell types in each spot. I have generated the cell abundance metric, but certain cell types that are more transcriptionally active appear to be overrepresented in my proportions. In the context of brain, this shows up as many glial cells (astrocytes, microglia) and very few neurons. Your recommendation in the discussion was to compute this _"by taking cell abundance of all cell types (as in the tutorial plotting section), computing total cell abundance per location, and dividing values of individual cell types"_. I wanted to check that I understood this correctly: for a given spot, I summed the confidence values of all cell types, then took the ratio of the confidence for one cell type, e.g. B plasma, to this sum (confidence B plasma / confidence of all cell types). Is this the correct approach? If so, does this not guarantee that certain cell types will be overrepresented, given the markedly different scale bar for each cell type (as is evident in your tutorial)?
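The calculation described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the per-spot estimates have been exported as a mapping from cell type to value; the function name `abundance_to_proportions` and the numbers are hypothetical and are not part of cell2location.

```python
def abundance_to_proportions(abundance):
    """Divide each cell type's value by the spot's total across cell types."""
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total cell abundance must be positive")
    return {cell_type: value / total for cell_type, value in abundance.items()}


# Hypothetical values for a single spot (illustrative numbers only).
spot = {"astrocyte": 6.0, "microglia": 3.0, "neuron": 1.0}
proportions = abundance_to_proportions(spot)

for cell_type, fraction in proportions.items():
    print(f"{cell_type}: {fraction:.2f}")
```

The proportions of one spot sum to 1, so a cell type with a large estimated value necessarily takes a large share of that spot.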

[X] I follow the instructions from the cell2location tutorial (using on scvi-tools).
[X] I have adjusted required hyperparameters to my dataset and tissue N_cells_per_location and detection_alpha.
[X] I have provided 10X reaction/inlet as batch_key for reference NB regression.
[x] I have checked scverse Discourse and old Cell2location Community Forum, and did not find a solution.

Description of the data input and hyperparameters

No issues running c2l, just a question relating to interpretation.

post mortem brain samples

Single cell reference data: number of cells, number of cell types, number of genes

snRNAseq (10x 3 prime, approx 16 cell types)

Single cell reference data: technology type (e.g. mix of 10X 3' and 5')

10X 3'

Spatial data: number of locations, technology type (e.g. Visium, ISS, Nanostring WTA)

Visium

Hi @mgrantpeters

While transcriptional activity indeed affects how the method works, it would be a fair assumption that transcriptionally active cells lead to more RNA in both snRNA-seq and Visium data. In our experience, neurones always had more RNA than glial cells (both human and mouse), so your result is surprising. You are not looking at the proportion but at the absolute estimate of cell abundance. Normalising by the total per spot doesn't change the ratios between cell types, so it does not address this in any way. You can always consider the spatial distribution of every cell type independently, without comparing which cell types are more or less abundant. There could be technical issues affecting the mapping, such as low quality of the Visium data, tissue attachment-induced artefacts, and insufficient granularity of cell annotations (we generally recommend going as granular as possible, and 16 clusters sounds very broad for postmortem brains).
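To illustrate why dividing by the per-spot total leaves the ratios between cell types unchanged, here is a small sketch in plain Python. The helper `normalise` and the abundance values are hypothetical; the only point is that every cell type in a spot is divided by the same constant.

```python
import math


def normalise(abundance):
    """Divide every cell type's abundance by the total for the spot."""
    total = sum(abundance.values())
    return {cell_type: value / total for cell_type, value in abundance.items()}


# Hypothetical absolute abundance estimates for one spot.
spot = {"astrocyte": 6.0, "neuron": 2.0}
props = normalise(spot)

# Ratio between two cell types before and after normalisation.
ratio_before = spot["astrocyte"] / spot["neuron"]
ratio_after = props["astrocyte"] / props["neuron"]

# The shared per-spot total cancels, so the two ratios agree.
print(math.isclose(ratio_before, ratio_after))
```

Because the total cancels out of any within-spot ratio, a cell type that dominates the absolute estimates dominates the proportions to the same degree.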
