Estimated cell type counts: adjusted for abundance or not?

BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)

https://cell2location.readthedocs.io/en/latest/

Apache License 2.0


Estimated cell type counts: adjusted for abundance or not? #377

Open BaharehAjami opened 1 month ago

BaharehAjami commented 1 month ago

Hello Cell2location team,

Are the estimated celltype-specific counts per spot normalized for the number of cells in each spot? I am interested in quantifying expression in each spot as if only one cell of each type (if present) were in that spot. This would make it easier to compare cell type expression between conditions if, say, the abundance of a given cell type were higher in one condition than in the other. If the counts are not normalized for abundance, abundance could clearly act as a confounding factor. Any advice would be appreciated, and thank you for the great tool.

-Ajami lab team

vitkl commented 3 weeks ago

Hi @BaharehAjami

You can look at the code. The best choice here is not decided. You don't want to normalise by cell abundance if you don't filter locations to exclude locations with near 0 abundance of a given cell type - normalisation by cell abundance only makes sense after filtering.

BaharehAjami commented 3 weeks ago

Hi Vitkl,

Thank you for your response. I had indeed thought about removing low-abundance cells, or filtering spots based on a marker gene for the cell type of interest. One question: when you say the best choice here is not decided, has this topic come up before, and are the devs considering how best to address it? It seems like it would be an important consideration for certain downstream analyses, but perhaps it is too niche.

Also, when it does come to normalizing for abundance, which strategy do you think is best? Simply dividing the expression of each gene from the celltype-specific counts by the abundance values of the given celltype in each spot seems like the most straightforward approach, but perhaps there is a better method?

-Ajami lab team

vitkl commented 3 weeks ago

I mean that the decision depends on what kind of downstream analysis you do. We also considered several ways to compute these counts. Ideally the procedure for making cell type specific counts would preserve counts rather than simply multiplying observed counts by a fraction that represents what fraction of counts for this gene and this location likely come from this cell type (which results in fractional output rather than counts).
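The fraction-based split described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not cell2location's own code: the function name and array names are hypothetical, and the expected contribution of a cell type is taken to be simply abundance times reference signature, ignoring any other terms a full model may include.

```python
import numpy as np

def split_counts_by_cell_type(observed, abundance, signatures, eps=1e-12):
    """Split observed counts into fractional per-cell-type counts.

    observed:   (n_spots, n_genes) observed counts
    abundance:  (n_spots, n_types) estimated cell abundance per spot
    signatures: (n_types, n_genes) reference expression per cell type
    Returns an array of shape (n_types, n_spots, n_genes).
    """
    # Assumed expected contribution of each cell type to each gene in each spot.
    expected = abundance.T[:, :, None] * signatures[:, None, :]
    total = expected.sum(axis=0, keepdims=True)
    # Fraction of each gene's counts in each spot attributed to each cell type.
    fraction = expected / np.maximum(total, eps)
    return fraction * observed[None, :, :]

# Toy data (made up for illustration): 1 spot, 2 cell types, 2 genes.
observed = np.array([[10.0, 4.0]])
abundance = np.array([[2.0, 1.0]])
signatures = np.array([[1.0, 0.0],
                       [3.0, 2.0]])
per_type = split_counts_by_cell_type(observed, abundance, signatures)
```

Summing `per_type` over the cell type axis recovers the observed counts for every gene with a non-zero expected total, but the individual entries are generally fractional rather than integer counts, which is the limitation mentioned above.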

For higher resolution techniques where you segment cells this could be a way to subtract background from neighbouring cell types - creating cleaner profiles for the main cell type.

It’s also not clear which use cases people actually need - it’s always better to generate and compare sc references when comparing several conditions.

BaharehAjami commented 3 weeks ago

Hi Vitkl,

So, if one were intending to compare celltype specific gene expression across conditions, and one also wanted to normalize for abundance because abundance changes drastically across said conditions, which normalization would you suggest?

Currently, I am considering:

1. remove all spots with near-zero abundance for a given cell type
2. in each spot, divide celltype specific counts for each gene by abundances of the given cell type
3. proceed to comparing across conditions

Does this seem sound to you? I am especially interested in what you would do in step 2. And thank you very much for the advice.

-Ajami lab team
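The first two steps proposed above could be sketched like this. It is a minimal illustration, assuming the celltype-specific counts and abundances have already been extracted as plain NumPy arrays; the function name and the `min_abundance` cutoff are hypothetical and are not taken from the thread or from cell2location.

```python
import numpy as np

def per_cell_expression(type_counts, type_abundance, min_abundance=0.5):
    """Abundance-normalised expression for one cell type.

    type_counts:    (n_spots, n_genes) counts attributed to the cell type
    type_abundance: (n_spots,) estimated abundance of that cell type
    min_abundance:  hypothetical cutoff; choose it from your own data
    """
    # Step 1: drop spots where the cell type is (nearly) absent.
    keep = type_abundance >= min_abundance
    # Step 2: divide each remaining spot's counts by its abundance.
    normalised = type_counts[keep] / type_abundance[keep, None]
    return normalised, keep

# Toy data (made up for illustration): 3 spots, 2 genes.
type_counts = np.array([[8.0, 2.0],
                        [0.3, 0.1],
                        [9.0, 3.0]])
type_abundance = np.array([4.0, 0.01, 1.5])
normalised, keep = per_cell_expression(type_counts, type_abundance)
```

Filtering first matters here: without it, the second spot would be divided by 0.01 and its values inflated a hundredfold. The returned `keep` mask can be used to subset condition labels for step 3.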