Reconstruct single cell gene expression matrix

BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)

https://cell2location.readthedocs.io/en/latest/

Apache License 2.0


Reconstruct single cell gene expression matrix #211

Open ggruenhagen3 opened 1 year ago

ggruenhagen3 commented 1 year ago

Hi,

Thank you for building this incredible tool! I especially like that cell2location estimates the number of cells from each cell type in a spot, not just the proportion. I think it would be really cool if the tool could also reconstruct a single cell expression matrix. In other words, if spot1 is predicted to have two cells from celltype A and one cell from celltype B, then the reconstructed single cell matrix for spot1 would have three cells with the gene expression profile constructed from spot1 and split according to cell type. This sort of thing is implemented by SpaTalk and I can kind of use the cell2location results in SpaTalk. But I think directly using the models built by cell2location would be more accurate than SpaTalk having to re-estimate things (or whatever method it uses for single cell matrix reconstruction).

Thanks, George

vitkl commented 1 year ago

Hi @ggruenhagen3

Great to hear that cell2location helps.

Cell2location already offers similar functionality: https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Estimate-cell-type-specific-expression-of-every-gene-in-the-spatial-data-(needed-for-NCEM)

I think it's impossible to say that a location has a discrete number of cells because the data comes from a 2D section: even if you see 2 nuclei, it doesn't mean that RNA was isolated from 2 complete cells. This is reflected in cell2location estimates being a continuum of cell abundance/cell density. However, you can estimate the expected number of RNA counts coming from each cell type at every location for every gene (computed in the tutorial above). You can filter the {cell_type * location * gene} data to exclude {cell_type * location} pairs where cell abundance is too small and RNA count is too small, and then normalise expected counts by cell abundance (done in the NCEM workflow).
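The filter-then-normalise step can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the NCEM workflow's code: the function name, the array layouts, and both thresholds are hypothetical, and the arrays stand in for the expected counts and cell abundances exported from cell2location. The exclusion rule follows the wording above literally (a pair is dropped only when both abundance and RNA count are below threshold); replace `&` with `|` for a stricter filter.

```python
import numpy as np


def per_cell_expression(expected_counts, abundance,
                        min_abundance=0.5, min_counts=10.0):
    """Filter {cell_type * location} pairs, then divide by cell abundance.

    expected_counts: (n_cell_types, n_locations, n_genes) expected RNA counts.
    abundance:       (n_locations, n_cell_types) estimated cell abundance.
    Thresholds are hypothetical defaults, not cell2location settings.
    Returns (per_cell, keep): per_cell has the shape of expected_counts with
    NaN for excluded pairs; keep is a (n_cell_types, n_locations) mask.
    """
    expected_counts = np.asarray(expected_counts, dtype=float)
    abundance_tl = np.asarray(abundance, dtype=float).T  # (types, locations)

    # Total expected RNA count per {cell_type * location} pair.
    total_counts = expected_counts.sum(axis=2)

    # Exclude pairs where abundance AND RNA count are both too small.
    exclude = (abundance_tl < min_abundance) & (total_counts < min_counts)
    # Also skip zero-abundance pairs to avoid dividing by zero.
    keep = ~exclude & (abundance_tl > 0)

    per_cell = np.full(expected_counts.shape, np.nan)
    per_cell[keep] = expected_counts[keep] / abundance_tl[keep][:, None]
    return per_cell, keep


# Toy data: 2 cell types, 2 locations, 3 genes.
toy_counts = np.array([[[20.0, 10.0, 10.0], [1.0, 1.0, 0.0]],
                       [[0.0, 0.0, 0.0], [30.0, 30.0, 30.0]]])
toy_abundance = np.array([[2.0, 0.0],   # location 0: type A, type B
                          [0.1, 3.0]])  # location 1: type A, type B
toy_per_cell, toy_keep = per_cell_expression(toy_counts, toy_abundance)
```

With the toy data, only (type A, location 0) and (type B, location 1) survive the filter, and their expected counts are divided by abundances of 2.0 and 3.0 respectively.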

I hope this makes sense.

ggruenhagen3 commented 1 year ago

@vitkl that makes sense, I'll give that link you sent a try. Thanks!

vravik commented 1 year ago

Hello,

I have a quick question regarding the extracted cell-type-specific expression. Are these the expected numbers of raw UMIs per cell type in that spot? Would I have to normalize them for spot-specific library size?

vitkl commented 1 year ago

> Are these the expected numbers of raw UMIs per cell type in that spot?

Yes, but the way this is currently computed, the numbers are not integers.

> Would I have to normalize them for spot-specific library size?

Yes, you can try using adata.uns['mod']['post_sample_means']['detection_y_s'], the per-spot estimate of technical RNA detection sensitivity.
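One possible way to apply that per-spot estimate is to divide each spot's expected counts by its sensitivity value. The sketch below is an assumption-laden illustration rather than a documented cell2location procedure: the helper name is hypothetical, the sensitivity is assumed to hold one value per spot (for example shape (n_spots, 1)), and plain division is only one plausible normalisation. The toy arrays stand in for the cell-type-specific counts and for the values stored under adata.uns['mod']['post_sample_means']['detection_y_s'].

```python
import numpy as np


def scale_by_detection(counts, detection_y_s):
    """Divide per-spot counts by a per-spot detection sensitivity.

    counts:        (n_spots, n_genes) expected counts for one cell type.
    detection_y_s: one sensitivity value per spot, e.g. shape (n_spots, 1)
                   (assumed layout; check the shape in your own object).
    """
    counts = np.asarray(counts, dtype=float)
    detection = np.asarray(detection_y_s, dtype=float).reshape(-1, 1)
    if detection.shape[0] != counts.shape[0]:
        raise ValueError("need exactly one detection value per spot")
    # Broadcasting divides every gene in a spot by that spot's sensitivity.
    return counts / detection


# Toy data: 2 spots, 2 genes.
toy_spot_counts = np.array([[10.0, 20.0],
                            [30.0, 60.0]])
toy_detection = np.array([[2.0],
                          [3.0]])
toy_scaled = scale_by_detection(toy_spot_counts, toy_detection)
```

After scaling, the two toy spots have comparable values even though the second spot was detected with higher sensitivity.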