BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Interpretation of the deconvolution results #151

Closed Zhuang-Bio closed 2 years ago

Zhuang-Bio commented 2 years ago

Interpretation of the deconvolution results

Description of the data input and hyperparameters

Single cell reference data: number of cells, number of cell types, number of genes

~20000 cells, 26 clusters, ~5000 marker genes ...

Single cell reference data: technology type (e.g. mix of 10X 3' and 5')

10X3' v3 ...

Spatial data: number of locations, technology type (e.g. Visium, ISS, Nanostring WTA)

Visium

Question

Hi, thank you very much for this awesome tool. I followed the pipeline and everything went very well. In terms of interpreting the results, I know it is better to use 'q05_cell_abundance_w_sf' as the deconvolution result to project onto the H&E images. But what do the confident cell abundance values of each cell type mean? I summed those values in each row, and the totals differ a lot between spots, which confuses me a little. More specifically, I have 26 clusters as shown below, and I wonder whether it is reasonable to calculate the relative cell type proportion for each spot (e.g. relative proportion of Basal-I in spot 1 = 0.0788311/5.16249044). It might be easier to understand if the total for each spot equalled 1, so that we could compare the results of different cell types directly.


cell2location results

Spatial_SpotID | Basal-I | Basal-II | Basal-III | DC | DC-LAMP | Dermal-DC | FB-I | FB-II | FB-III | FB-IV | Granular-I | Granular-II | LC | LE | MEL | Mast-cell | Mono-Mac | NK-cell | PC-vSMC | Plasma-B-cell | Schwann | Spinous-I | Spinous-II | Spinous-III | Th | VE | Total
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
AAACAAGTATCTCCCA-1 | 0.0788311 | 0.01183488 | 0.0013773 | 0.0066458 | 0.00330211 | 0.00185754 | 0.82047006 | 0.06403197 | 3.31677738 | 0.00127522 | 0.00188847 | 0.02291755 | 0.10469464 | 0.12440987 | 0.02615497 | 0.05246331 | 0.04863389 | 0.01683048 | 0.03649302 | 0.00685662 | 0.00687596 | 0.2401821 | 0.10762781 | 0.0053093 | 0.03440755 | 0.02034153 | 5.16249044
AAACATTTCCCGGATT-1 | 0.00802813 | 0.00075149 | 0.0022309 | 0.00468953 | 0.00162133 | 0.00118663 | 0.31470423 | 0.02221218 | 1.92233016 | 0.00103692 | 0.00336535 | 0.00105805 | 0.0251716 | 0.01604621 | 0.02824159 | 0.06488496 | 0.02648281 | 0.00788521 | 0.07540033 | 0.00343071 | 0.02093353 | 0.00931369 | 0.00423436 | 0.00128129 | 0.01348818 | 0.00388846 | 2.58389782
AAACCTAAGCAGCCGG-1 | 0.11836652 | 0.01075403 | 0.02144423 | 0.10995266 | 0.09720989 | 0.05960494 | 0.00045867 | 0.43426276 | 0.01384732 | 0.02475756 | 0.00795721 | 0.00277988 | 0.31790655 | 0.21364005 | 1.84851403 | 0.09111738 | 0.12857315 | 0.14189067 | 1.42056299 | 0.1412995 | 0.54924091 | 0.13600109 | 0.12423333 | 5.46599512 | 0.66547016 | 0.68631223 | 12.8321528
AAACGAGACGGTTGAT-1 | 0.04039945 | 0.00090433 | 0.00193628 | 0.00361877 | 0.00120299 | 0.00109853 | 0.18939618 | 0.03465241 | 3.35178006 | 0.00105741 | 0.00070539 | 0.05037835 | 0.04513825 | 0.00694101 | 0.02586368 | 0.0677819 | 0.02174506 | 0.01301383 | 0.00370227 | 0.00310089 | 0.00597889 | 0.06166198 | 0.01398668 | 0.0010241 | 0.01857916 | 0.00215477 | 3.96780262

That would be similar to what Stereoscope produces for the deconvolution.

Stereoscope results

Spot_ID | B.cell | Basal.I | Basal.II | Basal.III | DC | FB.I | FB.II | FB.III | FB.IV | Granular.I | Granular.II | LC | LE | Mac | Mast.cell | MEL | Mono.DC | Mono.Mac | NK.cell | PC.vSMC | Schwann | Spinous.I | Spinous.II | Spinous.III | Th | VE | Total
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
AAACAAGTATCTCCCA-1 | 0.000000752 | 0.000000838 | 0.000001113 | 0.000001819 | 0.000000642 | 0.510354700 | 0.000000610 | 0.237905950 | 0.000002362 | 0.062628990 | 0.007137010 | 0.000000480 | 0.017056220 | 0.059652492 | 0.000000186 | 0.105165580 | 0.000081452 | 0.000001618 | 0.000000542 | 0.000000234 | 0.000001415 | 0.000000557 | 0.000000925 | 0.000001599 | 0.000000716 | 0.000001369 | 1.00
AAACATTTCCCGGATT-1 | 0.000000579 | 0.010616324 | 0.000000520 | 0.000002530 | 0.023446321 | 0.352238830 | 0.000001269 | 0.426972480 | 0.000000769 | 0.000000404 | 0.000000312 | 0.000000608 | 0.000000417 | 0.000000848 | 0.011878287 | 0.134177450 | 0.000000730 | 0.000001426 | 0.000000382 | 0.040651760 | 0.000001584 | 0.000000236 | 0.000000308 | 0.000000356 | 0.000000377 | 0.000004968 | 1.00
AAACCTAAGCAGCCGG-1 | 0.000000463 | 0.000000383 | 0.000000227 | 0.000001531 | 0.000000135 | 0.000000080 | 0.044569973 | 0.000000239 | 0.000000166 | 0.000000201 | 0.000000698 | 0.000000181 | 0.000000206 | 0.000000749 | 0.006253038 | 0.398368750 | 0.000000274 | 0.000000175 | 0.000005720 | 0.083124240 | 0.000021626 | 0.000000115 | 0.000000109 | 0.000000290 | 0.000004656 | 0.467645820 | 1.00

Thank you in advance!

Cheers, Zhuang

yxiaobme commented 2 years ago

I have the same question! Can the abundance values of each cell type be compared directly? What exactly do those values represent?

vitkl commented 2 years ago

Hi @Zhuang-Bio @yxiaobme

The 'q05_cell_abundance_w_sf' deconvolution result can be interpreted as cell abundance or cell density per location: the expected number of cells, including fractional values for incompletely captured cells.

In contrast to "cell proportions" per location, cell abundance doesn't assume that the total number of cells is identical across locations. Normalising to the total per spot doesn't simplify comparisons of cell abundance between cell types in any way. A caveat is that both of these measures are approximate, and the spatial distribution of a cell type is more robust than differences in either cell abundance or cell proportion between cell types. We generally see that highly abundant cell types do have higher 'q05_cell_abundance_w_sf' (for example, Naive CD4 T cells in the tutorial are more abundant than FDC cells); however, any differences in abundance between cell types need to be validated with other technologies.
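
To make the difference between abundance and per-spot proportion concrete, here is a minimal pandas sketch. It uses three of the 26 cell types and the reported totals for the first two spots in the cell2location table above. The `abundance`, `total`, and `proportion` names are illustrative only and are not part of the cell2location API.

```python
import pandas as pd

# Values copied from the first two rows of the cell2location table in this thread
# (three of the 26 cell types; the frame layout is an assumption for illustration).
abundance = pd.DataFrame(
    {
        "Basal-I": [0.0788311, 0.00802813],
        "FB-I": [0.82047006, 0.31470423],
        "FB-III": [3.31677738, 1.92233016],
    },
    index=["AAACAAGTATCTCCCA-1", "AAACATTTCCCGGATT-1"],
)

# Per-spot totals over all 26 cell types, as reported in the "Total" column.
total = pd.Series([5.16249044, 2.58389782], index=abundance.index, name="Total")

# Relative proportion per spot: divide each row by that spot's total.
proportion = abundance.div(total, axis=0)

print(abundance.round(4))
print(proportion.round(4))
```

In this subset, the estimated FB-III abundance is higher in the first spot, while the FB-III proportion is higher in the second spot, because the second spot has a lower total. Normalising therefore discards the information about how many cells each location contains.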

Zhuang-Bio commented 2 years ago

@vitkl Thanks for your clear explanation.

pumpkin-hat commented 1 year ago

@vitkl Hi, thanks for your explanation. Is there any cutoff for 'q05_cell_abundance_w_sf' that can be used to filter the table and keep only high-quality results? I got the annotation, but many of the abundance values are very low (less than 0.04). Looking forward to your reply.