BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Cell abundance vs proportion? [within-section technical effects lead to misleading cell abundance estimates] #198

Open livyring opened 2 years ago

livyring commented 2 years ago

Please use the template below to post a question to https://discourse.scverse.org/c/ecosytem/cell2location/.

Cell abundance vs cell proportion for calculating spatial maps

I am working with murine lung tissue bearing tumors. Using the workflow provided and visualizing cell populations by abundance causes strange blank spots in the data in areas where there are fewer cells, i.e. tumor border (more cells) vs tumor bed (far less diversity of cell types).

To account for this I was able to graph the spatial data by proportion rather than abundance by running this function:

def convert_density_to_prop(df):
    rowSum = df.sum(axis = 1)
    rowProportions = df.div(rowSum, axis = 0)
    return rowProportions

adata_vis.obs[adata_vis.uns['mod']['factor_names']] = convert_density_to_prop(adata_vis.obsm['q05_cell_abundance_w_sf'])

This makes things look much better, with no random holes in areas where there is a ground truth.
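A toy run of this kind of conversion, as a minimal sketch: the cell type names and abundance values are made up, and `safe_density_to_prop` is a hypothetical helper, not part of cell2location. It performs the same row normalisation but also guards against spots whose abundances sum to zero, which would otherwise produce NaN.

```python
import numpy as np
import pandas as pd


def safe_density_to_prop(df: pd.DataFrame) -> pd.DataFrame:
    """Row-normalise abundances so each spot sums to 1 (all-zero rows stay 0)."""
    row_sum = df.sum(axis=1)
    # Zero totals become NaN so the division yields NaN, which is then set to 0.
    return df.div(row_sum.replace(0, np.nan), axis=0).fillna(0.0)


# Made-up abundances for three spots and two cell types (illustration only).
toy = pd.DataFrame(
    {"Cancer": [8.0, 0.5, 0.0], "Fibroblast": [2.0, 0.0, 0.0]},
    index=["border_spot", "core_spot", "empty_spot"],
)
props = safe_density_to_prop(toy)
print(props)
```

In this toy table the core spot has a much lower abundance than the border spot, but a higher cancer proportion, which is the pattern described in this post.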

However, I would like to apply this to the cell type specific expression data, but I don't know how to do this with the "post_sample_q05" output. What is this output? Looking at it, it appears to be the number of cells recognized in a given spot. How can I convert this data into proportions so that it accurately shows my gene expression per cell type?

I have attached examples below of before and after using proportions for identifying cells

Description of the data input and hyperparameters

10X visium slide

Murine lung lobes with tumor

Single cell reference data: number of cells, number of cell types, number of genes

Number of cells: ~14000; number of cell types: 23; number of genes (in raw): ~15500 ...

Single cell reference data: technology type (e.g. mix of 10X 3' and 5')

Two 3' single cell 10X datasets ...

Spatial data: number of locations numbers, technology type (e.g. Visium, ISS, Nanostring WTA)

~1900

... adata_vis.obs[adata_vis.uns['mod']['factor_names']] = adata_vis.obsm['q05_cell_abundance_w_sf']

image

adata_vis.obs[adata_vis.uns['mod']['factor_names']] = convert_density_to_prop(adata_vis.obsm['q05_cell_abundance_w_sf'])

image

The issue is that the total counts and gene counts are not distributed evenly throughout the tissue. The tumor border contains many cells and many diverse cell types:

image image

vitkl commented 2 years ago

Very interesting results. The plots you shared suggest that either 1) the cancer area is depleted of RNA compared to normal tissue (necrotic, less transcriptionally active?) or 2) Visium RNA detection is substantially reduced in the cancer tissue area (maybe it needs different permeabilization conditions, maybe something else like the presence of RNases in the cancer area). Is your sc/sn data for cancer cells also low in UMIs compared to normal cells? If not, this probably points toward problem 2.
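One way to check the UMI question is to compare per-cell total counts across cell types in the reference. The following is a minimal sketch on a made-up table: the column names `cell_type` and `total_counts` and all values are assumptions, and in practice the totals would be computed from the raw counts of the reference dataset.

```python
import pandas as pd

# Hypothetical per-cell table (values invented for illustration).
ref_obs = pd.DataFrame(
    {
        "cell_type": ["Cancer", "Cancer", "T cell", "T cell", "Fibroblast"],
        "total_counts": [9000, 11000, 3000, 5000, 8000],
    }
)

# Median UMI per cell for each cell type, lowest first.
median_umi = (
    ref_obs.groupby("cell_type")["total_counts"].median().sort_values()
)
print(median_umi)
```

If cancer cells are not at the low end of such a table while the tumor core is low in UMI on the slide, that would be consistent with reduced detection in the spatial data rather than low RNA content.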

So I would say that, given the Visium data with low UMI in that spot and your reference of cancer cells, cell2location’s best guess is that the number of cancer cells in that spot is indeed lower than at the border. However, when you compute proportions you see that the majority of the signal is indeed from cancer cells. The per cell type RNA count decomposition operates on observed UMI, which are low in that spot. Hence the estimated expression of genes in cancer cells in that spot is also low.
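The arithmetic behind this can be illustrated with a toy calculation. This is a sketch only: it uses plain linear least squares as a stand-in and is not cell2location's actual model, and the signatures, cell counts and detection factors are all made up. Two spots with the same true composition differ only in how much RNA is detected.

```python
import numpy as np

# Made-up expected UMI per cell: rows = cell types, columns = genes.
signatures = np.array(
    [
        [10.0, 1.0, 1.0],  # "Cancer"
        [1.0, 10.0, 1.0],  # "Stroma"
    ]
)
true_cells = np.array([9.0, 1.0])  # same composition in both spots

results = {}
for name, detection in [("border", 1.0), ("core", 0.1)]:
    # Observed UMI per gene, scaled by a spot-specific detection factor.
    observed = detection * (true_cells @ signatures)
    # Least-squares abundance estimate that ignores the detection difference.
    abundance, *_ = np.linalg.lstsq(signatures.T, observed, rcond=None)
    results[name] = {
        "abundance": abundance,
        "proportion": abundance / abundance.sum(),
    }

for name, res in results.items():
    print(name, res["abundance"].round(3), res["proportion"].round(3))
```

Under these assumptions the estimated abundance in the low-detection spot is ten times lower, while the proportions are identical in both spots.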

Another possibility is a reasonable UMI count in that spot and the rest of the tissue but artifactually high total UMI at the border. Did you use the detection_alpha=20 hyperparameter? Setting that hyperparameter to a lower value increases normalisation strength, but we generally don’t recommend low values (1 or 5) because they increase background.

In general, https://discourse.scverse.org/ is a better place to discuss these questions - better visibility and more people can contribute.

livyring commented 2 years ago

Hi Vitalii!

Thank you for your insight! I think that the situation is the latter, wherein the Visium RNA detection is substantially lower than in the other areas of the capture area. Total counts are quite low in these spaces on the Visium slide (attached photo), whereas the single cell RNA-seq reference set appears quite normal.

Is there a way to get around this issue such that I can use proportion data to infer cell specific expression of genes, as opposed to relying on the UMI deconvolution method?

I used detection_alpha = 20, but I can try setting it lower in order to increase the normalization strength. However, I don’t think this will be a solid fix for my issue.

I will try to post on scverse as opposed to GitHub as well. Please reach out to me if you can think of a possible solution to my problem! My first committee meeting for my thesis project is approaching and I would love to be able to show them this data if I can get it to work 😊

vitkl commented 2 years ago

> Total counts are quite low in these spaces on the visium (attached photo) compared to the single cell RNA seq reference set which appears quite normal.

With snRNA-seq you have selection bias for intact nuclei, which cancer cells in the middle of the blob might not have. Did you look at hypoxia markers?

> Is there a way to get around this issue such that I can use proportion data to infer cell specific expression of genes, as opposed to relying on the UMI deconvolution method?

The problem is not the use of cell proportions but the fact that RNA is not detected. Given that cancer cells are dominant in the middle of the blob, most of the UMI there are going to be allocated to cancer cells. However, the number of those UMI is low compared to the rest of the tissue. For the downstream analysis that you want to do, the decomposed UMI are going to be normalised, so this difference is not necessarily a problem. I would try going ahead as is.
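To illustrate why normalisation can remove this difference, here is a minimal sketch using per-spot total scaling (counts per 10,000). This is an assumed, common normalisation and not necessarily the one applied downstream; the counts and the gene names other than Lgals1 are made up.

```python
import numpy as np

# Hypothetical decomposed counts attributed to cancer cells:
# rows = spots (border, core), columns = genes (Lgals1, gene_b, gene_c).
cancer_counts = np.array(
    [
        [50.0, 30.0, 20.0],  # border spot, high detection
        [5.0, 3.0, 2.0],     # core spot, ten times fewer UMI
    ]
)

totals = cancer_counts.sum(axis=1, keepdims=True)
# Counts per 10,000 decomposed cancer-cell UMI; spots with no counts stay 0.
cp10k = np.divide(
    cancer_counts * 1e4,
    totals,
    out=np.zeros_like(cancer_counts),
    where=totals > 0,
)
print(cp10k)
```

With these toy numbers the raw Lgals1 counts differ tenfold between the two spots, but the normalised values are the same, so relative expression within cancer cells is preserved.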

For TAC (aka thesis committee meeting), I think it's fine to explain the observations as they are.

livyring commented 2 years ago

I looked at the hypoxia markers in the spatial and single cell reference datasets (attached images of plots). As you can see, the reference shows upregulated expression of various hypoxic genes in the cancer cell cluster relative to the other cells. The spatial data seems to show upregulation of hypoxic gene expression at the tumor border. I’m sure this observation is explained by the low UMI counts in the center of the tumor compared to the periphery, like you mentioned.

As for plotting the cell type specific expression on the spatial map, would it be misleading to plot it without using proportions? I guess I am confused about whether or not it is normalized in these plots. The cell type specific expression plot of “Lgals1”, one of the highly differentially expressed genes in the cancer cluster (from the original email I sent you), only shows expression at the periphery, which is also where the cancer cells are located when abundance is plotted without the proportion calculation. However, I am confident that the cancer cells in the center express Lgals1 as well, so is plotting it this way misleading or incorrect? I just want to make sure that I am showing and interpreting the data in a legitimate way. Since the spatial plot does not register a high presence of cancer cells in the center of the tumor area unless I do the proportion calculation, it is hard to evaluate whether the normalization takes the center of the tumor into account at all.

spatial_hypoxic_markers hypoxic_markers cell2location_cell_tye_sprecific_expression_issues

Thank you so so much for your time and explanations to all of my questions, it has been so helpful.

vitkl commented 2 years ago

@livyring GitHub doesn't show email attachments, so I recommend adding them via github.com. You can edit your posts to add them.

livyring commented 2 years ago

@vitkl Sorry about that, I think I fixed it above