BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Idea of modifying model after assessing Reconstruction accuracy plot #285

Closed by parkjooyoung99 1 year ago

parkjooyoung99 commented 1 year ago


Problem

I am using cell2location on a slide where we already know where the cancer cells should be. However, the result does not seem to detect the cancer tissue well. Re-examining my code and the QC plots, I found that the reconstruction accuracy plot shows a different trend than expected, so I assume that my model has a problem with inference. [image: reconstruction accuracy plot]

Is there any way to correct this kind of problem? I looked through the tutorial for ideas, but could not find an answer.

Description of the data input and hyperparameters

batch_key = sample
training epochs = 500 (the elbow started near 200 in the elbow plot)
N_cells_per_location = 5
detection_alpha = 20

Ovarian cancer slide for which we have prior knowledge of where the cancer cells should be

Single cell reference data: number of cells, number of cell types, number of genes

number of cells = 37256
number of cell types = 43
number of genes = 14678

Single cell reference data: technology type (e.g. mix of 10X 3' and 5')

10X 5'

Spatial data: number of locations numbers, technology type (e.g. Visium, ISS, Nanostring WTA)

number of locations = 2837
technology type = Visium

vitkl commented 1 year ago

You need to train the cell2location.models.Cell2location model as specified in the tutorial:

mod.train(max_epochs=30000,
          # train using full data (batch_size=None)
          batch_size=None,
          # use all data points in training because
          # we need to estimate cell abundance at all locations
          train_size=1,
          use_gpu=True,
         )

Note batch_size=None, which uses all data rather than minibatches, and max_epochs=30000. It is important to train the model with these settings to achieve high accuracy. You can change max_epochs to other values in the range 10k - 100k depending on the data, but 30k-50k works for most datasets we have used.
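When picking a value for max_epochs, it can help to check whether the training loss has actually flattened by the end of the run. The following is only a minimal sketch of such a check in plain Python: `has_plateaued`, `tail_fraction`, and `rel_tol` are hypothetical names and thresholds chosen for illustration, not part of cell2location, and the function expects the per-epoch loss as a plain list of numbers, however you extract it from the trained model.

```python
def has_plateaued(loss_history, tail_fraction=0.1, rel_tol=0.001):
    """Return True if the loss is roughly flat over the last part of training.

    Hypothetical helper (not part of cell2location). It compares the mean
    loss in the first and second halves of the final `tail_fraction` of
    epochs and reports a plateau when the relative change is below `rel_tol`.
    """
    n = len(loss_history)
    tail_len = max(2, int(n * tail_fraction))
    if n < tail_len:
        raise ValueError("loss_history is too short to assess")

    tail = list(loss_history[-tail_len:])
    half = tail_len // 2
    first_mean = sum(tail[:half]) / half
    second_mean = sum(tail[half:]) / (tail_len - half)

    # Guard against division by zero when the loss is (near) zero.
    scale = max(abs(first_mean), 1e-12)
    return abs(first_mean - second_mean) / scale < rel_tol


# Toy example: a loss that is still falling steadily vs. one that is flat.
still_falling = [1000.0 - i for i in range(500)]
flat = [100.0] * 500
print(has_plateaued(still_falling), has_plateaued(flat))
```

If the check reports no plateau, increasing max_epochs within the range given above is the natural next step; the thresholds here are arbitrary and should be adjusted to the scale of your own loss curve.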

parkjooyoung99 commented 1 year ago

Thank you for your reply! Unfortunately, even though I followed your instructions, the plot still seems to show a different trend. What could be the reason for this? Perhaps my reference data is not of good enough quality for inference?

[image: reconstruction accuracy plot]

Below is my whole code:

from cell2location.models import RegressionModel

RegressionModel.setup_anndata(adata=adata_ref, batch_key='sample', labels_key='SC04')
mod = RegressionModel(adata_ref)
mod.train(max_epochs=30000, batch_size=None, train_size=1, use_gpu=True)
adata_ref = mod.export_posterior(
    adata_ref,
    sample_kwargs={'num_samples': 1000, 'batch_size': 2500, 'use_gpu': True},
)
mod.plot_QC()

vitkl commented 1 year ago

I thought that you were referring to cell2location.models.Cell2location, not RegressionModel. For the regression model, this plot looks acceptable (the averages per cluster are similar, see the bottom plot). We have seen this for some snRNA-seq datasets.
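To make the "averages per cluster are similar" check concrete, here is a minimal NumPy sketch that compares per-cluster mean expression with a set of estimated signatures. It is an illustration only, not the code behind plot_QC: `cluster_averages` and `signature_agreement` are hypothetical helpers, the inputs are assumed to be dense arrays (cells x genes counts, genes x clusters signatures), and the use of a log1p Pearson correlation is my own choice of summary.

```python
import numpy as np


def cluster_averages(counts, labels):
    """Mean expression of each gene within each cluster.

    counts: array of shape (n_cells, n_genes)
    labels: sequence of length n_cells with a cluster label per cell
    Returns (clusters, means), where means has shape (n_genes, n_clusters).
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)  # sorted unique cluster labels
    means = np.stack(
        [counts[labels == c].mean(axis=0) for c in clusters], axis=1
    )
    return clusters, means


def signature_agreement(cluster_means, signatures):
    """Pearson correlation (on log1p values) between the observed cluster
    means and the estimated signatures, computed separately per cluster."""
    a = np.log1p(np.asarray(cluster_means, dtype=float))
    b = np.log1p(np.asarray(signatures, dtype=float))
    return np.array(
        [np.corrcoef(a[:, k], b[:, k])[0, 1] for k in range(a.shape[1])]
    )


# Toy data: 4 cells, 3 genes, 2 clusters (all values are made up).
toy_counts = np.array(
    [[1, 0, 3], [3, 0, 5], [0, 4, 1], [0, 6, 1]], dtype=float
)
toy_labels = ["A", "A", "B", "B"]
clusters, means = cluster_averages(toy_counts, toy_labels)
print(clusters, signature_agreement(means, means))
```

In practice the second argument would be the signatures estimated by the regression model rather than the cluster means themselves; correlations close to 1 for every cluster correspond to the situation described above, where the averages per cluster agree.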

For RegressionModel you can actually just follow the default parameters in the tutorial, https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Estimation-of-reference-cell-type-signatures-(NB-regression), rather than the settings I suggested above.

I would suggest proceeding with spatial mapping - https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Cell2location:-spatial-mapping

parkjooyoung99 commented 1 year ago

Thank you so much for your help :)