Idea of modifying model after assessing Reconstruction accuracy plot

parkjooyoung99 commented 1 year ago


Problem

I am using cell2location on a slide where we already know where the cancer cells should be. However, the result does not seem to detect the cancer tissue well. Re-examining my code and QC plots, I noticed that the reconstruction accuracy plot has a different trend than expected, so I assume my model has a problem with inference. (Image attached: reconstruction accuracy plot.)

Is there a way to correct this kind of problem? I looked through the tutorial but could not find an answer.

[x] I followed the instructions from the cell2location tutorial (based on scvi-tools).
[x] I have adjusted the required hyperparameters to my dataset and tissue: N_cells_per_location and detection_alpha.
[ ] I have provided 10X reaction/inlet as batch_key for reference NB regression.
[x] I have checked scverse Discourse and old Cell2location Community Forum, and did not find a solution.

Description of the data input and hyperparameters

batch_key = sample
training epochs = 500 (the elbow started near 200 in the elbow plot)
N_cells_per_location = 5
detection_alpha = 20

Ovarian cancer slide where we have prior knowledge of where the cancer cells should be

Single cell reference data: number of cells, number of cell types, number of genes

number of cells = 37256
number of cell types = 43
number of genes = 14678

Single cell reference data: technology type (e.g. mix of 10X 3' and 5')

10X 5'

Spatial data: number of locations numbers, technology type (e.g. Visium, ISS, Nanostring WTA)

number of locations = 2837
technology type = Visium

vitkl commented 1 year ago

You need to train the cell2location.models.Cell2location model as specified in the tutorial:

mod.train(max_epochs=30000,
          # train using full data (batch_size=None)
          batch_size=None,
          # use all data points in training because
          # we need to estimate cell abundance at all locations
          train_size=1,
          use_gpu=True,
         )

Note batch_size=None, which uses all data rather than minibatches, and max_epochs=30000. It is important to train the model with these settings to achieve high accuracy. You can change max_epochs to other values in the range 10k-100k depending on the data, but 30k-50k works for most datasets we have used.
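As a rough sketch of the settings above, the helper below collects the keyword arguments for mod.train and checks that max_epochs stays within the 10k-100k range mentioned. The function name cell2location_train_kwargs is hypothetical and not part of cell2location; only the keyword names and values are taken from the snippet above.

```python
def cell2location_train_kwargs(max_epochs=30000, use_gpu=True):
    """Hypothetical helper: build keyword arguments for Cell2location training."""
    # Range quoted in this thread: 10k - 100k epochs, depending on the data.
    if not 10_000 <= max_epochs <= 100_000:
        raise ValueError(
            f"max_epochs={max_epochs} is outside the suggested 10k-100k range"
        )
    return {
        "max_epochs": max_epochs,
        "batch_size": None,  # train on the full data rather than minibatches
        "train_size": 1,     # use all locations; abundance is needed everywhere
        "use_gpu": use_gpu,
    }


train_kwargs = cell2location_train_kwargs(max_epochs=30000)
print(train_kwargs)
```

With a model object set up as in the tutorial, these would be passed as mod.train(**train_kwargs). A value such as max_epochs=500 raises a ValueError here, which makes an under-trained spatial model easier to spot.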

parkjooyoung99 commented 1 year ago

Thank you for your reply! Unfortunately, even though I followed your instructions, the plot still seems to have a different trend. What could be the reason for this? Perhaps my reference data is not of good enough quality for inference?

Below is my whole code:

cell2location.models.RegressionModel.setup_anndata(adata=adata_ref, batch_key='sample', labels_key='SC04')
from cell2location.models import RegressionModel
mod = RegressionModel(adata_ref)
mod.train(max_epochs=30000, batch_size=None, train_size=1, use_gpu=True)
adata_ref = mod.export_posterior(adata_ref, sample_kwargs={'num_samples': 1000, 'batch_size': 2500, 'use_gpu': True})
mod.plot_QC()

vitkl commented 1 year ago

I thought you were referring to cell2location.models.Cell2location, not RegressionModel. For the regression model, this plot looks acceptable (the averages per cluster are similar; see the bottom plot). We have seen this for some snRNA-seq datasets.
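To illustrate what "averages per cluster are similar" means, here is a minimal sketch in plain Python with made-up numbers. It averages one gene's expression per cluster and compares those averages with a set of estimated signature values. The data, the cluster names, and the helper functions are all hypothetical, and this is not how plot_QC is implemented.

```python
from collections import defaultdict
from math import sqrt


def cluster_means(values, labels):
    """Mean expression per cluster label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(values, labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy expression of one gene across six cells (made-up values).
expression = [1.0, 3.0, 10.0, 14.0, 0.0, 2.0]
labels = ["T", "T", "Tumour", "Tumour", "B", "B"]

observed = cluster_means(expression, labels)

# Hypothetical signature values standing in for the regression model's estimates.
estimated = {"T": 2.2, "Tumour": 11.5, "B": 0.9}

clusters = sorted(observed)
r = pearson([observed[c] for c in clusters], [estimated[c] for c in clusters])
print(observed, round(r, 3))
```

A correlation close to 1 between the observed averages and the estimated values corresponds to the "similar averages" case described above.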

For RegressionModel, you can just follow the default parameters in the tutorial, https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Estimation-of-reference-cell-type-signatures-(NB-regression), rather than the settings I suggested above.

I would suggest proceeding with spatial mapping - https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Cell2location:-spatial-mapping

parkjooyoung99 commented 1 year ago

Thank you so much for your help :)

BayraktarLab / cell2location