BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Independent executions with the same params give slightly different results - Problem with results reproducibility #338

Open AlbertPlaPlanas opened 11 months ago

AlbertPlaPlanas commented 11 months ago

First of all, thanks and congratulations on this great piece of software.

I have noticed that when I reproduce some past cell2location analyses in my pipeline, I obtain slightly different cell proportions (which may lead to different downstream analysis results). To determine whether the issue was on my end or came from the cell2location package itself, I ran the demo notebook ( https://github.com/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb ) twice, each time with a clean kernel, and obtained slightly different cell proportions:

Proportions of execution 1:

image

Proportions of execution 2:

image

Minimal code sample (that we can run without your data, using public data)

The only changes I made to the tutorial notebook (to speed up execution) have been:

Sample of code to show the output

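import scanpy as sc

# Load the spatial AnnData object saved by the tutorial run
adata_vis = sc.read('./results/lymph_nodes_analysis_run1/cell2location_map/sp.h5ad')
print("representing confident cell abundance")
# Copy the 5% quantile of the posterior cell abundance into adata_vis.obs
adata_vis.obs[adata_vis.uns["mod"]["factor_names"]] = adata_vis.obsm[
    "q05_cell_abundance_w_sf"
]

# One row per spot, one column per cell type
cell_type = adata_vis.obs[adata_vis.uns["mod"]["factor_names"]]
cell_type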
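
To put a number on how far two runs diverge, the `cell_type` tables from each execution could be compared column by column. The following is only a minimal sketch: `compare_abundance` is a hypothetical helper, not part of cell2location, and it assumes both tables are pandas DataFrames with spots as rows and cell types as columns.

```python
import pandas as pd


def compare_abundance(run1: pd.DataFrame, run2: pd.DataFrame) -> pd.DataFrame:
    """Summarise per-cell-type differences between two abundance tables.

    Hypothetical helper: expects spots as rows and cell types as columns.
    """
    # Align on shared spots and cell types so the comparison is element-wise.
    spots = run1.index.intersection(run2.index)
    cell_types = run1.columns.intersection(run2.columns)
    a = run1.loc[spots, cell_types]
    b = run2.loc[spots, cell_types]

    return pd.DataFrame(
        {
            # Largest absolute disagreement across spots, per cell type
            "max_abs_diff": (a - b).abs().max(axis=0),
            # Pearson correlation across spots, per cell type
            "pearson_r": a.corrwith(b, axis=0),
        }
    )
```

Calling `compare_abundance(cell_type_run1, cell_type_run2)` on the tables from two executions (the variable names here are placeholders) returns one row per cell type, which makes it easier to see whether the differences are concentrated in a few cell types.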

Hypothesis

When executing the sample code I get the following warning in cell 12:

image

It seems the scvi seed is not initialized in cell2location. Could this be the cause of this behaviour?

AlbertPlaPlanas commented 11 months ago

Setting the scvi global seed after importing cell2location seems to address the issue:

import cell2location
import scvi
scvi.settings.seed = 2023

vitkl commented 11 months ago

When cell2location was released, scvi-tools set a seed by default; this changed recently. Slightly different results are expected under different seeds, including confusion between cell populations whose gene expression signatures lack the detail needed to resolve their spatial location. You can set the seed, as the warning suggests, to improve numerical reproducibility. Complete numerical reproducibility is not expected, because Stochastic Variational Inference is an inference method that includes random sampling in each training step.

I will add a note about this to the notebook.
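
The role of the seed described above can be illustrated with a toy stochastic optimiser. This is a pure-Python sketch for intuition only: it is not cell2location or scvi-tools code, and `noisy_fit` and its parameters are invented for the example. Each step draws a random sample, so the final estimate depends on the random number stream.

```python
import random


def noisy_fit(seed, steps=500, lr=0.05):
    """Estimate a target value of 3.0 by stochastic steps on noisy samples."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    estimate = 0.0
    for _ in range(steps):
        sample = rng.gauss(3.0, 1.0)  # noisy observation of the true value
        estimate += lr * (sample - estimate)  # gradient step on squared error
    return estimate


same_a = noisy_fit(seed=2023)
same_b = noisy_fit(seed=2023)
other = noisy_fit(seed=2024)

print(same_a == same_b)     # same seed: identical random stream and result
print(abs(same_a - other))  # different seed: close to each other, not equal
```

In this CPU-only toy the two seeded runs match exactly. Real training may still differ slightly between runs, as noted above, so a fixed seed is best treated as improving reproducibility rather than guaranteeing it.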