BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Which quantile to use for the cell type mapping? #278

Open ankitbioinfo opened 1 year ago

ankitbioinfo commented 1 year ago

Hi,

I used cell2location to annotate cell types in MERFISH data using the corresponding scRNA-seq data. I trained with N_cells_per_location=1 and detection_alpha=20. I see four entries for the posterior distribution in the sp.h5ad file: means_cell_abundance_w_sf, q05_cell_abundance_w_sf, q95_cell_abundance_w_sf, and stds_cell_abundance_w_sf.

If I understood correctly, I get the cell type name for each cell by taking the argmax over its entries in the posterior summary. But I am a bit confused about which quantile or mean value to use to determine the cell type. In some datasets 'q05_cell_abundance_w_sf' seems to give a reasonable annotation, and in others q95_cell_abundance_w_sf does. Could you explain?

Thank you.

vitkl commented 1 year ago

I would recommend using the 0.05 or 0.5 quantile: the 0.05 quantile is a low quantile that reflects the abundance the model is confident in, and 0.5 is the median. You can also use several quantiles to show the range of the posterior distribution.
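
The argmax labelling described in the question can be sketched as follows. This is a minimal illustration on toy data, not the cell2location API: the two small DataFrames stand in for the tables stored under `q05_cell_abundance_w_sf` and `means_cell_abundance_w_sf`, and the cell type names, cell IDs, values, and the helper `assign_cell_type` are all hypothetical. It also flags cells where the two summaries disagree, which may be worth inspecting by hand.

```python
import pandas as pd


def assign_cell_type(abundance: pd.DataFrame) -> pd.Series:
    """Label each row (cell) with the column (cell type) of highest abundance."""
    return abundance.idxmax(axis=1)


# Toy stand-ins for the posterior summaries (hypothetical values and names).
cell_types = ["TypeA", "TypeB", "TypeC"]
cells = ["cell_1", "cell_2", "cell_3"]

q05 = pd.DataFrame(
    [[0.80, 0.10, 0.00], [0.00, 0.20, 0.60], [0.05, 0.10, 0.08]],
    index=cells,
    columns=cell_types,
)
means = pd.DataFrame(
    [[1.00, 0.30, 0.10], [0.20, 0.50, 0.90], [0.60, 0.40, 0.50]],
    index=cells,
    columns=cell_types,
)

labels_q05 = assign_cell_type(q05)     # conservative: abundance the model is confident in
labels_mean = assign_cell_type(means)  # posterior mean, for comparison

# Cells where the two summaries give different labels are less certain.
agree = labels_q05 == labels_mean
print(pd.DataFrame({"q05": labels_q05, "mean": labels_mean, "agree": agree}))
```

With real output, the same helper would be applied to the corresponding tables in the exported AnnData object; note that their column names may carry a prefix that needs stripping first.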

ankitbioinfo commented 1 year ago

Thanks, Vitalii, for the explanation. I have also encountered another problem: when I run the step that estimates cell abundance, the Jupyter kernel dies. It has happened about 3 times. Do you have any recommendations? Should I decrease num_samples? I have around 400,000 spatial cells.

# In this section, we export the estimated cell abundance (summary of the posterior distribution).
adata_vis = mod.export_posterior(
    adata_vis, sample_kwargs={'num_samples': 1000, 'batch_size': mod.adata.n_obs, 'use_gpu': False}
)

# Save model
mod.save(f"{run_name}", overwrite=True)

# mod = cell2location.models.Cell2location.load(f"{run_name}", adata_vis)

# Save anndata object with results
adata_file = f"{run_name}/sp.h5ad"
adata_vis.write(adata_file)
adata_file

vitkl commented 1 year ago

Very nice data. For this number of cells it's probably a better idea to compute quantiles directly with use_quantiles=True rather than using samples from the posterior distribution: 'num_samples': 1000 means creating a dense n_cell_types × 400k × 1000 array, which probably doesn't fit in your RAM.

# In this section, we export the estimated cell abundance (summary of the posterior distribution).
adata_vis = mod.export_posterior(
    adata_vis, use_quantiles=True, sample_kwargs={'num_samples': 1000, 'batch_size': mod.adata.n_obs, 'use_gpu': False}
)
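
To make the memory argument concrete, here is a rough back-of-the-envelope estimate. The thread does not state the number of cell types or the numeric precision, so the 50 cell types and 4 bytes per value (float32) below are assumptions, and `dense_samples_gib` is a hypothetical helper rather than part of cell2location.

```python
def dense_samples_gib(n_obs: int, n_cell_types: int, n_samples: int,
                      bytes_per_value: int = 4) -> float:
    """Size in GiB of a dense (n_samples x n_obs x n_cell_types) array."""
    return n_obs * n_cell_types * n_samples * bytes_per_value / 2**30


# ~400,000 cells (from the thread), 1000 posterior samples,
# 50 cell types (assumed), float32 values (assumed).
full = dense_samples_gib(n_obs=400_000, n_cell_types=50, n_samples=1000)
print(f"1000 samples: about {full:.1f} GiB")

# For comparison, a single summary table (one value per cell and cell type).
single = dense_samples_gib(n_obs=400_000, n_cell_types=50, n_samples=1)
print(f"one summary:  about {single * 1024:.1f} MiB")
```

Under these assumptions the sampled array is roughly a thousand times larger than any one of the summaries that is ultimately kept, which is consistent with the kernel being killed for running out of memory.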