BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Feeding per-location number of cells instead of global N_cells_per_location hyper-parameter #325

Open ssobt opened 10 months ago

ssobt commented 10 months ago

Hi! Thanks for designing this great tool. I've used it quite a bit on Visium slides and it has worked really well. I'm now trying to use it on tissue slides where the number of cells per spot/location varies widely (as observed from DAPI staining of adjacent slides). This leads to some odd cell type deconvolution results when running cell2location.models.Cell2location. The paper suggests that "as an advanced feature, cell2location can use the per-location number of cells." I've been unsuccessfully trying to locate this argument in the documentation for this function, and thought the question would be a useful reference for the community in case someone else runs into the same issue. Thanks for the help and for maintaining an active community here!

ssobt commented 10 months ago

I realized this has been answered in the old Cell2location community forum. Posting the relevant response here in case someone needs it:

While it is possible to provide segmentation based numbers to the old version (simply provide a numpy array (location * 1) instead of a single number) - we did not see any accuracy benefits from including that information in the mouse brain data. This feature is not available in the new pyro-based version for now.
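For readers unsure what "(location * 1)" means in the reply above, here is a minimal NumPy sketch of the shape. The count values and the number of cell types are made up for illustration, and nothing here calls cell2location itself. It only shows how a column of per-location values differs from a 1-D array and how it broadcasts across a locations-by-cell-types matrix.

```python
import numpy as np

# Hypothetical nuclei counts for 5 locations (illustrative values only).
nuclei_counts = np.array([3, 12, 7, 1, 25], dtype=float)

# A 1-D array has shape (n_locations,). The quoted reply describes a
# 2-D column of shape (n_locations, 1), which reshape(-1, 1) produces.
cells_per_spot = nuclei_counts.reshape(-1, 1)

# A column broadcasts row by row against an (n_locations, n_cell_types)
# matrix, so every location gets its own expected cell number.
n_cell_types = 4  # hypothetical
per_type = np.ones((nuclei_counts.shape[0], n_cell_types)) * cells_per_spot / n_cell_types
```

Whether the old version accepts such an array without further changes is only stated in the quoted reply, not verified here.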

ssobt commented 8 months ago

I tried the non-pyro version (v.02-alpha). Unfortunately, when I pass a 1-dimensional numpy array with the nuclei counts of each spot on the Visium slide (made by concatenating the rows of a 2d x,y matrix), the following error occurs, which seems to ask for a single value instead of location-specific values: Gamma has no finite default value to use, checked: ('median', 'mean', 'mode'). Pass testval argument or adjust so value is finite.

I also tried passing a 2d array with the nuclei counts and got the same error. How does one provide location-specific nuclei counts to this version? Any advice on this would be great, thanks!


Here is how the model is set up:

import numpy as np
import cell2location

# 1-D array of nuclei counts, one value per spot
nuclei_count_1d = np.array(nuclei_counts_1149G['Count'])
r = cell2location.run_cell2location(

      # Single cell reference signatures as pd.DataFrame
      # (could also be data as anndata object for estimating signatures
      #  as cluster average expression - `sc_data=adata_snrna_raw`)
      sc_data=inf_aver,
      # Spatial data as anndata object
      sp_data=slide,

      # the column in sc_data.obs that gives cluster identity of each cell
      summ_sc_data_args={'cluster_col': "annotation_1",
                        },

      train_args={'use_raw': True, # By default uses raw slots in both of the input datasets.
                  'n_iter': 40000, # Increase the number of iterations if needed (see QC below)

                  # When analysing data that contains multiple experiments,
                  # cell2location automatically enters the mode which pools information across experiments
                  'sample_name_col': 'sample'}, # Column in sp_data.obs with experiment ID (see above)

      export_args={'path': results_folder, # path where to save results
                   'save_model': True,
                   'run_name_suffix': '' # optional suffix to modify the name of the run
                  },

      model_kwargs={ # Prior on the number of cells, cell types and co-located groups

                    'cell_number_prior': {
                        # - N - the expected number of cells per location:
                        'cells_per_spot': nuclei_count_1d, # < - change this
                        # - A - the expected number of cell types per location (use default):
                        'factors_per_spot': 7,
                        # - Y - the expected number of co-located cell type groups per location (use default):
                        'combs_per_spot': 7
                    },

                     # Prior beliefs on the sensitivity of spatial technology:
                    'gene_level_prior':{
                        # Prior on the mean
                        'mean': 1/2,
                        # Prior on standard deviation,
                        # a good choice of this value should be at least 2 times lower than the mean
                        'sd': 1/4
                    }
      }
)
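
Before passing a per-location array like the one above, it may be worth checking two things that the thread does not confirm as the cause of the error: that the counts are in the same order as the spots in the spatial object, and that no spot has a zero or missing count (a zero prior mean is one plausible reason for a Gamma prior having no finite default value). The sketch below uses only pandas and NumPy. The helper name `prepare_cells_per_spot`, the barcode-indexed Series and the floor of 1.0 are assumptions for illustration, not part of cell2location.

```python
import numpy as np
import pandas as pd


def prepare_cells_per_spot(counts: pd.Series, obs_names, floor: float = 1.0) -> np.ndarray:
    """Hypothetical helper: align nuclei counts to the spot order and
    return a strictly positive float array of shape (n_locations, 1).

    counts    -- nuclei counts indexed by spot barcode (assumed layout)
    obs_names -- spot barcodes in the order used by the spatial data
    floor     -- value used for spots with missing or non-positive counts
    """
    # Reorder to match the spatial data; spots without a count become NaN.
    aligned = counts.reindex(pd.Index(obs_names)).astype(float)
    # Keep positive counts; NaN and values <= 0 are replaced by the floor.
    aligned = aligned.where(aligned > 0, floor)
    values = aligned.to_numpy().reshape(-1, 1)
    # Sanity check: every entry must be finite and strictly positive.
    assert np.isfinite(values).all() and (values > 0).all()
    return values


# Usage with made-up barcodes and counts.
counts = pd.Series([4, 0, 9], index=["AAAC-1", "AAAG-1", "AAAT-1"])
obs_names = ["AAAT-1", "AAAC-1", "AAAG-1", "AACC-1"]
cells_per_spot = prepare_cells_per_spot(counts, obs_names)
```

In the setup above, `obs_names` would correspond to something like `slide.obs_names`, and the result would replace `nuclei_count_1d`. Whether the old model then accepts the array is not confirmed anywhere in this thread.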