BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0

Expected parameter rate Value Error in tuning cell2location #320

Open prakashraaz opened 1 year ago

prakashraaz commented 1 year ago


Problem

...

I'm trying to run cell2location as shown in this tutorial, at cell 19 (the steps up to cell 18 have run successfully): https://github.com/theislab/spatial_scog_workshop_2022/blob/main/cell2location/cell2location_tutorial.ipynb

This is the command I'm using:

      mod.train(max_epochs=30000,
                # train using full data (batch_size=None)
                batch_size=None,
                # use all data points in training because
                # we need to estimate cell abundance at all locations
                train_size=1,
                use_gpu=True)

As you can see, I have requested the GPU as necessary:

      GPU available: True (cuda), used: True
      TPU available: False, using: 0 TPU cores
      IPU available: False, using: 0 IPUs
      HPU available: False, using: 0 HPUs
      LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

I'm getting the following error

ValueError: Expected parameter rate (Tensor of shape (1, 1)) of distribution Gamma(concentration: tensor([[10.]], device='cuda:0'), rate: tensor([[nan]], device='cuda:0')) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values: tensor([[nan]], device='cuda:0')

    Trace Shapes:
    Param Sites:
    Sample Sites:
    m_g_mean                dist          | 1 1
                            value         | 1 1
    m_g_alpha_e_inv         dist          | 1 1
                            value         | 1 1
    m_g                     dist          | 1 0
                            value         | 1 0
    n_s_cells_per_location  dist  2600 1  |
                            value 2600 1  |
    b_s_groups_per_location dist  2600 1  |
                            value 2600 1  |
    z_sr_groups_factors     dist  2600 50 |
                            value 2600 50 |
    k_r_factors_per_groups  dist          | 50 1
                            value         | 50 1
    x_fr_group2fact         dist          | 50 24
                            value         | 50 24
    w_sf                    dist  2600 24 |
                            value 2600 24 |

How do I resolve this error?

Description of the data input and hyperparameters

...

View of AnnData object with n_obs × n_vars = 2600 × 36601
    obs: 'in_tissue', 'array_row', 'array_col', 'sample_type', 'sample_id', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_genes'
    var: 'feature_types', 'genome', 'SYMBOL', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
    uns: 'spatial'
    obsm: 'spatial'

Single cell reference data: number of cells, number of cell types, number of genes

AnnData object with n_obs × n_vars = 50416 × 14159
    obs: 'n_genes', 'patient', 'sample', 'environment', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rb', 'pct_counts_rb', 'sex', 'age', 'sorting', 'smoking history', 'cancer stage', 'tumour type', '# isolated cells', '# estimated cells', 'sangerID', 'cellranger', 'genome', 'batch', 'patient_sample', 'exp', 'n_counts', 'doublet_scores', 'S_score', 'G2M_score', 'phase', 'PHASE', 'leiden', 'leiden_new_cmcells_v4', 'leiden_new_bcells_v4', 'leiden_new_mastcells_v3', 'leiden_new_myeloid_v6', 'leiden_new_nkcells_v2', 'leiden_new_stroma_v8', 'leiden_new_tcells_v4', 'leiden_new_broad_v1', 'Cell types', 'Cell types v2', 'Cell types v3', 'Cell types v4', 'Cell types v5', 'Cell types v6', 'Cell types v7', 'Cell types v8', 'Cell types v9', 'Cell types v10', 'Cell types v11', 'Cell types v12', 'Cell types v13', 'Cell types v14', 'Cell types v15', 'Cell types v16', 'Cell types v17', 'Cell types v18', 'Cell types v19', 'Cell types v20', 'Cell types v21', 'Cell types v22', 'Cell types v23', 'Cell types v24', 'Cell types v25', '_indices', '_scvi_batch', '_scvi_labels'
    var: 'feature_types', 'n_cells', 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection', 'nonz_mean'
    uns: '_scvi_uuid', '_scvi_manager_uuid', 'mod'
    obsm: 'X_pca', 'X_pca_harmonize', 'X_umap'
    varm: 'means_per_cluster_mu_fg', 'stds_per_cluster_mu_fg', 'q05_per_cluster_mu_fg', 'q95_per_cluster_mu_fg', 'q50_per_cluster_mu_fg', 'q0001_per_cluster_mu_fg'

...

Single cell reference data: technology type (e.g. mix of 10X 3' and 5')

10X

Spatial data: number of locations, technology type (e.g. Visium, ISS, Nanostring WTA)

Visium

vitkl commented 1 year ago

This error most likely relates to the procedure we use to guess the expected difference in technology sensitivity. Could you check that you have subset both the reference data frame and the spatial anndata object to the same set of genes, and that neither the reference data frame nor anndata.X (or anndata.layers[whatever layer you use]) contains NaN?
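The two checks suggested above could be sketched roughly as follows. This is only an illustration, not the library's own procedure: the helper names `shared_genes` and `count_nan` are hypothetical, and the toy inputs stand in for what would presumably be the spatial object's `var_names`, the reference data frame's index, and the count matrix in use. Note also that the trace in the question reports `m_g` with shape 1 × 0; if that last dimension is the gene dimension, it would be consistent with the two objects sharing no gene identifiers at all (for example, gene symbols in one and Ensembl IDs in the other), though that is a guess from the shapes alone.

```python
import numpy as np
import pandas as pd


def shared_genes(spatial_var_names, reference_index):
    """Return the genes present in both inputs, keeping the spatial order."""
    reference_set = set(reference_index)
    return [g for g in spatial_var_names if g in reference_set]


def count_nan(matrix):
    """Count NaN entries in a dense array or a scipy sparse matrix."""
    # scipy sparse matrices expose their stored values as `.data`;
    # anything else is treated as a dense array-like
    values = matrix.data if hasattr(matrix, "tocsr") else np.asarray(matrix)
    return int(np.isnan(values).sum())


# Toy stand-ins (hypothetical data, not from the thread)
spatial_genes = ["ENSG01", "ENSG02", "ENSG03"]
inf_aver = pd.DataFrame(
    {"T cell": [1.0, 0.5], "B cell": [0.2, np.nan]},
    index=["ENSG02", "ENSG03"],
)
counts = np.array([[0.0, 3.0, 1.0], [2.0, 0.0, 5.0]])

common = shared_genes(spatial_genes, inf_aver.index)
print(len(common), "shared genes")
print("NaN in spatial counts:", count_nan(counts))
print("NaN in reference signatures:", count_nan(inf_aver.loc[common].to_numpy()))
```

With real data, an empty `common` list or a non-zero NaN count would be the thing to fix before calling `mod.train` again; both objects would then be subset to `common` before the model is set up.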