BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0
324 stars 58 forks source link

error in training regression model [models require unnormalised integer counts as input] #108

Closed deryu closed 2 years ago

deryu commented 2 years ago

Hi,

I am trying cell2location for my data. followed the tutorial and got an error at estimation of reference signature.

here is execution code. My computer has only cpu, I do not use gpu option.

# create and train the regression model
from cell2location.models import RegressionModel
mod = RegressionModel(scrna_raw)

# Use all data for training (validation not implemented yet, train_size=1)
mod.train(max_epochs=250, batch_size=2500, train_size=1, lr=0.002, use_gpu=False)

error msg is too long, so I paste final error msg.


ValueError: Expected parameter rate (Tensor of shape (5, 11163)) of distribution Gamma(concentration: torch.Size([5, 11163]), rate: torch.Size([5, 11163])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[ -990.3860,  -367.4243,  -503.9861,  ...,  -152.0115,   176.6569,
            83.2535],
        [ -779.4346,  -123.4887,  -820.1381,  ...,   220.1533,  -531.6564,
           587.0056],
        [ -779.4346,  -123.4887,  -820.1381,  ...,   220.1533,  -531.6564,
           587.0056],
        [ -741.7203,   160.4668,  -708.0273,  ...,  -152.0129,  -166.7664,
          -116.5899],
        [-1287.2499,  -676.4868, -1519.5457,  ...,  -169.2898,  -195.1531,
          -122.2516]])
                Trace Shapes:           
                 Param Sites:           
                Sample Sites:           
       per_cluster_mu_fg dist | 12 11163
                        value | 12 11163
      detection_mean_y_e dist |  5     1
                        value |  5     1
  s_g_gene_add_alpha_hyp dist |         
                        value |         
       s_g_gene_add_mean dist |  5     1
                        value |  5     1
s_g_gene_add_alpha_e_inv dist |  5     1
                        value |  5     1
            s_g_gene_add dist |  5 11163
                        value |  5 11163
         alpha_g_phi_hyp dist |         
                        value |         
         alpha_g_inverse dist |  1 11163
                        value |  1 11163

I know that the error message is because the parameter variable is negative. Could you please tell me how can I fix this? Looking forward to your reply.

Regards,

vitkl commented 2 years ago

Hi

How far into the training process did this error happen? If it is quite late you can stop training before this happens.

You can also try running the training again, the issue might resolve itself.

Do you have confounding between cell type and the batch?

Not related to your issue, but in general I would recommend using more detailed cell annotations (> 12 cell populations, but maybe there are tissues where this is representative).

On Wed, 12 Jan 2022, 08:42 deryu, @.***> wrote:

Hi,

I am trying cell2location for my data. followed the tutorial and got an error at estimation of reference signature.

here is execution code. My computer has only cpu, I do not use gpu option.

create and train the regression model

from cell2location.models import RegressionModel mod = RegressionModel(scrna_raw)

Use all data for training (validation not implemented yet, train_size=1)

mod.train(max_epochs=250, batch_size=2500, train_size=1, lr=0.002, use_gpu=False)

error msg is too long, so I paste final error msg.

ValueError: Expected parameter rate (Tensor of shape (5, 11163)) of distribution Gamma(concentration: torch.Size([5, 11163]), rate: torch.Size([5, 11163])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values: tensor([[ -990.3860, -367.4243, -503.9861, ..., -152.0115, 176.6569, 83.2535], [ -779.4346, -123.4887, -820.1381, ..., 220.1533, -531.6564, 587.0056], [ -779.4346, -123.4887, -820.1381, ..., 220.1533, -531.6564, 587.0056], [ -741.7203, 160.4668, -708.0273, ..., -152.0129, -166.7664, -116.5899], [-1287.2499, -676.4868, -1519.5457, ..., -169.2898, -195.1531, -122.2516]]) Trace Shapes: Param Sites: Sample Sites: per_cluster_mu_fg dist | 12 11163 value | 12 11163 detection_mean_y_e dist | 5 1 value | 5 1 s_g_gene_add_alpha_hyp dist | value | s_g_gene_add_mean dist | 5 1 value | 5 1 s_g_gene_add_alpha_e_inv dist | 5 1 value | 5 1 s_g_gene_add dist | 5 11163 value | 5 11163 alpha_g_phi_hyp dist | value | alpha_g_inverse dist | 1 11163 value | 1 11163

I know that the error message is because the parameter variable is negative. Could you please tell me how can I fix this? Looking forward to your reply.

Regards,

— Reply to this email directly, view it on GitHub https://github.com/BayraktarLab/cell2location/issues/108, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFMFTV6POQSFTQVEDH7T7RLUVU5G3ANCNFSM5LYLUMDA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

deryu commented 2 years ago

This process is killed immediately after execution. I used 12 cell populations and no confounding factor.

I can't solve the issue because I can't understand why this is happening. Please more detail recommadations to solve.

I executed 'cellpymc' environment and got same issue. following msg

`GPU available: False, used: False TPU available: False, using: 0 TPU cores Epoch 1/250: 0%| | 0/250 [00:00<?, ?it/s]

ValueError Traceback (most recent call last) ~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/poutine/trace_struct.py in compute_log_prob(self, site_filter) 230 log_p = site["fn"].log_prob( --> 231 site["value"], *site["args"], **site["kwargs"] 232 )

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/distributions/conjugate.py in log_prob(self, value) 276 if self._validate_args: --> 277 self._validate_sample(value) 278 post_value = self.concentration + value

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/torch/distributions/distribution.py in _validate_sample(self, value) 288 raise ValueError( --> 289 "Expected value argument " 290 f"({type(value).name} of shape {tuple(value.shape)}) "

ValueError: Expected value argument (Tensor of shape (2500, 11617)) to be within the support (IntegerGreaterThan(lower_bound=0)) of the distribution GammaPoisson(), but found invalid values: tensor([[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], ..., [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.7743], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]])

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last) /tmp/ipykernel_818/4161863545.py in 4 5 # Use all data for training (validation not implemented yet, train_size=1) ----> 6 mod.train(max_epochs=250, batch_size=2500, train_size=1, lr=0.002, use_gpu=False) 7 8 # plot ELBO loss history during training, removing first 20 epochs from the plot

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/cell2location/models/reference/_reference_model.py in train(self, max_epochs, batch_size, train_size, lr, kwargs) 156 kwargs["lr"] = lr 157 --> 158 super().train(kwargs) 159 160 def _compute_cluster_averages(self, key="_scvi_labels"):

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/scvi/model/base/_pyromixin.py in train(self, max_epochs, use_gpu, train_size, validation_size, batch_size, early_stopping, lr, plan_kwargs, trainer_kwargs) 143 trainer_kwargs, 144 ) --> 145 return runner() 146 147

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/scvi/train/_trainrunner.py in call(self) 70 self.training_plan.n_obs_training = self.data_splitter.n_train 71 ---> 72 self.trainer.fit(self.training_plan, self.data_splitter) 73 self._update_history() 74

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/scvi/train/_trainer.py in fit(self, *args, *kwargs) 175 message="LightningModule.configure_optimizers returned None", 176 ) --> 177 super().fit(args, **kwargs)

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule) 458 ) 459 --> 460 self._run(model) 461 462 assert self.state.stopped

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in _run(self, model) 756 757 # dispatch start_training or start_evaluating or start_predicting --> 758 self.dispatch() 759 760 # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in dispatch(self) 797 self.accelerator.start_predicting(self) 798 else: --> 799 self.accelerator.start_training(self) 800 801 def run_stage(self):

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer) 94 95 def start_training(self, trainer: 'pl.Trainer') -> None: ---> 96 self.training_type_plugin.start_training(trainer) 97 98 def start_evaluating(self, trainer: 'pl.Trainer') -> None:

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer) 142 def start_training(self, trainer: 'pl.Trainer') -> None: 143 # double dispatch to initiate the training loop --> 144 self._results = trainer.run_stage() 145 146 def start_evaluating(self, trainer: 'pl.Trainer') -> None:

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_stage(self) 807 if self.predicting: 808 return self.run_predict() --> 809 return self.run_train() 810 811 def _pre_training_routine(self):

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_train(self) 869 with self.profiler.profile("run_training_epoch"): 870 # run train epoch --> 871 self.train_loop.run_training_epoch() 872 873 if self.max_steps and self.max_steps <= self.global_step:

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self) 497 # ------------------------------------ 498 with self.trainer.profiler.profile("run_training_batch"): --> 499 batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx) 500 501 # when returning -1 from train_step, we end epoch early

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_batch(self, batch, batch_idx, dataloader_idx) 742 else: 743 self._curr_step_result = self.training_step( --> 744 split_batch, batch_idx, opt_idx, self.trainer.hiddens 745 ) 746

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in training_step(self, split_batch, batch_idx, opt_idx, hiddens) 288 model_ref._results = Result() 289 with self.trainer.profiler.profile("training_step"): --> 290 training_step_output = self.trainer.accelerator.training_step(args) 291 self.trainer.accelerator.post_training_step() 292

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in training_step(self, args) 202 203 with self.precision_plugin.train_step_context(), self.training_type_plugin.train_step_context(): --> 204 return self.training_type_plugin.training_step(*args) 205 206 def post_training_step(self) -> None:

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in training_step(self, *args, kwargs) 153 154 def training_step(self, *args, *kwargs): --> 155 return self.lightning_module.training_step(args, kwargs) 156 157 def post_training_step(self):

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/scvi/train/_trainingplans.py in training_step(self, batch, batch_idx) 708 kwargs.update({"kl_weight": self.kl_weight}) 709 # pytorch lightning requires a Tensor object for loss --> 710 loss = torch.Tensor([self.svi.step(*args, **kwargs)]) 711 712 return {"loss": loss}

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/infer/svi.py in step(self, *args, *kwargs) 143 # get loss and compute gradients 144 with poutine.trace(param_only=True) as param_capture: --> 145 loss = self.loss_and_grads(self.model, self.guide, args, **kwargs) 146 147 params = set(

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/infer/trace_elbo.py in loss_and_grads(self, model, guide, *args, **kwargs) 138 loss = 0.0 139 # grab a trace from the generator --> 140 for model_trace, guide_trace in self._get_traces(model, guide, args, kwargs): 141 loss_particle, surrogate_loss_particle = self._differentiable_loss_particle( 142 model_trace, guide_trace

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/infer/elbo.py in _get_traces(self, model, guide, args, kwargs) 180 else: 181 for i in range(self.num_particles): --> 182 yield self._get_trace(model, guide, args, kwargs)

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/infer/trace_elbo.py in _get_trace(self, model, guide, args, kwargs) 56 """ 57 model_trace, guide_trace = get_importance_trace( ---> 58 "flat", self.max_plate_nesting, model, guide, args, kwargs 59 ) 60 if is_validation_enabled():

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/infer/enum.py in get_importance_trace(graph_type, max_plate_nesting, model, guide, args, kwargs, detach) 73 model_trace = prune_subsample_sites(model_trace) 74 ---> 75 model_trace.compute_log_prob() 76 guide_trace.compute_score_parts() 77 if is_validation_enabled():

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/poutine/trace_struct.py in compute_log_prob(self, site_filter) 238 name, exc_value, shapes 239 ) --> 240 ).with_traceback(traceback) from e 241 site["unscaled_log_prob"] = log_p 242 log_p = scale_and_mask(log_p, site["scale"], site["mask"])

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/poutine/trace_struct.py in compute_log_prob(self, site_filter) 229 try: 230 log_p = site["fn"].log_prob( --> 231 site["value"], *site["args"], **site["kwargs"] 232 ) 233 except ValueError as e:

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pyro/distributions/conjugate.py in log_prob(self, value) 275 def log_prob(self, value): 276 if self._validate_args: --> 277 self._validate_sample(value) 278 post_value = self.concentration + value 279 return (

~/miniconda3/envs/cellpymc/lib/python3.7/site-packages/torch/distributions/distribution.py in _validate_sample(self, value) 287 if not valid.all(): 288 raise ValueError( --> 289 "Expected value argument " 290 f"({type(value).name} of shape {tuple(value.shape)}) " 291 f"to be within the support ({repr(support)}) "

ValueError: Error while computing log_prob at site 'data_target': Expected value argument (Tensor of shape (2500, 11617)) to be within the support (IntegerGreaterThan(lower_bound=0)) of the distribution GammaPoisson(), but found invalid values: tensor([[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], ..., [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.7743], [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]]) Trace Shapes:
Param Sites:
Sample Sites:
per_cluster_mu_fg dist | 12 11617 value | 12 11617 log_prob |
detection_mean_y_e dist | 5 1 value | 5 1 log_prob |
s_g_gene_add_alpha_hyp dist |
value |
log_prob |
s_g_gene_add_mean dist | 5 1 value | 5 1 log_prob |
s_g_gene_add_alpha_e_inv dist | 5 1 value | 5 1 log_prob |
s_g_gene_add dist | 5 11617 value | 5 11617 log_prob |
alpha_g_phi_hyp dist |
value |
log_prob |
alpha_g_inverse dist | 1 11617 value | 1 11617 log_prob |
data_target dist 2500 11617 |
value 2500 11617 | `

vitkl commented 2 years ago

This error is different. The model expects unnormalized and untransformed integer counts as input - but the data you provided has fractional values [and probably negative values]. Would be good if you change that to see if the issue persists before looking into other reasons.

RE confounding. Are all of your 12 cell types present in all 5 batches? Presence of batches that correspond 1:1 to cell types, which can be the case when FACS sorting cell types in data generation, can create an issue for this model.

deryu commented 2 years ago

As per your answer, I'm running it by converting it to an integer type.

I would like to ask for one more your advice. After doing BBKNN, the matrix(refmatrix.raw.X) contains negative and fractions, are there any considerations when converting to integers? My data confirmed that the minimum after BBKNN is -10.304938.

Thank you again.

vitkl commented 2 years ago

The NB regression model requires raw untransformed and unnormalised counts which is stored in the CellRanger output count matrices.

Common workflows normalise the counts by total RNA count per cell (first step), apply log-transformation (log1p) as a second step, perform scaling as a third step. Scaling leads to negative values. Scaled matrices are used to construct PCA loadings (4th step) and PCA loadings are used by tools such as BBKNN to construct the batch-corrected graph. BBKNN does not alter counts in any way. That said, you need to provide cell2location NB regression with the counts you had at the beginning of this process - not by converting normalised counts into integers. The NB regression model performs normalisation and estimates the average expression of each gene in each cell type on a linear scale as required by the main cell2location model.

deryu commented 2 years ago

First, I converted it to an h5ad file using the conversion funcion provided by Seurat. In this process, it was confirmed that the sclaed value was stored, not the raw integer. By adjusting this, I was able to do the following steps properly. Thank you for your kind explanation.

wangjiawen2013 commented 2 years ago

Cell2location used bbknn to remove batch effects in it's NatureMethod paper. And It's also said that "When there are strong technical effects between multiple batches (not the case here) sc.external.pp.bbknn can be in principle used to account for those effects during the KNN construction" in the tutorial. As we know, scvi-tools can also be used to batch effects. Can we use scvi-tools to remove batch effects instead of bbknn ?