Issue when running the visum-2022-handson-generative-models Jupyter notebook on GPU

Describe the bug
When calculating get_toxicity in the demo notebook visum-2022-handson-generative-models.ipynb with disable_gpu = False set in the first code snippet, I get an error indicating that some internal data structure inside the paccmann module code is not transferred to the GPU device. I could not work around it by moving the molecules or the paccmann model to the GPU from the notebook code, so I'm guessing it is a bug.

To Reproduce
Steps to reproduce the behaviour:
1. Go to https://github.com/GT4SD/gt4sd-core/blob/ea9c299cea6ea6c63696b4e9ee49ceb16fb58507/notebooks/visum-2022-handson-generative-models.ipynb
2. In the first code snippet, change to disable_gpu = False
3. Run all code snippets up to the one under "Evaluating the generated molecules"
4. See error

Expected behavior
The run does not crash and toxicity is calculated on the GPU device.
Screenshots
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/gt4sd/a │
│ lgorithms/core.py:353 in predict                                                                 │
│                                                                                                  │
│   350 │   │   """                                                                                │
│   351 │   │                                                                                      │
│   352 │   │   try:                                                                               │
│ ❱ 353 │   │   │   predicted = self.predictor(input)                                              │
│   354 │   │   except TimeoutError:                                                               │
│   355 │   │   │   detail = (                                                                     │
│   356 │   │   │   │   f"Predicting took longer than maximum ({self.max_runtime} seconds)."       │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/gt4sd/p │
│ roperties/molecules/core.py:815 in informative_model                                             │
│                                                                                                  │
│   812 │   │   # Wrapper to get toxicity-endpoint-level predictions                               │
│   813 │   │   def informative_model(x: SmallMolecule) -> List[PropertyValue]:                    │
│   814 │   │   │   x = to_smiles(x)                                                               │
│ ❱ 815 │   │   │   _ = model(x)                                                                   │
│   816 │   │   │   return model.predictions.detach().tolist()                                     │
│   817 │   │                                                                                      │
│   818 │   │   return informative_model                                                           │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/paccman │
│ n_generator/drug_evaluators/tox21.py:56 in __call__                                              │
│                                                                                                  │
│   53 │   │   │   raise TypeError(f'Input must be String, not :{type(smiles)}')                   │
│   54 │   │                                                                                       │
│   55 │   │   smiles_tensor = self.preprocess_smiles(smiles)                                      │
│ ❱ 56 │   │   return self.tox21_score(smiles_tensor)                                              │
│   57 │                                                                                           │
│   58 │   def tox21_score(self, smiles_tensor: torch.Tensor) -> float:                            │
│   59 │   │   """                                                                                 │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/paccman │
│ n_generator/drug_evaluators/tox21.py:70 in tox21_score                                           │
│                                                                                                  │
│   67 │   │   """                                                                                 │
│   68 │   │                                                                                       │
│   69 │   │   # Test the compound                                                                 │
│ ❱ 70 │   │   predictions, _ = self.model(smiles_tensor)                                          │
│   71 │   │   # To allow accessing the raw predictions from outside                               │
│   72 │   │   self.predictions = predictions[0, :]                                                │
│   73                                                                                             │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/torch/n │
│ n/modules/module.py:1130 in _call_impl                                                           │
│                                                                                                  │
│   1127 │   │   # this function, and just call forward.                                           │
│   1128 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1129 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1130 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1131 │   │   # Do not call functions when jit is used                                          │
│   1132 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1133 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/toxsmi/ │
│ models/mca.py:268 in forward                                                                     │
│                                                                                                  │
│   265 │   │   │   prediction_dict includes the prediction and attention weights.                 │
│   266 │   │   """                                                                                │
│   267 │   │                                                                                      │
│ ❱ 268 │   │   embedded_smiles = self.smiles_embedding(smiles.to(dtype=torch.int64))              │
│   269 │   │                                                                                      │
│   270 │   │   # SMILES Convolutions. Unsqueeze has shape batch_size x 1 x T x H.                 │
│   271 │   │   encoded_smiles = [embedded_smiles] + [                                             │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/torch/n │
│ n/modules/module.py:1130 in _call_impl                                                           │
│                                                                                                  │
│   1127 │   │   # this function, and just call forward.                                           │
│   1128 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1129 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1130 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1131 │   │   # Do not call functions when jit is used                                          │
│   1132 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1133 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/torch/n │
│ n/modules/sparse.py:158 in forward                                                               │
│                                                                                                  │
│   155 │   │   │   │   self.weight[self.padding_idx].fill_(0)                                     │
│   156 │                                                                                          │
│   157 │   def forward(self, input: Tensor) -> Tensor:                                            │
│ ❱ 158 │   │   return F.embedding(                                                                │
│   159 │   │   │   input, self.weight, self.padding_idx, self.max_norm,                           │
│   160 │   │   │   self.norm_type, self.scale_grad_by_freq, self.sparse)                          │
│   161                                                                                            │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/torch/n │
│ n/functional.py:2199 in embedding                                                                │
│                                                                                                  │
│   2196 │   │   #   torch.embedding_renorm_                                                       │
│   2197 │   │   # remove once script supports set_grad_enabled                                    │
│   2198 │   │   _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)                    │
│ ❱ 2199 │   return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)        │
│   2200                                                                                           │
│   2201                                                                                           │
│   2202 def embedding_bag(                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when 
checking argument for argument index in method wrapper__index_select)

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:16                                                                                   │
│                                                                                                  │
│   13 │   solubility.append(get_solubility(mol))                                                  │
│   14 │   synthesizability.append(get_synthesizability(mol))                                      │
│   15 │   molecular_weight.append(get_molecular_weight(mol))                                      │
│ ❱ 16 │   toxicity.append(get_toxicity(molecule))                                                 │
│   17                                                                                             │
│   18 affinity = get_affinity(molecules)                                                          │
│   19                                                                                             │
│                                                                                                  │
│ in <lambda>:11                                                                                   │
│                                                                                                  │
│    8 get_synthesizability = PropertyPredictorRegistry.get_property_predictor('scscore')          │
│    9 get_molecular_weight = PropertyPredictorRegistry.get_property_predictor('molecular_weigh    │
│   10 toxicity_fn = PropertyPredictorRegistry.get_property_predictor('tox21', {'algorithm_vers    │
│ ❱ 11 get_toxicity = lambda x: np.mean(toxicity_fn(x))                                            │
│   12                                                                                             │
│   13 def get_affinity(strings):                                                                  │
│   14 │   l = len(strings)                                                                        │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/gt4sd/a │
│ lgorithms/core.py:365 in __call__                                                                │
│                                                                                                  │
│   362 │                                                                                          │
│   363 │   def __call__(self, input: Any) -> Any:                                                 │
│   364 │   │   """Alias for `self.predict`."""                                                    │
│ ❱ 365 │   │   return self.predict(input)                                                         │
│   366                                                                                            │
│   367                                                                                            │
│   368 @dataclass                                                                                 │
│                                                                                                  │
│ /projects/msieve_dev3/usr/itaiguez/envs_miniconda/envs/gt4sd/lib/python3.8/site-packages/gt4sd/a │
│ lgorithms/core.py:360 in predict                                                                 │
│                                                                                                  │
│   357 │   │   │   )                                                                              │
│   358 │   │   │   logger.warning(detail + " Exiting now!")                                       │
│   359 │   │   except Exception:                                                                  │
│ ❱ 360 │   │   │   raise Exception(f"{self.__class__.__name__} failed with {input}")              │
│   361 │   │   return predicted                                                                   │
│   362 │                                                                                          │
│   363 │   def __call__(self, input: Any) -> Any:                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Tox21 failed with COC(=O)CNC=C(C(c1ccccc1)C(=O)NCc1cccnc1NS(=O))
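
The RuntimeError above is raised inside torch.embedding, which suggests the token indices are still on the CPU while the embedding weights are on cuda:0. Below is a minimal sketch of that situation and of one possible fix, written in plain PyTorch. The layer sizes and the helper name embed_on_model_device are hypothetical and are not taken from paccmann_generator or toxsmi. I am assuming the real fix would move the SMILES tensor to the model's device somewhere before the forward call.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the toxicity model: only the embedding layer matters here.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
embedding = nn.Embedding(num_embeddings=32, embedding_dim=8).to(device)

# Token indices created on the CPU, as a preprocessing step typically would.
tokens = torch.tensor([[1, 5, 7, 2]])

# On a GPU machine, calling embedding(tokens) directly raises
# "Expected all tensors to be on the same device", because the
# indices live on the CPU while the weights live on cuda:0.


def embed_on_model_device(layer: nn.Embedding, indices: torch.Tensor) -> torch.Tensor:
    """Move the indices to the device of the layer's weights before the lookup."""
    return layer(indices.to(layer.weight.device, dtype=torch.int64))


embedded = embed_on_model_device(embedding, tokens)
print(embedded.shape, embedded.device)
```

On a CPU-only machine both tensors are already on the same device, so the sketch runs either way. The mismatch only shows up when a GPU is available.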
Out of the 50 generated molecules,
GT4SD / gt4sd-core, issue #225