DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

RuntimeError: "clamp_cpu" not implemented for 'Half' #301

Closed: drose188 closed this issue 3 years ago

drose188 commented 3 years ago

This happens only with TD3; the same setup works fine with SAC:

error: RuntimeError('"clamp_cpu" not implemented for \'Half\'',)
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 198, in _run_trial
    value_or_values = func(trial)
  File "opt.py", line 311, in optimize_agent
    online_model.learn(total_timesteps=(int(learn_params['total_timestamp'])))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 207, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 272, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 146, in train
    noise = noise.clamp(-self.target_noise_clip, self.target_noise_clip)
RuntimeError: "clamp_cpu" not implemented for 'Half'

Traceback (most recent call last):
  File "opt.py", line 384, in
    study.optimize(my_optimise.optimize_agent, n_trials=1000, n_jobs=1) # n_jobs=-1
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/study.py", line 381, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 70, in _optimize
    progress_bar=progress_bar,
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 161, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 249, in _run_trial
    raise func_err
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 198, in _run_trial
    value_or_values = func(trial)
  File "opt.py", line 311, in optimize_agent
    online_model.learn(total_timesteps=(int(learn_params['total_timestamp'])))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 207, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 272, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 146, in train
    noise = noise.clamp(-self.target_noise_clip, self.target_noise_clip)
RuntimeError: "clamp_cpu" not implemented for 'Half'

drose188 commented 3 years ago

This also happens when training TD3 without Optuna. I double-checked, and it works fine with the SAC agent; the problem occurs with TD3 only.
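The failing line in the traceback (td3.py, line 146) belongs to TD3's target policy smoothing, a step that SAC's update does not go through, which may explain why only TD3 is affected. Below is a minimal pure-Python sketch of that step for a single action vector. It assumes the TD3 paper's values (noise standard deviation 0.2, noise clip 0.5) and actions bounded in [-1, 1]. The helper name `smoothed_target_action` is hypothetical and not part of SB3.

```python
import random


def smoothed_target_action(target_action, policy_noise=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Hypothetical helper sketching TD3 target policy smoothing.

    Assumes `target_action` is a list of floats produced by the target actor.
    """
    smoothed = []
    for a in target_action:
        # Gaussian noise added to the target action...
        noise = rng.gauss(0.0, policy_noise)
        # ...clipped to [-noise_clip, noise_clip]; this is the clamp that
        # fails for Half tensors in the traceback above
        noise = max(-noise_clip, min(noise_clip, noise))
        # the perturbed action is then clipped back into the action bounds
        smoothed.append(max(low, min(high, a + noise)))
    return smoothed


rng = random.Random(0)
action = smoothed_target_action([0.0, 0.9, -1.0], rng=rng)
print(action)
```

Because the noise is clipped before it is added, each smoothed action stays within `noise_clip` of the original target action and inside the action bounds.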

araffin commented 3 years ago

Hello,

Please fill out the issue template completely (including using a markdown code block to format the code, and providing a minimal working example that reproduces the bug). Overall, it seems related to PyTorch (you are probably using half precision) and not to SB3 at all.
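One way to test the half-precision hypothesis is to inspect the dtype of the tensors involved. If they are `torch.float16` (Half), upcasting them to float32 before generating and clipping the noise sidesteps CPU kernels that, as the error shows, the PyTorch version used here did not implement for Half. This is a minimal sketch assuming PyTorch is installed. The helper `clipped_noise_like` is hypothetical, and where the float16 comes from in the reporter's setup (for example an action space declared with a float16 dtype) is an assumption not confirmed in this thread.

```python
import torch


def clipped_noise_like(actions: torch.Tensor, policy_noise: float = 0.2,
                       noise_clip: float = 0.5) -> torch.Tensor:
    """Hypothetical helper: TD3-style clipped Gaussian noise computed in float32."""
    if actions.dtype == torch.float16:
        # Half tensors on CPU may lack kernels such as clamp in some PyTorch
        # versions, so upcast to float32 first
        actions = actions.float()
    noise = torch.randn_like(actions) * policy_noise
    return noise.clamp(-noise_clip, noise_clip)


half_actions = torch.zeros(4, 2, dtype=torch.float16)
print(half_actions.dtype)  # torch.float16
noise = clipped_noise_like(half_actions)
print(noise.dtype)  # torch.float32
```

The more robust fix is usually to keep the data in float32 from the start rather than patching individual operations.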

drose188 commented 3 years ago

Well, I tried different versions and combinations of PyTorch, with and without CUDA. It works on Windows but not on Ubuntu.
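Since the behaviour reportedly differs between Windows and Ubuntu, the exact versions on each machine are the details a bug report needs. Here is a small sketch for collecting them using only the standard library. The set of packages listed is an assumption, and optional packages are reported only if they can be imported. The helper `environment_report` is hypothetical.

```python
import importlib
import platform
import sys


def environment_report():
    """Hypothetical helper collecting OS, Python and library versions."""
    info = {
        "OS": platform.platform(),
        "Python": sys.version.split()[0],
    }
    # Optional packages: report a version only if the package is importable
    for name in ("torch", "stable_baselines3", "gym"):
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            info[name] = "not installed"
    # CUDA availability matters here because the failing clamp ran on CPU
    try:
        import torch
        info["CUDA available"] = str(torch.cuda.is_available())
    except ImportError:
        pass
    return info


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this on both the Windows and the Ubuntu machine makes the version differences easy to compare side by side.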

araffin commented 3 years ago

Overall, it seems related to PyTorch (you are probably using half-precision) and not SB3 at all.

Closing for the reason mentioned above.