Closed drose188 closed 3 years ago
This also happens when training TD3 without Optuna. I double-checked, and it works fine with the SAC agent. The problem is with TD3 only.
Hello,
Please fill out the issue template completely (including using a markdown code block to format the code and providing a minimal working example to reproduce the bug). Overall, it seems related to PyTorch (you are probably using half-precision) and not SB3 at all.
Well, I tried different versions and combinations of PyTorch, with and without CUDA. It works on Windows but not on Ubuntu.
> Overall, it seems related to PyTorch (you are probably using half-precision) and not SB3 at all.

Closing for the reason mentioned above.
This is happening using TD3 only. Works fine with SAC:
error: RuntimeError('"clamp_cpu" not implemented for \'Half\'',)
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 198, in _run_trial
value_or_values = func(trial)
File "opt.py", line 311, in optimize_agent
online_model.learn(total_timesteps=(int(learn_params['total_timestamp'])))
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 207, in learn
reset_num_timesteps=reset_num_timesteps,
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 272, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 146, in train
noise = noise.clamp(-self.target_noise_clip, self.target_noise_clip)
RuntimeError: "clamp_cpu" not implemented for 'Half'
Traceback (most recent call last):
File "opt.py", line 384, in
study.optimize(my_optimise.optimize_agent, n_trials=1000, n_jobs=1) # n_jobs=-1
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/study.py", line 381, in optimize
show_progress_bar=show_progress_bar,
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 70, in _optimize
progress_bar=progress_bar,
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 161, in _optimize_sequential
trial = _run_trial(study, func, catch)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 249, in _run_trial
raise func_err
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/optuna/_optimize.py", line 198, in _run_trial
value_or_values = func(trial)
File "opt.py", line 311, in optimize_agent
online_model.learn(total_timesteps=(int(learn_params['total_timestamp'])))
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 207, in learn
reset_num_timesteps=reset_num_timesteps,
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 272, in learn
self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/stable_baselines3/td3/td3.py", line 146, in train
noise = noise.clamp(-self.target_noise_clip, self.target_noise_clip)
RuntimeError: "clamp_cpu" not implemented for 'Half'
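
The failing call in the traceback above is `noise.clamp(...)` on a CPU tensor of dtype `Half` (float16). A minimal sketch of how one might check for this and work around it is below. It is not SB3 code: the helper names `find_half_tensors` and `clamp_noise` are hypothetical, and the `actions` tensor is only a stand-in for a sampled batch. It assumes the noise tensor inherits its dtype from the sampled actions, which would happen if, for example, the environment's action space uses `float16`.

```python
import torch


def find_half_tensors(named_tensors):
    """Return the names of all tensors stored as float16 (hypothetical helper)."""
    return [name for name, t in named_tensors.items() if t.dtype == torch.float16]


def clamp_noise(noise, clip):
    """Clamp to [-clip, clip], upcasting float16 to float32 first (hypothetical helper)."""
    if noise.dtype == torch.float16:
        noise = noise.float()
    return noise.clamp(-clip, clip)


# Stand-in for a batch of actions that ended up in half precision.
actions = torch.zeros(4, 2, dtype=torch.float16)
half_names = find_half_tensors({"actions": actions})

# Upcast before sampling and clamping so no float16 CPU kernel is needed.
noise = actions.clone().float().normal_(0, 0.2)
clipped = clamp_noise(noise, 0.5)
```

If `half_names` is non-empty for the data fed to TD3, casting the observations and actions to `float32` before they reach the model is probably the simplest thing to try.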