DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.84k stars, 1.68k forks

[Question] An error while using SAC and DDPG #1923

Closed (minxuef closed this issue 4 months ago)

minxuef commented 4 months ago

❓ Question

My code runs fine with PPO and A2C, but it fails after I switch to DDPG or SAC: once the first episode ends, the error below is raised. Why is my code generating this error?

Traceback (most recent call last):
  File "D:\ps\anaconda\envs\metro-env1\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "D:\ps\pycharm\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "D:\ps\pycharm\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/桌面/study/code-study/rl4metro-main1/rl4metro-main 4.20/train.py", line 140, in <module>
    model_ddpg.learn(total_timesteps=time_steps, tb_log_name='DDPG', reset_num_timesteps=False,callback=callback)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\ddpg\ddpg.py", line 125, in learn
    return super().learn(
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\td3\td3.py", line 214, in learn
    return super().learn(
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 353, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\td3\td3.py", line 166, in train
    next_q_values = th.cat(self.critic_target(replay_data.next_observations, next_actions), dim=1)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\policies.py", line 945, in forward
    return tuple(q_net(qvalue_input) for q_net in self.q_networks)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\policies.py", line 945, in <genexpr>
    return tuple(q_net(qvalue_input) for q_net in self.q_networks)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Double
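
The environment code is not shown in this issue, so the actual cause cannot be confirmed from the traceback alone. The final frames do show a float64 (Double) tensor reaching a `Linear` layer whose parameters are float32 (Float). A common way this happens is that arrays are created with NumPy's default dtype, float64, for example in a custom environment's observations or actions. The sketch below is a minimal, standalone reproduction of that dtype mismatch in plain PyTorch; it is an assumption about the failure mode, not the author's code, and the names `layer`, `batch64`, and `batch32` are illustrative only.

```python
import numpy as np
import torch as th

# A freshly created linear layer stores float32 parameters by default.
layer = th.nn.Linear(3, 1)
assert layer.weight.dtype == th.float32

# NumPy arrays default to float64, so a tensor built from one is Double.
batch64 = th.as_tensor(np.zeros((2, 3)))
assert batch64.dtype == th.float64

# Feeding the Double tensor to the float32 layer raises a RuntimeError
# (the exact message depends on the PyTorch version).
try:
    layer(batch64)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Creating the array as float32 up front avoids the mismatch.
batch32 = th.as_tensor(np.zeros((2, 3), dtype=np.float32))
out = layer(batch32)
```

If the same pattern applies here, a reasonable first check is whether the environment's observation and action spaces, and the arrays it returns, use `np.float32` rather than float64.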


qgallouedec commented 4 months ago

Use the bug report issue template and write in English please