My code runs fine with PPO and A2C, but after switching to DDPG or SAC it fails: the error below is raised right after the first episode ends. Why is my code generating this error?
Traceback (most recent call last):
  File "D:\ps\anaconda\envs\metro-env1\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "D:\ps\pycharm\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "D:\ps\pycharm\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/桌面/study/code-study/rl4metro-main1/rl4metro-main 4.20/train.py", line 140, in <module>
    model_ddpg.learn(total_timesteps=time_steps, tb_log_name='DDPG', reset_num_timesteps=False, callback=callback)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\ddpg\ddpg.py", line 125, in learn
    return super().learn(
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\td3\td3.py", line 214, in learn
    return super().learn(
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 353, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\td3\td3.py", line 166, in train
    next_q_values = th.cat(self.critic_target(replay_data.next_observations, next_actions), dim=1)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\policies.py", line 945, in forward
    return tuple(q_net(qvalue_input) for q_net in self.q_networks)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\policies.py", line 945, in <genexpr>
    return tuple(q_net(qvalue_input) for q_net in self.q_networks)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Double
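For context (this is a guess, not confirmed from the code above): this RuntimeError in Stable-Baselines3's off-policy algorithms typically means the environment returns float64 observations (NumPy's default dtype), which the replay buffer hands to a critic whose weights are float32. A minimal sketch of a cast wrapper — the class name `Float32ObsWrapper` and the dummy environment in the usage example are my own, not from the issue:

```python
import numpy as np

class Float32ObsWrapper:
    """Hypothetical wrapper: casts every observation to float32 before
    it reaches the agent, so the replay buffer never stores doubles."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        return np.asarray(obs, dtype=np.float32)

    def step(self, action):
        # Works with both 4-tuple (gym) and 5-tuple (gymnasium) step returns.
        obs, *rest = self.env.step(action)
        return (np.asarray(obs, dtype=np.float32), *rest)
```

The cleaner long-term fix is to declare the space as `gym.spaces.Box(..., dtype=np.float32)` and return float32 arrays from the environment directly.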
Checklist
[X] I have checked that there is no similar issue in the repo