PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0
3.22k stars, 816 forks

PPO-mujoco: reward curve from GPU training differs significantly from CPU #973

Closed: USTCKAY closed this issue 1 year ago

USTCKAY commented 1 year ago

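Python version: 3.7.0

Environment:
paddlepaddle-gpu 2.3.2
parl 2.0.5 /paddle/PARL
gym 0.18.0
mujoco-py 2.1.2.14
PyYAML 6.0

mujoco version: 210

Training on a single GPU, with the command:
python train.py --env 'HalfCheetah-v2' --continuous_action --train_total_steps 1000000

GPU rewards curve: [screenshot: 截屏2022-10-31 10 01 00]
CPU rewards curve: [screenshot: 截屏2022-10-31 19 48 11]

As the curves show, the CPU rewards reach about 4000, similar to the official results, but the GPU rewards only reach 1600, far below the CPU result.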

Update: I ran it on the GPU again, and the reward curve seems normal now. [screenshot: 截屏2022-11-01 14 30 10]

TomorrowIsAnOtherDay commented 1 year ago

Thanks for the feedback. Which mujoco task is this?

USTCKAY commented 1 year ago

> Thanks for the feedback. Which mujoco task is this?

HalfCheetah-v2

TomorrowIsAnOtherDay commented 1 year ago

Duplicate of #974