PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0
3.24k stars, 819 forks

Why does the test reward stay low when running Teacher Keke's DQN and DDPG examples with the GPU version of paddle? #935

Closed: OutSpace00 closed this issue 2 years ago

OutSpace00 commented 2 years ago

At first, I ran Teacher Keke's DQN and DDPG examples with the CPU version of paddle, and both reached almost the same results as the teacher's. With the GPU version, however, the test reward stays at a low value. I increased the number of training episodes tenfold and the result was the same: the test reward does not increase.

My environment: python3.7, cuda10.2, cudnn7.6.5, paddlepaddle-gpu==2.3.1, parl2.0.3

Below is the printed output from running DQN on the GPU:

[07-20 14:09:46 MainThread @train.py:125] episode:50 e_greed:0.09930499999999931 Test reward:9.0
[07-20 14:09:47 MainThread @train.py:125] episode:100 e_greed:0.0987969999999988 Test reward:9.4
[07-20 14:09:48 MainThread @train.py:125] episode:150 e_greed:0.09830899999999831 Test reward:9.8
[07-20 14:09:49 MainThread @train.py:125] episode:200 e_greed:0.0977949999999978 Test reward:9.4
[07-20 14:09:50 MainThread @train.py:125] episode:250 e_greed:0.0972929999999973 Test reward:9.6
[07-20 14:09:52 MainThread @train.py:125] episode:300 e_greed:0.0967949999999968 Test reward:9.4
[07-20 14:09:53 MainThread @train.py:125] episode:350 e_greed:0.09630299999999631 Test reward:9.4
[07-20 14:09:54 MainThread @train.py:125] episode:400 e_greed:0.0957939999999958 Test reward:8.8
[07-20 14:09:55 MainThread @train.py:125] episode:450 e_greed:0.0952879999999953 Test reward:9.8
[07-20 14:09:56 MainThread @train.py:125] episode:500 e_greed:0.0947969999999948 Test reward:9.0
[07-20 14:09:58 MainThread @train.py:125] episode:550 e_greed:0.09431099999999432 Test reward:9.8
[07-20 14:09:59 MainThread @train.py:125] episode:600 e_greed:0.0937959999999938 Test reward:9.2
[07-20 14:10:00 MainThread @train.py:125] episode:650 e_greed:0.09330899999999331 Test reward:9.4
[07-20 14:10:01 MainThread @train.py:125] episode:700 e_greed:0.0927979999999928 Test reward:9.2
[07-20 14:10:03 MainThread @train.py:125] episode:750 e_greed:0.09230899999999231 Test reward:9.2
[07-20 14:10:04 MainThread @train.py:125] episode:800 e_greed:0.09182899999999183 Test reward:9.2
[07-20 14:10:05 MainThread @train.py:125] episode:850 e_greed:0.09133599999999134 Test reward:9.4
[07-20 14:10:06 MainThread @train.py:125] episode:900 e_greed:0.09083899999999084 Test reward:9.4
[07-20 14:10:07 MainThread @train.py:125] episode:950 e_greed:0.09035599999999036 Test reward:10.2
[07-20 14:10:09 MainThread @train.py:125] episode:1000 e_greed:0.08986599999998987 Test reward:9.6
[07-20 14:10:10 MainThread @train.py:125] episode:1050 e_greed:0.08936199999998937 Test reward:9.0
[07-20 14:10:11 MainThread @train.py:125] episode:1100 e_greed:0.08887799999998888 Test reward:9.2
[07-20 14:10:12 MainThread @train.py:125] episode:1150 e_greed:0.0883909999999884 Test reward:9.2
[07-20 14:10:13 MainThread @train.py:125] episode:1200 e_greed:0.0878979999999879 Test reward:9.8
[07-20 14:10:15 MainThread @train.py:125] episode:1250 e_greed:0.08741499999998742 Test reward:8.8
[07-20 14:10:16 MainThread @train.py:125] episode:1300 e_greed:0.08692899999998693 Test reward:9.6
[07-20 14:10:17 MainThread @train.py:125] episode:1350 e_greed:0.08643799999998644 Test reward:9.4
[07-20 14:10:18 MainThread @train.py:125] episode:1400 e_greed:0.08596299999998597 Test reward:9.2
[07-20 14:10:19 MainThread @train.py:125] episode:1450 e_greed:0.08547799999998548 Test reward:9.2
[07-20 14:10:20 MainThread @train.py:125] episode:1500 e_greed:0.08497999999998498 Test reward:9.8
[07-20 14:10:22 MainThread @train.py:125] episode:1550 e_greed:0.08448599999998449 Test reward:9.2
[07-20 14:10:23 MainThread @train.py:125] episode:1600 e_greed:0.08397399999998398 Test reward:9.8
[07-20 14:10:24 MainThread @train.py:125] episode:1650 e_greed:0.0834989999999835 Test reward:9.2
[07-20 14:10:25 MainThread @train.py:125] episode:1700 e_greed:0.082997999999983 Test reward:9.2
[07-20 14:10:27 MainThread @train.py:125] episode:1750 e_greed:0.08250899999998251 Test reward:9.4
[07-20 14:10:28 MainThread @train.py:125] episode:1800 e_greed:0.08200899999998201 Test reward:10.2
[07-20 14:10:29 MainThread @train.py:125] episode:1850 e_greed:0.08152099999998152 Test reward:9.2
[07-20 14:10:30 MainThread @train.py:125] episode:1900 e_greed:0.08103199999998104 Test reward:10.0
[07-20 14:10:31 MainThread @train.py:125] episode:1950 e_greed:0.08054399999998055 Test reward:9.4
[07-20 14:10:33 MainThread @train.py:125] episode:2000 e_greed:0.08005699999998006 Test reward:9.2
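To check the "test reward does not increase" claim on a log like the one above, the entries can be parsed and summarized. The following is only a minimal sketch: the regular expression is an assumption inferred from the log format shown in this issue, the helper names `parse_log` and `summarize` are hypothetical, and the sample holds just three of the entries.

```python
import re
from statistics import mean

# Pattern assumed from the log entries in this issue; adjust it if the
# training script prints a different format.
LOG_PATTERN = re.compile(
    r"episode:(\d+)\s+e_greed:([\d.]+)\s+Test reward:([\d.]+)"
)


def parse_log(text):
    """Return a list of (episode, e_greed, test_reward) tuples found in text."""
    return [
        (int(episode), float(e_greed), float(reward))
        for episode, e_greed, reward in LOG_PATTERN.findall(text)
    ]


def summarize(records):
    """Return the count, min, max and mean of the test rewards."""
    rewards = [reward for _, _, reward in records]
    return {
        "count": len(rewards),
        "min": min(rewards),
        "max": max(rewards),
        "mean": mean(rewards),
    }


# Three entries copied from the log in this issue (first two and last).
SAMPLE = (
    "[07-20 14:09:46 MainThread @train.py:125] episode:50 "
    "e_greed:0.09930499999999931 Test reward:9.0\n"
    "[07-20 14:09:47 MainThread @train.py:125] episode:100 "
    "e_greed:0.0987969999999988 Test reward:9.4\n"
    "[07-20 14:10:33 MainThread @train.py:125] episode:2000 "
    "e_greed:0.08005699999998006 Test reward:9.2\n"
)

if __name__ == "__main__":
    print(summarize(parse_log(SAMPLE)))
```

Across all 40 entries in the log, the test reward ranges from 8.8 to 10.2 with no upward trend, while e_greed decays from about 0.099 to about 0.080, which matches the behaviour described.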

TomorrowIsAnOtherDay commented 2 years ago

Duplicate of #934