PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0

QuickStart stops before finishing on my local machine, and why is the Test reward so high? In the tutorial it ends once the reward reaches 200 #981

Closed 18438622356 closed 1 year ago

18438622356 commented 1 year ago

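When I run QuickStart on my local machine, it stops before finishing, and the Test reward is unexpectedly high; in the tutorial, training ends once the reward reaches 200. Local environment: paddlepaddle version: 2.3.2, gym==0.12.1, parl==2.0.5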
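For anyone comparing a local setup against the versions listed in this report, the following is a minimal sketch that prints the installed versions of the same packages. It uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_versions` is made up for illustration, and `paddlepaddle-gpu` is included on the assumption that a GPU build may be installed under that distribution name instead of `paddlepaddle`.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Names taken from the report above; "paddlepaddle-gpu" is an assumption
    # covering the case where the GPU build is installed instead.
    names = ["paddlepaddle", "paddlepaddle-gpu", "gym", "parl"]
    for name, version in installed_versions(names).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Pasting this output into a report makes it easier to tell whether two machines really run the same package versions.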

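Below is the output up to the point where it hangs. On AIstudio the run completes and stops once the Test reward reaches 200. On my local machine it stops at a different point on every run: sometimes at Episode 790, sometimes at around Episode 500.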

[11-11 23:11:17 MainThread @train.py:100] Test reward: 1607.2
[11-11 23:11:18 MainThread @train.py:91] Episode 600, Reward Sum 163.0.
[11-11 23:11:18 MainThread @train.py:91] Episode 610, Reward Sum 235.0.
[11-11 23:11:19 MainThread @train.py:91] Episode 620, Reward Sum 351.0.
[11-11 23:11:19 MainThread @train.py:91] Episode 630, Reward Sum 256.0.
[11-11 23:11:20 MainThread @train.py:91] Episode 640, Reward Sum 480.0.
[11-11 23:11:21 MainThread @train.py:91] Episode 650, Reward Sum 206.0.
[11-11 23:11:21 MainThread @train.py:91] Episode 660, Reward Sum 293.0.
[11-11 23:11:22 MainThread @train.py:91] Episode 670, Reward Sum 653.0.
[11-11 23:11:23 MainThread @train.py:91] Episode 680, Reward Sum 363.0.
[11-11 23:11:24 MainThread @train.py:91] Episode 690, Reward Sum 250.0.
[11-11 23:11:27 MainThread @train.py:100] Test reward: 4175.6
[11-11 23:11:27 MainThread @train.py:91] Episode 700, Reward Sum 440.0.
[11-11 23:11:28 MainThread @train.py:91] Episode 710, Reward Sum 497.0.
[11-11 23:11:29 MainThread @train.py:91] Episode 720, Reward Sum 795.0.
[11-11 23:11:30 MainThread @train.py:91] Episode 730, Reward Sum 230.0.
[11-11 23:11:30 MainThread @train.py:91] Episode 740, Reward Sum 365.0.
[11-11 23:11:31 MainThread @train.py:91] Episode 750, Reward Sum 298.0.
[11-11 23:11:32 MainThread @train.py:91] Episode 760, Reward Sum 479.0.
[11-11 23:11:33 MainThread @train.py:91] Episode 770, Reward Sum 842.0.
[11-11 23:11:34 MainThread @train.py:91] Episode 780, Reward Sum 85.0.
[11-11 23:11:34 MainThread @train.py:91] Episode 790, Reward Sum 489.0.
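To make the anomaly in a log like the one above easier to spot, here is a small sketch that parses lines in this format and lists the training episodes whose reward exceeds a threshold. The function names and the default threshold of 200 are assumptions for illustration; the threshold is taken from the statement in this thread that the tutorial run ends at 200. It does not diagnose the cause, it only summarizes the log.

```python
import re

# Patterns match the two kinds of log lines shown in the report.
TRAIN_RE = re.compile(r"Episode (\d+), Reward Sum (\d+(?:\.\d+)?)")
TEST_RE = re.compile(r"Test reward: (\d+(?:\.\d+)?)")


def parse_log(text):
    """Return ([(episode, reward_sum), ...], [test_reward, ...]) from log text."""
    train = [(int(ep), float(r)) for ep, r in TRAIN_RE.findall(text)]
    test = [float(r) for r in TEST_RE.findall(text)]
    return train, test


def over_threshold(train, threshold=200.0):
    """Return the (episode, reward_sum) pairs whose reward exceeds the threshold."""
    return [(ep, r) for ep, r in train if r > threshold]


if __name__ == "__main__":
    sample = (
        "[11-11 23:11:17 MainThread @train.py:100] Test reward: 1607.2\n"
        "[11-11 23:11:18 MainThread @train.py:91] Episode 600, Reward Sum 163.0.\n"
        "[11-11 23:11:18 MainThread @train.py:91] Episode 610, Reward Sum 235.0.\n"
    )
    train, test = parse_log(sample)
    print("test rewards:", test)
    print("episodes above threshold:", over_threshold(train))
```

Running it over a full log shows at a glance how many episodes exceed the value the tutorial run is reported to stop at.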

TomorrowIsAnOtherDay commented 1 year ago

Hello, have you modified the code locally?

18438622356 commented 1 year ago

No. Does it run for you locally?

TomorrowIsAnOtherDay commented 1 year ago

Yes, it runs. Is your system Windows or Linux? Could you share your system information?

18438622356 commented 1 year ago

I'm on Windows 10. Processor: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42GHz. RAM: 16.0 GB (15.8 GB usable). NVIDIA graphics card with 2 GB of video memory.

TomorrowIsAnOtherDay commented 1 year ago

Got it. I'll look into it when I'm back at the office next week. Thanks for the feedback.

18438622356 commented 1 year ago

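It runs locally now. The reason it didn't run before is that I downloaded the zip from GitHub, and GitHub's default download is PARL-develop. After I cloned the repo on AIstudio, downloaded it from there, and put it on my local machine, it worked. The code on AIstudio should be the latest, so the two copies may differ somewhat. Thanks!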
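One way to check whether two downloaded copies of the code actually differ is to compare the directory trees directly. The sketch below uses the standard library's `filecmp.dircmp`; the function name `differing_files` and the example paths are hypothetical. Note that `dircmp` compares files shallowly by default, so two files with identical size and modification time are treated as equal without reading their contents.

```python
import filecmp
import os


def differing_files(left, right):
    """Recursively list (status, relative path) for files that differ
    between two directory trees or exist on one side only."""
    results = []

    def walk(cmp, prefix):
        for name in cmp.diff_files:
            results.append(("differs", os.path.join(prefix, name)))
        for name in cmp.left_only:
            results.append(("left only", os.path.join(prefix, name)))
        for name in cmp.right_only:
            results.append(("right only", os.path.join(prefix, name)))
        # Descend into subdirectories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical paths: the unpacked GitHub zip and the copy from AIstudio.
    left, right = "PARL-develop", "PARL-from-aistudio"
    if os.path.isdir(left) and os.path.isdir(right):
        for status, path in differing_files(left, right):
            print(f"{status}: {path}")
```

An empty result suggests the two copies are the same, in which case the difference in behavior has to come from somewhere else, such as the installed packages.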

TomorrowIsAnOtherDay commented 1 year ago

To summarize, is this right? When you run the latest code (PARL-develop), the reward exceeds 200, but with the older code the reward is normal?

18438622356 commented 1 year ago

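I just downloaded it again and tried PARL-develop locally once more, and it runs now too, with the reward at 200. But I never changed the code before, so it's very strange ^_^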

TomorrowIsAnOtherDay commented 1 year ago

In that case, please open a new issue once you can reproduce the problem reliably. I'll close this one for now.