Closed acriptis closed 4 years ago
The same problem occurs with the A2C algorithm on the same environment:
python run.py --gym -a a2c -n train_using_gym --gym-env MountainCar-v0 --render-episode 100 --gym-agents 1
Hi, @acriptis
Thank you for your report.
Actually, what you encountered is a bug that I overlooked when changing the way gym environments are rendered. I've fixed it in commit 02361d2.
`record` is used to specify whether to save a video when rendering.
`MountainCar-v0` is a classic sparse-reward environment and is hard to tackle with naive RL algorithms. When I tested `ppo` in this env, I got the same result as you: full of -200. To some extent, `MountainCar-v0` is even harder than `MountainCarContinuous-v0`.
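To see why every failed episode scores exactly -200, here is a minimal sketch of the reward structure, assuming the standard gym setup for `MountainCar-v0`: a reward of -1 per step and a 200-step episode limit. (The helper `episode_return` is illustrative, not part of any library.)

```python
# MountainCar-v0 gives a reward of -1 on every step, and the episode
# is truncated after 200 steps. An agent that never reaches the goal
# therefore always scores -200, so the return alone carries no signal
# about how close the agent got to the flag.
STEP_REWARD = -1
MAX_STEPS = 200

def episode_return(steps_to_goal=None):
    """Total reward for an episode that reaches the goal after
    `steps_to_goal` steps, or times out if None."""
    steps = MAX_STEPS if steps_to_goal is None else min(steps_to_goal, MAX_STEPS)
    return STEP_REWARD * steps

print(episode_return())     # timed out: -200, indistinguishable from any other failure
print(episode_return(113))  # reached the goal in 113 steps: -113
```

This is exactly why the training logs below improve only once the agent starts finishing episodes early.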
Try other algorithms instead, e.g. `maxsqn`, or specify `use_curiosity` in Algorithms/config.yaml. BTW, an off-policy algorithm with `pre_fill_steps` specified in ./config.yaml works better.
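The idea behind `use_curiosity` is to add an intrinsic reward for transitions the agent cannot yet predict, which gives a learning signal even when the extrinsic reward is a flat -200. A minimal NumPy sketch of that idea, using a tiny linear forward model (this is an illustration of curiosity-style bonuses in general, not this repository's ICM implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

class ForwardModel:
    """Tiny linear forward model: predicts s' from (s, a).
    Its prediction error serves as an intrinsic 'curiosity' reward."""

    def __init__(self, state_dim, action_dim, lr=0.1):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def intrinsic_reward(self, s, a, s_next):
        x = np.concatenate([s, a])
        pred = self.W @ x
        err = s_next - pred
        # Train the model to reduce the error, so repeated
        # (familiar) transitions earn a smaller bonus over time.
        self.W += self.lr * np.outer(err, x)
        return float(err @ err)

fm = ForwardModel(state_dim=2, action_dim=3)
s = rng.standard_normal(2)
a = np.array([1.0, 0.0, 0.0])   # one-hot action
s_next = s + 0.1

r1 = fm.intrinsic_reward(s, a, s_next)
r2 = fm.intrinsic_reward(s, a, s_next)  # same transition again
print(r1 > r2 > 0)  # familiarity lowers the curiosity bonus
```

In practice the intrinsic reward is scaled and added to the environment reward, so novel regions of the state space (e.g. high up the hill) become rewarding in themselves.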
python run.py --gym -a maxsqn --max-step 200 -n train_using_gym --gym-env MountainCar-v0 --gym-agents 4
`maxsqn` without ICM works well:
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:26 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 167 | step: 113 | last_done_step 113 | rewards: -113.0, -100.0, -90.0, -84.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:28 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 168 | step: 160 | last_done_step 160 | rewards: -160.0, -149.0, -88.0, -84.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:30 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 169 | step: 149 | last_done_step 149 | rewards: -149.0, -111.0, -88.0, -87.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:32 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 170 | step: 167 | last_done_step 167 | rewards: -167.0, -160.0, -91.0, -84.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:35 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 171 | step: 165 | last_done_step 165 | rewards: -165.0, -112.0, -111.0, -85.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:36 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 172 | step: 125 | last_done_step 125 | rewards: -125.0, -121.0, -111.0, -85.0
Closing now; feel free to reopen this issue.
@StepNeverStop , thank you for the clarification!
Thanks for your development, it seems to be an inspiring project! However, when I tried to launch a command from Examples:
python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 1000 --gym-agents 4
I got the error: render() missing 1 required positional argument: 'record'
Part of the log before the exception:
Could you explain how to fix/overcome this error?
PS. Just before this I tried to launch the same env and model with the command:
python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 100 --gym-agents 1
It ran a little longer but showed no improvement in reward (always -200), and finally finished with the same exception.