StepNeverStop / RLs

Reinforcement Learning Algorithms Based on PyTorch
https://stepneverstop.github.io
Apache License 2.0
448 stars 93 forks

Exception during running MountainCar-v0 case with ppo #18

Closed acriptis closed 4 years ago

acriptis commented 4 years ago

Thanks for your work, it seems to be an inspiring project! However, when I tried to launch a command from the Examples: python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 1000 --gym-agents 4 I got the error: render() missing 1 required positional argument: 'record'

Part of the log before the exception:

INFO:common.agent:| Model-0 |no op step 2496
INFO:common.agent:| Model-0 |no op step 2497
INFO:common.agent:| Model-0 |no op step 2498
INFO:common.agent:| Model-0 |no op step 2499
WARNING:tensorflow:Layer a_c_v_discrete is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:00:10 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 100 | step: 2000 | last_done_step  200 | rewards: -200.0, -200.0, -200.0, -200.0
Save checkpoint success. Episode: 100
render() missing 1 required positional argument: 'record'

Could you explain how to fix/overcome this error?

PS. Just before this I tried to launch the same env and model with command:

python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 100 --gym-agents 1

It ran a little longer but made no progress in improving the reward (always -200), and it finally finished with the same exception.
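The constant -200 is what the environment's reward structure predicts for any untrained agent. Below is a toy re-implementation of the MountainCar-v0 dynamics (constants from the classic control task; the fixed start position is a simplification, the real env samples it from [-0.6, -0.4]) showing why: the reward is -1 per step, the episode is capped at 200 steps, and the engine is too weak to climb the hill directly, so until the agent learns to rock back and forth the return is exactly -200.

```python
import math

def run_episode(policy, max_steps=200):
    """Simulate one MountainCar-v0 episode under `policy` and return the score."""
    position, velocity = -0.5, 0.0
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(position, velocity)          # 0: push left, 1: idle, 2: push right
        velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * position)
        velocity = max(-0.07, min(0.07, velocity))   # velocity bounds
        position = max(-1.2, min(0.6, position + velocity))
        if position == -1.2:                         # inelastic left wall
            velocity = max(0.0, velocity)
        total_reward -= 1.0                          # -1 every step, no shaping
        if position >= 0.5:                          # reached the flag
            break
    return total_reward

# Always pushing right cannot climb the hill from a standstill, so the
# episode times out with the minimum possible return.
print(run_episode(lambda p, v: 2))  # -200.0
```

This is the sparse-reward trap discussed further down in the thread: every policy that fails to reach the flag looks equally bad to the learner.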

acriptis commented 4 years ago

The same problem occurs with the A2C algorithm on the same environment:

python run.py --gym -a a2c -n train_using_gym --gym-env MountainCar-v0 --render-episode 100 --gym-agents 1

StepNeverStop commented 4 years ago

Hi, @acriptis

Thank you for your report.

Actually, what you encountered is a bug I had overlooked when changing the way gym environments are rendered. I've fixed it in commit 02361d2.

record specifies whether to save a video when rendering.
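A hypothetical sketch of what such a render helper looks like (the class and attribute names here are assumptions for illustration, not the repo's actual code): the `record` flag is the required positional argument the pre-fix call site forgot to pass.

```python
class EpisodeRenderer:
    """Wraps an env's render call; `record` toggles buffering frames for video."""

    def __init__(self, env):
        self.env = env
        self.frames = []  # RGB frames captured while recording

    def render(self, record):
        # When recording, request an RGB array and buffer it for later
        # video encoding; otherwise just draw to the screen.
        frame = self.env.render(mode="rgb_array" if record else "human")
        if record and frame is not None:
            self.frames.append(frame)
        return frame
```

A caller that invokes render() without the record argument raises exactly the TypeError reported above.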

MountainCar-v0 is a classic sparse-reward env and is hard to tackle with naive RL algorithms. When I test ppo on this env, I get the same result as you: all -200. To some extent, MountainCar-v0 is even harder than MountainCarContinuous-v0.

Try other algorithms instead, e.g. maxsqn, or set use_curiosity in Algorithms/config.yaml. BTW, off-policy algorithms with pre_fill_steps specified in ./config.yaml work better.
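A minimal sketch of the idea behind use_curiosity (ICM-style exploration); the function name and the raw feature vectors below are illustrative assumptions, not the repo's implementation. The agent gets an extra reward equal to a forward model's prediction error, so novel, poorly-predicted transitions are rewarded even while the environment reward is flat at -1 per step:

```python
import numpy as np

def intrinsic_reward(phi_next, phi_next_pred, eta=0.01):
    """Curiosity bonus: scaled squared error of the forward model's
    prediction of the next state's feature vector."""
    return 0.5 * eta * float(np.sum((phi_next_pred - phi_next) ** 2))

# A novel transition (large prediction error) earns a bigger bonus
# than a familiar, well-predicted one.
novel = intrinsic_reward(np.array([1.0, 0.0]), np.array([0.0, 0.0]))
familiar = intrinsic_reward(np.array([1.0, 0.0]), np.array([0.9, 0.0]))
```

This bonus is added to the environment reward during training, which is what lets an exploration-driven agent discover the back-and-forth strategy that sparse -200 returns never hint at.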

python run.py --gym -a maxsqn --max-step 200 -n train_using_gym --gym-env MountainCar-v0 --gym-agents 4

maxsqn without ICM works well:

INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:26 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 167 | step:  113 | last_done_step  113 | rewards: -113.0, -100.0, -90.0, -84.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:28 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 168 | step:  160 | last_done_step  160 | rewards: -160.0, -149.0, -88.0, -84.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:30 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 169 | step:  149 | last_done_step  149 | rewards: -149.0, -111.0, -88.0, -87.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:32 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 170 | step:  167 | last_done_step  167 | rewards: -167.0, -160.0, -91.0, -84.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:35 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 171 | step:  165 | last_done_step  165 | rewards: -165.0, -112.0, -111.0, -85.0
INFO:common.agent:| Model-0 |Pass time(h:m:s) 00:07:36 |----------------------------------------
INFO:common.agent:| Model-0 |Episode: 172 | step:  125 | last_done_step  125 | rewards: -125.0, -121.0, -111.0, -85.0
StepNeverStop commented 4 years ago

Closing now; feel free to reopen this issue.

acriptis commented 4 years ago

@StepNeverStop, thank you for the clarification!