araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License
1.13k stars 208 forks

Google colab error for Soft actor critic #18

Closed testerpce closed 5 years ago

testerpce commented 5 years ago

I am running rl-baselines-zoo for the humanoid Bullet environment in Google Colab. At first I ran it with PPO2 and it gave a very good result, with rewards going up to 1600. Now I am running Soft Actor-Critic and it gives the following error.

!python train.py --algo sac --env HumanoidBulletEnv-v0 --n-timesteps 10000000

========== HumanoidBulletEnv-v0 ==========
OrderedDict([('batch_size', 64),
             ('buffer_size', 1000000),
             ('ent_coef', 'auto'),
             ('gradient_steps', 1),
             ('learning_rate', 'lin_3e-4'),
             ('learning_starts', 1000),
             ('n_timesteps', 20000000.0),
             ('normalize', "{'norm_obs': True, 'norm_reward': False}"),
             ('policy', 'CustomSACPolicy'),
             ('train_freq', 1)])
Using 1 environments
pybullet build time: Apr 11 2019 07:40:52
/usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Normalizing input and return
Traceback (most recent call last):
  File "train.py", line 171, in <module>
    model = ALGOS[args.algo](env=env, tensorboard_log=tensorboard_log, verbose=1, **hyperparams)
TypeError: 'NoneType' object is not callable
[fc45386ee43e:03596] *** Process received signal ***
[fc45386ee43e:03596] Signal: Segmentation fault (11)
[fc45386ee43e:03596] Signal code: Address not mapped (1)
[fc45386ee43e:03596] Failing at address: 0x7f0ed740320d
[fc45386ee43e:03596] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f0eda6b8890]
[fc45386ee43e:03596] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7f0eda2f7785]
[fc45386ee43e:03596] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7f0edab62e44]
[fc45386ee43e:03596] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7f0eda2f8615]
[fc45386ee43e:03596] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7f0edab60cb3]
[fc45386ee43e:03596] *** End of error message ***
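The `TypeError: 'NoneType' object is not callable` on `ALGOS[args.algo](...)` typically means the registry entry for `sac` is `None`. A minimal sketch of the assumed mechanism (the names are illustrative, not the zoo's exact code): if the installed stable-baselines predates SAC, the import fails and the registry slot stays empty, so calling it raises exactly this error.

```python
# Sketch: in an older stable-baselines, "from stable_baselines import SAC"
# fails, so a registry like ALGOS may hold None for that algorithm.
SAC = None  # stands in for a failed import in stable-baselines < 2.4.0

ALGOS = {"ppo2": object, "sac": SAC}

try:
    model = ALGOS["sac"](env=None)  # calling None reproduces the traceback
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable
```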
araffin commented 5 years ago

Hello, could you share the link to your google colab notebook?

testerpce commented 5 years ago

Sure

Here it is

https://colab.research.google.com/drive/1dnROnz1kDQsHI4ReTjjF79EnHc3GlvKd

araffin commented 5 years ago

Please update stable-baselines: SAC was only introduced in v2.4.0 (you have v2.2.1).

Doing:

!pip install stable-baselines --upgrade

will solve your issue.
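To double-check before retraining, you can compare the installed version against 2.4.0. This is a small standalone sketch (the helper name `supports_sac` is made up here), assuming the usual `major.minor.patch` scheme used by stable-baselines releases:

```python
# Sketch: SAC was added in stable-baselines 2.4.0, so any earlier
# version string should be rejected before launching training.
def supports_sac(version: str) -> bool:
    """Return True if this stable-baselines version includes SAC."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= (2, 4, 0)

print(supports_sac("2.2.1"))  # the version pinned in the Colab -> False
print(supports_sac("2.4.0"))  # -> True
```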

Please fill in the issue template completely next time ;) (notably version of SB)

araffin commented 5 years ago

PS: I'll update the colab notebook, I assumed it would download the latest version of SB automatically, which is apparently not the case.

testerpce commented 5 years ago

Ok, I will include the stable-baselines version next time. Sorry, I did not notice that. Anyway, I am training Soft Actor-Critic on HumanoidBulletEnv-v0 and it seems that after a while the rewards go down; right now they are spiralling towards negative values. Are the hyperparameters for Soft Actor-Critic on HumanoidBulletEnv-v0 not tuned? I ask because I saw in the reported scores that SAC reaches up to a reward of 2048 at 149000 steps on HumanoidBulletEnv-v0. Are the hyperparameters maybe not tuned for the updated version of stable-baselines?

araffin commented 5 years ago

I saw in the reported scores that SAC reaches up to a reward of 2048 at 149000 steps on HumanoidBulletEnv-v0.

The reported performance is after full training (2e7 steps in that case). The hyperparameters are not tuned yet, but they should give you decent results, provided you train fully.

Anyway, I am training Soft Actor-Critic on HumanoidBulletEnv-v0 and it seems that after a while the rewards go down and right now they are spiralling towards negative values.

I'm missing quite a lot of information here. What command did you use, how long did you wait, how many random seeds did you try?

testerpce commented 5 years ago

!python train.py --algo sac --env HumanoidBulletEnv-v0

This is the command I used, so even the number of timesteps was taken from the hyperparameters file, and I reported the drop after some 18000000 steps. Now, at the end of the whole training, the rewards are negative. I ran the program twice without changing anything, i.e. I just ran the above command twice. Should I try setting the random seeds manually by passing them on the command line?
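When comparing runs with different seeds, it helps to summarise the spread rather than look at a single run. A small sketch with purely illustrative reward numbers (not measured results):

```python
# Sketch: aggregate final rewards across seeds and report mean and
# spread; the numbers below are made up for illustration only.
import statistics

final_rewards = {0: 1850.0, 1: 1620.0, 2: 1990.0}  # seed -> final reward

mean = statistics.mean(final_rewards.values())
spread = statistics.pstdev(final_rewards.values())
print(f"mean={mean:.1f} +/- {spread:.1f} over {len(final_rewards)} seeds")
```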