araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License
1.13k stars 208 forks

Changing hyper-parameters in PPO2 #94

Closed meric-sakarya closed 4 years ago

meric-sakarya commented 4 years ago

I am trying to change the hyperparameters of PPO2 to achieve better results on Pendulum-v0. I have tried using the parameters from RL Baselines Zoo here, but I receive this error.

Traceback (most recent call last):
  File "C:/Users/meric/OneDrive/Masaüstü/TUM/Thesis/Pycharm/pioneer/pendulum_default_PPO2.py", line 35, in <module>
    cliprange=0.2, env=pendulum_env, verbose=1, tensorboard_log="./ppo2_pendulum_default_tensorboard/")
TypeError: __init__() got an unexpected keyword argument 'n_envs'

Here is the code:

import os
import time

import gym
from gym import Wrapper, spaces
import numpy as np
from gym.envs.classic_control import PendulumEnv

from stable_baselines.common.env_checker import check_env
from stable_baselines.sac.policies import MlpPolicy
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.bench import Monitor

import tensorflow as tf

# tensorboard --logdir=PPO2_DEFAULT_PENDULUM:C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Pycharm\pioneer\ppo2_pendulum_default_tensorboard --host localhost

log_dir = "/tmp/gym/{}".format(int(time.time()))
os.makedirs(log_dir, exist_ok=True)

config = tf.ConfigProto()

config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

TEST_COUNT = 100

pendulum_env = gym.make('Pendulum-v0')
pendulum_env = Monitor(pendulum_env, log_dir, allow_early_resets=True)
check_env(pendulum_env, warn=True)

model = PPO2(n_envs=8, n_timesteps=2e6, policy='MlpPolicy', n_steps=2048, nminibatches=32, lam=0.95, gamma=0.99,
             noptepochs=10, ent_coef=0.0, learning_rate=3e-4,
             cliprange=0.2, env=pendulum_env, verbose=1, tensorboard_log="./ppo2_pendulum_default_tensorboard/")
model.learn(total_timesteps=100_000, log_interval=10)
model.save("ppo2_pendulum_default")

System Info
Describe the characteristics of your environment:

Describe how stable baselines was installed (pip, docker, source, ...): source
GPU models and configuration: NVIDIA GTX 1050 with CUDA 10.0 and cuDNN 7.6.5
Python version: 3.7
Tensorflow version: 1.15

araffin commented 4 years ago

Hello,

Why are you not using the zoo? Please read the documentation to understand the meaning of each variable (n_envs corresponds to the number of environments/workers used to collect data). You can also read the code from the zoo (train.py and utils/utils.py) to see how n_envs is used.
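
To make that concrete, here is a minimal sketch of how those zoo settings could be translated into plain stable-baselines code (a sketch only, assuming stable-baselines 2.x and the Pendulum-v0 values quoted above). n_envs and n_timesteps are zoo-level settings, not PPO2 constructor arguments: n_envs sizes the vectorized environment and n_timesteps goes to learn(), while the remaining keys are regular PPO2 keyword arguments.

import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Zoo-level settings, not PPO2 constructor arguments (passing them to PPO2 causes the TypeError above)
n_envs = 8
n_timesteps = int(2e6)

# n_envs copies of the environment collect data in parallel
env = DummyVecEnv([lambda: gym.make('Pendulum-v0') for _ in range(n_envs)])

# The remaining hyperparameters map directly onto PPO2 keyword arguments
model = PPO2('MlpPolicy', env, n_steps=2048, nminibatches=32, lam=0.95, gamma=0.99,
             noptepochs=10, ent_coef=0.0, learning_rate=3e-4, cliprange=0.2, verbose=1)
model.learn(total_timesteps=n_timesteps)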

meric-sakarya commented 4 years ago

I don't know how to install the zoo on Windows and couldn't find documentation on it.

araffin commented 4 years ago

I don't know how to install the zoo on Windows and couldn't find documentation on it.

You just need to clone the repo and install the Python dependencies; it is not specific to Windows. You could also use it online with Google Colab (cf. the README).

meric-sakarya commented 4 years ago

Can't I just use the hyperparameters provided by the zoo without actually using the zoo? I am not interested in the already-trained agents, and I am also going to need some wrappers in another environment.

araffin commented 4 years ago

Can't I just use the hyperparameters provided by the zoo without actually using the zoo? I am not interested in the already-trained agents, and I am also going to need some wrappers in another environment.

In the SB3 version (https://github.com/DLR-RM/rl-baselines3-zoo), you can download the training scripts without the trained agents. Here, you can download the repo and then delete the trained_agents folder. Otherwise, you can just copy the files you need.

meric-sakarya commented 4 years ago

I managed to get an agent working, thanks for the help. But isn't this graph pretty weird? What might be causing this? Can I wrap this with the Monitor wrapper to get a better-looking graph?

araffin commented 4 years ago

There are already issues about that: https://github.com/hill-a/stable-baselines/issues/143. In short: you should ignore it and use the Monitor wrapper or proper evaluation (using the EvalCallback included in the zoo).
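
As a rough sketch of that second option (assuming stable-baselines >= 2.10, where EvalCallback lives in stable_baselines.common.callbacks; the paths and frequencies below are arbitrary placeholders), the idea is to evaluate a separate Monitor-wrapped environment at a fixed interval rather than relying on the raw TensorBoard episode plot:

import os

import gym
from stable_baselines import PPO2
from stable_baselines.bench import Monitor
from stable_baselines.common.callbacks import EvalCallback

log_dir = "./eval_logs/"
os.makedirs(log_dir, exist_ok=True)

# Monitor records per-episode rewards and lengths to csv files for plotting
train_env = Monitor(gym.make('Pendulum-v0'), os.path.join(log_dir, "train"), allow_early_resets=True)
eval_env = Monitor(gym.make('Pendulum-v0'), os.path.join(log_dir, "eval"), allow_early_resets=True)

# Proper evaluation: run 5 deterministic episodes every 10k training steps
eval_callback = EvalCallback(eval_env, best_model_save_path=log_dir, log_path=log_dir,
                             eval_freq=10_000, n_eval_episodes=5, deterministic=True)

model = PPO2('MlpPolicy', train_env, verbose=1)
model.learn(total_timesteps=100_000, callback=eval_callback)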

Closing this issue as the original question was answered.

meric-sakarya commented 4 years ago

@araffin I added the following line to the ppo2.yml file: env_wrapper: stable_baselines.bench.monitor. It did not work and I received this error when trying to restart the training (screenshot: 2020-07-25 (1)). Should I create a new issue for this?