araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

Using custom wrappers while training models #96

Closed meric-sakarya closed 3 years ago

meric-sakarya commented 4 years ago

I am working on the Pendulum-v0 environment with the ppo2 algorithm. Using the suggested hyperparameters from the RL Baselines Zoo, I got the algorithm running and achieved results. However, I could not get a good-looking graph (the same issue as here, I suppose). The problem did not occur when I changed the number of environments to 1, but that does not solve it, and I still could not wrap the environment with the Monitor wrapper. On top of the Monitor wrapper, I am also trying to use a custom wrapper I wrote that converts observations to images via env.render("rgb_array"), together with the FrameStack wrapper to stack those frames. I think I could solve these issues if I did not need the Zoo for the hyperparameters and could instead use them in my own code. An answer to either of these two questions (preferably the first) would probably solve my issues:

  1. How may I use the hyperparameters from Zoo in my code? When I try to run my code I get the following error:

The code:

import os
import time

import gym
from gym import Wrapper, spaces
import numpy as np
from gym.envs.classic_control import PendulumEnv

from stable_baselines.common.env_checker import check_env
from stable_baselines.common.policies import MlpPolicy  # PPO2 policies live in common.policies, not sac.policies
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.bench import Monitor

import tensorflow as tf

# tensorboard --logdir=PPO2_DEFAULT_PENDULUM:C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Pycharm\pioneer\ppo2_pendulum_default_tensorboard --host localhost

log_dir = "/tmp/gym/{}".format(int(time.time()))
os.makedirs(log_dir, exist_ok=True)

config = tf.ConfigProto()

config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

TEST_COUNT = 100

pendulum_env = gym.make('Pendulum-v0')
pendulum_env = Monitor(pendulum_env, log_dir, allow_early_resets=True)
check_env(pendulum_env, warn=True)

model = PPO2(n_envs=8, n_timesteps=2e6, policy='MlpPolicy', n_steps=2048, nminibatches=32, lam=0.95, gamma=0.99,
             noptepochs=10, ent_coef=0.0, learning_rate=3e-4,
             cliprange=0.2, env=pendulum_env, verbose=1, tensorboard_log="./ppo2_pendulum_default_tensorboard/")
model.learn(total_timesteps=100_000, log_interval=10)
model.save("ppo2_pendulum_default")

The error:

Traceback (most recent call last):
  File "C:/Users/meric/OneDrive/Masaüstü/TUM/Thesis/Pycharm/pioneer/pendulum_default_PPO2.py", line 35, in <module>
    cliprange=0.2, env=pendulum_env, verbose=1, tensorboard_log="./ppo2_pendulum_default_tensorboard/")
TypeError: __init__() got an unexpected keyword argument 'n_envs'
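
For readers hitting the same error: the traceback shows the PPO2 constructor rejecting `n_envs`, which suggests that `n_envs` and `n_timesteps` in the Zoo's ppo2.yml are settings for the training script rather than constructor arguments. The sketch below separates the two groups in plain Python. `ZOO_LEVEL_KEYS` and `split_zoo_hyperparams` are hypothetical names introduced here, not part of Stable Baselines or the Zoo, and the key list is an assumption covering only the two keys discussed.

```python
# Hypothetical helper: these names are illustrative, not a Zoo or Stable Baselines API.
ZOO_LEVEL_KEYS = ("n_envs", "n_timesteps")  # assumed to be consumed by the training script


def split_zoo_hyperparams(hyperparams):
    """Return (script_settings, constructor_kwargs) without mutating the input."""
    script_settings = {k: v for k, v in hyperparams.items() if k in ZOO_LEVEL_KEYS}
    constructor_kwargs = {k: v for k, v in hyperparams.items() if k not in ZOO_LEVEL_KEYS}
    return script_settings, constructor_kwargs


# Values copied from the PPO2 call in the question.
pendulum_hyperparams = {
    "n_envs": 8, "n_timesteps": 2e6, "policy": "MlpPolicy", "n_steps": 2048,
    "nminibatches": 32, "lam": 0.95, "gamma": 0.99, "noptepochs": 10,
    "ent_coef": 0.0, "learning_rate": 3e-4, "cliprange": 0.2,
}

script_settings, ppo2_kwargs = split_zoo_hyperparams(pendulum_hyperparams)
n_envs = int(script_settings["n_envs"])                # number of parallel environments
total_timesteps = int(script_settings["n_timesteps"])  # training budget for learn()

# With stable_baselines installed, the pieces would then be used roughly like this:
# env = DummyVecEnv([lambda i=i: Monitor(gym.make("Pendulum-v0"), os.path.join(log_dir, str(i)))
#                    for i in range(n_envs)])
# model = PPO2(env=env, verbose=1, **ppo2_kwargs)
# model.learn(total_timesteps=total_timesteps)
```

Under this reading, the number of environments goes into the vectorized environment and the timestep budget into `learn()`, while the remaining keys are passed to PPO2 unchanged.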
  2. How can I use wrappers while training an agent with the Zoo?

I copied the code of the Monitor wrapper into the wrappers.py file and added the following line to the ppo2.yml file: env_wrapper: utils.wrappers.Monitor. This produced:

Traceback (most recent call last):                                                                                                                                                                                   
  File "train.py", line 210, in <module>                                                                                                                                                                               
    env_wrapper = get_wrapper_class(hyperparams)                                                                                                                                                                     
  File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\utils.py", line 130, in get_wrapper_class                                                                                               
    wrapper_module = importlib.import_module(get_module_name(wrapper_name))                                                                                                                                          
  File "C:\Users\meric\Anaconda3\envs\pioneer\lib\importlib\__init__.py", line 127, in import_module                                                                                                                   
    return _bootstrap._gcd_import(name[level:], package, level)                                                                                                                                                      
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import                                                                                                                                                    
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load                                                                                                                                                  
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked                                                                                                                                         
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked                                                                                                                                                  
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module                                                                                                                                            
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                                                                                                                                       
  File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\wrappers.py", line 78, in <module>                                                                                                      
    class Monitor(gym.Wrapper):                                                                                                                                                                                      
  File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\wrappers.py", line 96, in Monitor                                                                                                       
    info_keywords=()):
NameError: name 'Optional' is not defined
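
This NameError most likely means that the copied Monitor class uses `Optional` in a type annotation while wrappers.py never imports it from `typing`. On Python versions that evaluate annotations when the `def` statement runs (the default before 3.14), the import has to be present in the file the class is pasted into. A minimal stand-alone illustration follows; `MonitorLike` and `define_without_import` are hypothetical stand-ins, not the real Monitor.

```python
from typing import Optional  # needed wherever Optional appears in an annotation


class MonitorLike:
    """Hypothetical stand-in with a Monitor-style signature; it does no logging."""

    def __init__(self, env, filename: Optional[str] = None, info_keywords=()):
        self.env = env
        self.filename = filename
        self.info_keywords = info_keywords


def define_without_import():
    """Run a similar definition in a namespace that lacks the typing import."""
    source = "def init(env, filename: Optional[str] = None):\n    pass\n"
    try:
        exec(source, {})  # fresh globals: 'Optional' is not defined there
    except NameError as err:
        return str(err)
    return None  # reached on interpreters that defer annotation evaluation


wrapped = MonitorLike(env=object())
error_message = define_without_import()
```

Adding `from typing import Optional` at the top of wrappers.py would be the corresponding fix, assuming no other names from the original monitor module are missing as well.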

I also tried, without adding anything to the wrappers.py file, adding the following line to the ppo2.yml file: env_wrapper: stable_baselines.bench.monitor. This produced:

Traceback (most recent call last):                                                                                                                                                                                       
  File "train.py", line 284, in <module>                                                                                                                                                                               
    env = create_env(n_envs)                                                                                                                                                                                         
  File "train.py", line 257, in create_env                                                                                                                                                                             
    env = DummyVecEnv([make_env(env_id, 0, args.seed, wrapper_class=env_wrapper, log_dir=log_dir, env_kwargs=env_kwargs)])                                                                                           
  File "C:\Users\meric\Anaconda3\envs\pioneer\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 23, in __init__
    self.envs = [fn() for fn in env_fns]
  File "C:\Users\meric\Anaconda3\envs\pioneer\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 23, in <listcomp>
    self.envs = [fn() for fn in env_fns]
  File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\utils.py", line 175, in _init                                                                                                           
    env = wrapper_class(env)                                                                                                                                                                                         
  File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\utils.py", line 141, in wrap_env                                                                                                        
    env = wrapper_class(env, **kwargs)
TypeError: 'module' object is not callable
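
This traceback suggests that `env_wrapper` is resolved as a dotted path whose last component must be a callable class. `stable_baselines.bench.monitor` (lower case) names a module, whereas the class imported in the code above is `Monitor`. The sketch below mimics that kind of lookup with the standard library only. `resolve_dotted_path` is a hypothetical helper, not the Zoo's `get_wrapper_class`, and the `json` package stands in for `stable_baselines.bench`.

```python
import importlib
import inspect


def resolve_dotted_path(dotted_path):
    """Import everything before the last dot and fetch the last component from it."""
    module_name, _, attribute_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute_name)


# Standard-library stand-ins (assumption: the Zoo resolves paths in a similar way).
as_class = resolve_dotted_path("json.JSONDecoder")  # last component is a class
as_module = resolve_dotted_path("json.decoder")     # last component is a module

class_is_class = inspect.isclass(as_class)
module_is_module = inspect.ismodule(as_module)

instance = as_class()  # a class can be called, like wrapper_class(env)

try:
    as_module()  # same failure mode as in the traceback above
    call_error = None
except TypeError as err:
    call_error = str(err)
```

If the Zoo follows this pattern, `env_wrapper: stable_baselines.bench.Monitor` would resolve to the class. Monitor also expects a log file name, though, so a bare `wrapper_class(env)` call may still fail; this is untested here.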


Additional context: Yesterday I opened two other issues, #94 and #95. Both were closed because I could not explain my problems properly and did not follow the template. I apologize for my amateur behaviour; I have only just started using issue templates and the whole concept is new to me. I am working on an important project, so solving these issues is crucial for me, hence the barrage of questions in both forums (rl-baselines-zoo and stable-baselines). Thank you very much for your answers so far and for the great documentation, it helps a lot.

araffin commented 3 years ago

Should be fixed in SB3 and its zoo (please read the README and take a look at examples of custom wrappers in the repo): https://github.com/DLR-RM/rl-baselines3-zoo