I am working on the Pendulum-v0 environment with the PPO2 algorithm. I already got the algorithm running with the suggested hyperparameters from the RL Baselines Zoo and obtained results. However, I could not get a clean-looking training graph; I suppose it is the same issue as here. The problem did not occur when I changed the number of environments to 1, but it is still not solved, and I could not manage to wrap the environment with the Monitor wrapper. Furthermore, on top of the Monitor wrapper I am trying to use a custom wrapper I made that changes observations to images with the help of env.render("rgb_array"), together with the FrameStack wrapper to stack those frames. I think I could solve these issues if I did not need the Zoo for the hyperparameters and could instead work on my original code with those hyperparameters. I suppose an answer to one of these two questions (preferably the first one) would solve my issues:
How can I use the hyperparameters from the Zoo in my own code? When I run the code below, I get the error that follows it.
The code:
import os  # needed for os.makedirs below
import time
import gym
from gym import Wrapper, spaces
import numpy as np
from gym.envs.classic_control import PendulumEnv
from stable_baselines.common.env_checker import check_env
from stable_baselines.common.policies import MlpPolicy  # PPO2 uses the common policies, not the SAC ones
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.bench import Monitor
import tensorflow as tf
# tensorboard --logdir=PPO2_DEFAULT_PENDULUM:C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Pycharm\pioneer\ppo2_pendulum_default_tensorboard --host localhost
log_dir = "/tmp/gym/{}".format(int(time.time()))
os.makedirs(log_dir, exist_ok=True)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
TEST_COUNT = 100
pendulum_env = gym.make('Pendulum-v0')
pendulum_env = Monitor(pendulum_env, log_dir, allow_early_resets=True)
check_env(pendulum_env, warn=True)
model = PPO2(n_envs=8, n_timesteps=2e6, policy='MlpPolicy', n_steps=2048, nminibatches=32, lam=0.95, gamma=0.99,
noptepochs=10, ent_coef=0.0, learning_rate=3e-4,
cliprange=0.2, env=pendulum_env, verbose=1, tensorboard_log="./ppo2_pendulum_default_tensorboard/")
model.learn(total_timesteps=100_000, log_interval=10)
model.save("ppo2_pendulum_default")
The error:
Traceback (most recent call last):
File "C:/Users/meric/OneDrive/Masaüstü/TUM/Thesis/Pycharm/pioneer/pendulum_default_PPO2.py", line 35, in <module>
cliprange=0.2, env=pendulum_env, verbose=1, tensorboard_log="./ppo2_pendulum_default_tensorboard/")
TypeError: __init__() got an unexpected keyword argument 'n_envs'
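One possible way to reuse the Zoo entry outside the Zoo, sketched under assumptions: keys such as n_envs and n_timesteps appear to be consumed by the Zoo's training script rather than by the PPO2 constructor, which is consistent with the TypeError above. The helper name split_zoo_hyperparams and the set SCRIPT_LEVEL_KEYS are hypothetical; only the hyperparameter values come from the code above.

```python
# Hypothetical helper: separate the keys assumed to belong to the training
# script from the keys that are passed on to the model constructor.
SCRIPT_LEVEL_KEYS = {"n_envs", "n_timesteps"}  # assumption, not an official list


def split_zoo_hyperparams(hyperparams):
    """Return (script_args, model_kwargs) from a Zoo-style hyperparameter dict."""
    script_args = {k: v for k, v in hyperparams.items() if k in SCRIPT_LEVEL_KEYS}
    model_kwargs = {k: v for k, v in hyperparams.items() if k not in SCRIPT_LEVEL_KEYS}
    return script_args, model_kwargs


# Values copied from the PPO2(...) call above.
zoo_hyperparams = {
    "n_envs": 8,
    "n_timesteps": 2e6,
    "policy": "MlpPolicy",
    "n_steps": 2048,
    "nminibatches": 32,
    "lam": 0.95,
    "gamma": 0.99,
    "noptepochs": 10,
    "ent_coef": 0.0,
    "learning_rate": 3e-4,
    "cliprange": 0.2,
}

script_args, model_kwargs = split_zoo_hyperparams(zoo_hyperparams)
print(script_args)
print(sorted(model_kwargs))

# Intended use with stable-baselines (not executed here):
#   n_envs = script_args["n_envs"]
#   env = DummyVecEnv([lambda: gym.make("Pendulum-v0") for _ in range(n_envs)])
#   model = PPO2(env=env, verbose=1, **model_kwargs)
#   model.learn(total_timesteps=int(script_args["n_timesteps"]))
```

With this split, the number of environments would be set when building the vectorized environment and the number of timesteps would go to learn(), while the remaining keys go to the constructor.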
How can I use wrappers while training an agent with the Zoo?
I copied the code of the Monitor wrapper into the wrappers.py file and added the following line to the ppo2.yml file:
env_wrapper: utils.wrappers.Monitor
The error:
Traceback (most recent call last):
File "train.py", line 210, in <module>
env_wrapper = get_wrapper_class(hyperparams)
File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\utils.py", line 130, in get_wrapper_class
wrapper_module = importlib.import_module(get_module_name(wrapper_name))
File "C:\Users\meric\Anaconda3\envs\pioneer\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\wrappers.py", line 78, in <module>
class Monitor(gym.Wrapper):
File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\wrappers.py", line 96, in Monitor
info_keywords=()):
NameError: name 'Optional' is not defined
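The NameError is consistent with the copied class using the Optional type hint without the matching import from typing. A minimal sketch that reproduces this with a stand-in function follows; monitor_init and defines_cleanly are hypothetical names, and this is not the Monitor source.

```python
# Stand-in for a copied signature that uses Optional without importing it.
source = (
    "def monitor_init(filename: Optional[str], info_keywords=()):\n"
    "    return filename\n"
)


def defines_cleanly(code):
    """Return True if the code defines monitor_init and its annotations resolve."""
    namespace = {}
    try:
        exec(code, namespace)
        # Depending on the Python version, annotations are evaluated at
        # definition time or on first access; touching them covers both cases.
        namespace["monitor_init"].__annotations__
        return True
    except NameError:
        return False


without_import = defines_cleanly(source)
with_import = defines_cleanly("from typing import Optional\n" + source)
print(without_import, with_import)
```

If that is the cause here, adding the typing import at the top of wrappers.py would be the corresponding change.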
I also tried, without adding anything to the wrappers.py file, adding the following line to the ppo2.yml file:
env_wrapper: stable_baselines.bench.monitor
The error:
Traceback (most recent call last):
File "train.py", line 284, in <module>
env = create_env(n_envs)
File "train.py", line 257, in create_env
env = DummyVecEnv([make_env(env_id, 0, args.seed, wrapper_class=env_wrapper, log_dir=log_dir, env_kwargs=env_kwargs)])
File "C:\Users\meric\Anaconda3\envs\pioneer\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 23, in __init__
self.envs = [fn() for fn in env_fns]
File "C:\Users\meric\Anaconda3\envs\pioneer\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 23, in <listcomp>
self.envs = [fn() for fn in env_fns]
File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\utils.py", line 175, in _init
env = wrapper_class(env)
File "C:\Users\meric\OneDrive\Masaüstü\TUM\Thesis\Zoo\rl-baselines-zoo\utils\utils.py", line 141, in wrap_env
env = wrapper_class(env, **kwargs)
TypeError: 'module' object is not callable
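The first traceback suggests that the Zoo imports everything before the last dot as a module and then looks up the last component on it. Under that assumption, a path ending in a lowercase module name (here monitor) yields a module object, which cannot be called, whereas a path ending in the class name would yield the class. The sketch below mimics this with standard-library names only; resolve_dotted_path is a hypothetical stand-in, not the Zoo's function.

```python
import importlib
import inspect


def resolve_dotted_path(dotted_path):
    """Import the part before the last dot and fetch the last name from it."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Analogous to a path that stops at the module versus one that names the class.
as_module = resolve_dotted_path("json.decoder")
as_class = resolve_dotted_path("json.decoder.JSONDecoder")

print(inspect.ismodule(as_module), callable(as_module))
print(inspect.isclass(as_class), callable(as_class))
```

By analogy, a value such as stable_baselines.bench.Monitor would point at the class rather than the module. Note that in the code above Monitor is also given a log directory, so the wrapper may still need extra arguments.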
System Info
Describe the characteristic of your environment:
Describe how stable baselines was installed (pip, docker, source, ...): pip install stable-baselines[mpi]
GPU models and configuration: NVIDIA GTX 1050
Python version: 3.7
Tensorflow version: 1.15
Additional context
Yesterday I created two other issues, #94 and #95; both were closed because I could not explain my issues properly and did not follow the template. I apologize for my amateur behaviour. I have just started using these issue templates and the whole concept is new to me. I am working on an important project, so it is crucial for me to solve these issues, hence the barrage of questions in both forums (rl-baselines-zoo and stable-baselines). Thank you very much for your answers so far and for the great documentation; it helps a lot.