ARISE-Initiative / robosuite

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
https://robosuite.ai

[Question] How to integrate robosuite envs with baseline RL algorithms? #131

Closed hermanjakobsen closed 3 years ago

hermanjakobsen commented 3 years ago

Hey guys!

I would like to test out RL algorithms on my robosuite environment, and the baselines repo from OpenAI offers a collection of RL algorithms.

Do you have any experience with integrating robosuite environments with the RL algorithms in the baselines repo? Phrased differently, do you have any tips on how to run the baselines algorithms with my own custom robosuite environment?

I have tried the following with my custom FetchPush environment:

import robosuite as suite

from my_environments import FetchPush
from robosuite.wrappers import GymWrapper
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2.ppo2 import learn

env_robo = GymWrapper(
    suite.make(
        'FetchPush',
        robots='Panda',
        controller_configs=None,
        gripper_types='UltrasoundProbeGripper',
        has_renderer=True,
        has_offscreen_renderer=False,
        use_camera_obs=False,
        use_object_obs=True,
        control_freq=50,
        render_camera=None,
    )
)

env = DummyVecEnv([env_robo])
network = 'mlp'
seed = None

model = learn(network=network, env=env, seed=seed, total_timesteps=2e5)

I also had to add a __call__ method to the GymWrapper:

def __call__(self):
    return GymWrapper(self.env)
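
(Alternatively, since baselines' DummyVecEnv expects a list of callables that construct environments, I suppose I could have passed a factory instead of patching GymWrapper; a rough sketch:)

# DummyVecEnv calls each element of the list to build an env,
# so a zero-argument lambda avoids modifying GymWrapper itself
env = DummyVecEnv([lambda: env_robo])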

When I run the code, the RL algorithm initially outputs

------------------------------------------
| eplenmean               | nan          |
| eprewmean               | nan          |
| fps                     | 133          |
| loss/approxkl           | 0.005290336  |
| loss/clipfrac           | 0.06945801   |
| loss/policy_entropy     | 9.930621     |
| loss/policy_loss        | -0.009111974 |
| loss/value_loss         | 0.0015881677 |
| misc/explained_variance | 0.335        |
| misc/nupdates           | 1            |
| misc/serial_timesteps   | 2048         |
| misc/time_elapsed       | 15.4         |
| misc/total_timesteps    | 2048         |
------------------------------------------

Then the script just starts repeatedly creating and closing glfw windows. Any help, thoughts or experiences regarding this topic would be greatly appreciated! :)

hermanjakobsen commented 3 years ago

Intuitively enough, the problem with the spamming of glfw windows was solved by setting has_renderer = False when creating the environment.
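
Concretely, the renderer-related arguments I now pass for headless training look roughly like this (other arguments omitted):

env_robo = GymWrapper(
    suite.make(
        'FetchPush',
        robots='Panda',
        has_renderer=False,            # no on-screen glfw window during training
        has_offscreen_renderer=False,  # no offscreen rendering needed without camera observations
        use_camera_obs=False,
        use_object_obs=True,
    )
)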

hermanjakobsen commented 3 years ago

Hi again, guys!

I am having a hard time integrating the environment with the baseline algorithms. Do you have any experience with this? And if so, would you care to share a pipeline on how to train and run the environment with the algorithms?

hermanjakobsen commented 3 years ago

For future reference, in case someone needs a training and testing pipeline for a custom-made environment:

import robosuite as suite
import gym
import numpy as np

from robosuite.environments.base import register_env
from robosuite import load_controller_config
from robosuite.wrappers import GymWrapper

from stable_baselines3 import PPO
from stable_baselines3.common.save_util import save_to_zip_file, load_from_zip_file
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

from my_environments import FetchPush

register_env(FetchPush)

# Load a controller configuration, e.g. robosuite's default OSC_POSE controller (adjust to taste)
controller_config = load_controller_config(default_controller='OSC_POSE')

# Training
env = GymWrapper(
    suite.make(
        'FetchPush',
        robots='UR5e',
        controller_configs=controller_config,
        gripper_types=None,
        has_renderer=False,
        has_offscreen_renderer=False,
        use_camera_obs=False,
        use_object_obs=True,
        control_freq=50,
        render_camera=None,
        horizon=2000,
        reward_shaping=True,
    )
)

env = wrap_env(env)     # wraps the env in Monitor, DummyVecEnv and VecNormalize
filename = 'test'

model = PPO('MlpPolicy', env, verbose=1, tensorboard_log='./ppo_fetchpush_tensorboard/')
model.learn(total_timesteps=int(3e5), tb_log_name=filename)

model.save('trained_models/' + filename)
env.save('trained_models/vec_normalize_' + filename + '.pkl')     # Save VecNormalize statistics

# Testing
'''
Create an identical environment with the renderer enabled, or override the render
function in the environment with something like this:

def render(self, mode=None):
    super().render()
'''
env_robo = GymWrapper(
    suite.make(
        'FetchPush',
        robots='UR5e',
        controller_configs=controller_config,
        gripper_types=None,
        has_renderer=True,
        has_offscreen_renderer=False,
        use_camera_obs=False,
        use_object_obs=True,
        control_freq=50,
        render_camera=None,
        horizon=2000,
        reward_shaping=True,
    )
)

# Load model
model = PPO.load('trained_models/' + filename)
# Load the saved statistics
env = DummyVecEnv([lambda : env_robo])
env = VecNormalize.load('trained_models/vec_normalize_' + filename + '.pkl', env)
# Do not update the normalization statistics at test time
env.training = False
# Do not normalize rewards at test time
env.norm_reward = False

obs = env.reset()

while True:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)

    env_robo.render()
    if done:
        obs = env.reset()

env.close()
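
As a side note, Stable-Baselines3 also ships an evaluate_policy helper that can replace the manual rollout loop above when you only want aggregate returns; a minimal sketch, assuming the same VecNormalize-wrapped env as above:

from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate the loaded policy on the environment with frozen normalization statistics
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
print('mean episode reward: {:.2f} +/- {:.2f}'.format(mean_reward, std_reward))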

cremebrule commented 3 years ago

Hi @hermanjakobsen ,

Apologies for not getting back to this sooner, and glad you were able to get this solved on your own -- would you like to join our robosuite Slack workspace? As an experienced user, you'd be a great addition to the community, and you'd also get much quicker feedback on any issues that arise (I check the workspace more frequently than our main issues page).

If you're interested, please check out our contribution guidelines for more info and the slack link.

fnuabhimanyu commented 3 years ago

Thanks for the sample code for integrating robosuite with baseline RL algorithms. I have a few questions regarding the implementation:

  1. In env = wrap_env(env), where do you define wrap_env()?
  2. Do I set has_renderer = False for the training env too, to avoid spawning multiple glfw windows?

hermanjakobsen commented 3 years ago

Hi @Abhimanyu8713 ,

An updated script which utilizes multiprocessing for training is available here.

To answer your questions: 1) wrap_env(env) was just a simple function for wrapping the environment in the necessary wrappers. I think it was implemented as:

def wrap_env(env):
    wrapped_env = Monitor(env)                          # Needed for extracting eprewmean and eplenmean
    wrapped_env = DummyVecEnv([lambda : wrapped_env])   # Needed for all environments (e.g. used for multi-processing)
    wrapped_env = VecNormalize(wrapped_env)             # Needed for improving training when using MuJoCo envs?
    return wrapped_env

Sorry for not including this in the code above :)

2) I set has_renderer = False when training to avoid spawning glfw windows, as you proposed. I then turned on rendering when testing the policy.
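
For completeness, the multiprocessed training setup in my updated script is roughly along these lines (a sketch using SubprocVecEnv; the make_env factory, n_procs and the OSC_POSE controller choice are just illustrative):

import robosuite as suite
from robosuite.environments.base import register_env
from robosuite import load_controller_config
from robosuite.wrappers import GymWrapper

from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize

from my_environments import FetchPush

def make_env():
    # Factory so every worker process builds (and registers) its own robosuite env
    def _init():
        register_env(FetchPush)
        controller_config = load_controller_config(default_controller='OSC_POSE')
        env = GymWrapper(
            suite.make(
                'FetchPush',
                robots='UR5e',
                controller_configs=controller_config,
                gripper_types=None,
                has_renderer=False,             # keep training headless
                has_offscreen_renderer=False,
                use_camera_obs=False,
                use_object_obs=True,
                control_freq=50,
                horizon=2000,
                reward_shaping=True,
            )
        )
        return Monitor(env)                     # records episode reward/length for logging
    return _init

if __name__ == '__main__':
    n_procs = 4
    env = SubprocVecEnv([make_env() for _ in range(n_procs)])
    env = VecNormalize(env)

    model = PPO('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=int(3e5))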

fnuabhimanyu commented 3 years ago

Thanks for the quick response as well as the script. This is exactly what I was looking for.