MushroomRL / mushroom-rl

Python library for Reinforcement Learning.

Tutorial / Demonstration of Custom Training Loop #85

Closed RylanSchaeffer closed 2 years ago

RylanSchaeffer commented 2 years ago

The Stable Baselines 3 quickstart opens by showing how to train and run an agent in one of two ways: the first trains the model and then runs it in a custom interaction loop, the second trains it in a single .learn() call:

https://stable-baselines3.readthedocs.io/en/master/guide/quickstart.html

Approach 1:

import gym
from stable_baselines3 import A2C

env = gym.make('CartPole-v1')

# Train the agent
model = A2C('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

# Run the trained agent in a custom interaction loop
obs = env.reset()
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()

Approach 2:

from stable_baselines3 import A2C

model = A2C('MlpPolicy', 'CartPole-v1').learn(10000)

I can find mushroom-rl's equivalent of Approach 2, but I can't find the equivalent of Approach 1. Could someone please provide a tutorial or demonstration?

Thank you in advance!

boris-il-forte commented 2 years ago

Sorry, but I don't understand your point. If you want to render the environment without learning, just call:

core.evaluate(n_steps=1000, render=True)

This will run the agent in the environment for 1000 steps and render each step.

You can do the same for a fixed number of episodes (2 in this example):

core.evaluate(n_episodes=2, render=True)
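
For completeness, here is a minimal end-to-end sketch of the Core-based workflow, assuming agent is a MushroomRL agent already constructed for this MDP (its construction is omitted, and the horizon, gamma, and step counts are placeholder values):

from mushroom_rl.core import Core
from mushroom_rl.environments import Gym

# Wrap the Gym environment in MushroomRL's environment interface
mdp = Gym('CartPole-v1', horizon=500, gamma=0.99)

# agent: any MushroomRL agent built for this MDP (construction omitted)
core = Core(agent, mdp)

# Train for 10000 steps, fitting the agent after every step
core.learn(n_steps=10000, n_steps_per_fit=1)

# Run the learned policy for 1000 steps and render
core.evaluate(n_steps=1000, render=True)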

If you need to run arbitrary code at every step, just use the callback_step parameter of the Core. Since you can pass a class instance, you can activate and deactivate the callback's behavior from your own code. This means you can do whatever you want...
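
As a rough sketch (assuming callback_step receives the list of samples collected at each step; the class and attribute names below are purely illustrative):

class StepLogger:
    # Toggleable per-step callback, passed to the Core via callback_step
    def __init__(self):
        self.active = True
        self.samples = []

    def __call__(self, dataset):
        # dataset contains the samples collected in the last step
        if self.active:
            self.samples.extend(dataset)

callback = StepLogger()
core = Core(agent, mdp, callback_step=callback)

callback.active = False  # switch the callback behavior off at any time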

If you want to implement the interaction loop yourself, you can call the environment's reset and step functions manually, but I don't see any general use case for this that couldn't be covered with a callback or a dummy agent.
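
If you do go down that road, a rough sketch of the loop, assuming the environment's step returns (state, reward, absorbing, info) and the agent exposes draw_action, would be:

# Manual interaction loop (roughly what Core does internally)
state = mdp.reset()
for _ in range(1000):
    action = agent.draw_action(state)
    state, reward, absorbing, _ = mdp.step(action)
    mdp.render()
    if absorbing:
        state = mdp.reset()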