araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

can I run without early ending? #24

Closed: GodZarathustra closed this issue 5 years ago

GodZarathustra commented 5 years ago

I am trying to run ppo2 on MountainCar-v0, and I could use your help with the following two issues :)

1. The TensorBoard output suggests that every episode is limited to 200 steps. Is there a way to change this maximum number of steps per episode (or simply not use the early-termination trick)?
2. The episode reward shown in TensorBoard seems to be the discounted reward. Where should I add the real episode reward or the episode length to the TensorBoard output?

Could you give me some advice on the issues above? Thanks a lot!

araffin commented 5 years ago

Hello,

1. The TensorBoard output suggests that every episode is limited to 200 steps. Is there a way to change this maximum number of steps per episode?

For that, you need to define your own MountainCar env (see here for an example with CartPole; you just need to not wrap it with a max-step limit wrapper). But then it is no longer the standard environment. The current one is defined here.
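To make the effect of that wrapper concrete, here is a minimal pure-Python sketch (no Gym dependency) of what a max-step limit does. `ToyCarEnv`, `TimeLimitWrapper` and `run_episode` are hypothetical names, loosely modeled on Gym's `TimeLimit` wrapper, which `gym.make("MountainCar-v0")` applies with a 200-step limit. Like MountainCar, the toy env gives a reward of -1 per step and only ends on its own when it reaches the goal (here, simply after a fixed 500 steps).

```python
class ToyCarEnv:
    """Hypothetical stand-in for an env that only terminates on reaching the goal."""

    def __init__(self, steps_to_goal=500):
        self.steps_to_goal = steps_to_goal
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.steps_to_goal  # "natural" termination only
        return self.t, -1.0, done, {}  # -1 reward per step, as in MountainCar


class TimeLimitWrapper:
    """Hypothetical wrapper that cuts episodes off after max_episode_steps."""

    def __init__(self, env, max_episode_steps=200):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps and not done:
            done = True  # forced end: the goal was not reached
            info["TimeLimit.truncated"] = True
        return obs, reward, done, info


def run_episode(env):
    """Run one episode with a dummy action; return (length, undiscounted reward)."""
    env.reset()
    length, total_reward, done = 0, 0.0, False
    while not done:
        _, reward, done, _ = env.step(action=0)
        length += 1
        total_reward += reward
    return length, total_reward


limited = run_episode(TimeLimitWrapper(ToyCarEnv(), max_episode_steps=200))
unlimited = run_episode(ToyCarEnv())
print(limited)    # (200, -200.0)
print(unlimited)  # (500, -500.0)
```

Dropping the wrapper is the whole change: the episode then lasts until the env terminates on its own. With Gym itself, one way to get this (assuming a Gym version where registration only adds `TimeLimit` when `max_episode_steps` is set) is to register a new env id whose entry point is `gym.envs.classic_control:MountainCarEnv`, without passing `max_episode_steps`.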

2. The episode reward shown in TensorBoard seems to be the discounted reward. Where should I add the real episode reward or the episode length to the TensorBoard output?

It is the normalized reward (see the hyperparameter file). For the unnormalized one, either look at the terminal output or enable the legacy TensorBoard integration: https://stable-baselines.readthedocs.io/en/master/guide/tensorboard.html#legacy-integration
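To see why the logged value can look nothing like the real episode reward, here is a pure-Python sketch of reward normalization, loosely modeled on the reward scaling in Stable Baselines' `VecNormalize`: each reward is divided by the running standard deviation of a discounted return, then clipped. The class and function names and the defaults (`gamma=0.99`, `clip=10.0`, `eps=1e-8`) are assumptions for illustration, not the library's code.

```python
import math


class RunningMeanStd:
    """Running mean/variance of a scalar stream (parallel-variance update)."""

    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # small prior count avoids division by zero

    def update(self, x):
        # Merge a batch of one sample (mean x, variance 0) into the running stats.
        delta = x - self.mean
        total = self.count + 1
        m2 = self.var * self.count + delta ** 2 * self.count / total
        self.mean += delta / total
        self.var = m2 / total
        self.count = total


def normalize_rewards(rewards, gamma=0.99, clip=10.0, eps=1e-8):
    """Scale each reward by the running std of the discounted return, then clip."""
    rms = RunningMeanStd()
    ret = 0.0
    normalized = []
    for r in rewards:
        ret = ret * gamma + r  # discounted return, used only for the scale
        rms.update(ret)
        scaled = r / math.sqrt(rms.var + eps)
        normalized.append(max(-clip, min(clip, scaled)))
    return normalized


raw = [-1.0] * 200  # one 200-step MountainCar-like episode, -1 per step
norm = normalize_rewards(raw)

print("episode length:", len(raw))
print("raw episode reward:", sum(raw))  # -200.0
print("sum of normalized rewards:", round(sum(norm), 2))
```

The episode length and the raw sum are the numbers the question asks for; the sum of normalized rewards is a different quantity. As far as I know, in Stable Baselines the unnormalized episode reward and length are what the `Monitor` wrapper records.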

GodZarathustra commented 5 years ago

Your answer addressed exactly what I was asking. Thanks again for the quick reply! :+1: