[question] Define the number of steps per episode during training

araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.

https://stable-baselines.readthedocs.io/

MIT License


[question] Define the number of steps per episode during training #63

Closed enajx closed 4 years ago

enajx commented 4 years ago

The class TimeFeatureWrapper has a max_steps parameter controlling the number of steps per episode. Is there a way of modifying this parameter when using the training script without modifying it explicitly in the TimeFeatureWrapper class?

araffin commented 4 years ago

is there a way of modifying this parameter when using the training script without modifying it explicitly in the TimeFeatureWrapper class?

For now, no, unless you have a custom environment. In that case, it will use the value from the TimeLimit wrapper.

enajx commented 4 years ago

I see. In the environments that don't use the TimeFeatureWrapper class, the number of steps per episode is defined by the corresponding hyperparameter n_steps, right?

araffin commented 4 years ago

Not really. TimeFeatureWrapper is made for envs that use the TimeLimit wrapper and thus break the Markov assumption. I am not sure which n_steps you are referring to. Maybe the PPO one, in which case take a look at the doc for that.
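To illustrate the point about the Markov assumption: when an episode is cut off after a fixed number of steps, the observation alone does not tell the agent how much time is left. A time feature fixes that by appending the remaining fraction of the episode to each observation. The following is only a minimal sketch of that idea, not the actual TimeFeatureWrapper code; `ConstantObsEnv` and `ToyTimeFeature` are hypothetical names, and the old Gym-style 4-tuple `step` return is assumed.

```python
class ConstantObsEnv:
    """Hypothetical toy env: the observation never changes, so it carries no time information."""

    def reset(self):
        return [0.0]

    def step(self, action):
        # Old Gym-style return: (obs, reward, done, info).
        return [0.0], 1.0, False, {}


class ToyTimeFeature:
    """Illustrative wrapper: appends the remaining fraction of the episode to the observation."""

    def __init__(self, env, max_steps=1000):
        self.env = env
        self.max_steps = max_steps  # assumed to match the env's time limit
        self.current_step = 0

    def _augment(self, obs):
        # 1.0 at the start of the episode, 0.0 once max_steps is reached.
        remaining = max(0.0, 1.0 - self.current_step / self.max_steps)
        return list(obs) + [remaining]

    def reset(self):
        self.current_step = 0
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.current_step += 1
        return self._augment(obs), reward, done, info


env = ToyTimeFeature(ConstantObsEnv(), max_steps=4)
first_obs = env.reset()
second_obs, _, _, _ = env.step(0)
print(first_obs, second_obs)
```

Note that in this sketch max_steps only scales the feature. It does not end the episode; that remains the job of the time limit.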

enajx commented 4 years ago

According to the documentation, n_steps is "the number of steps to run for each environment per update". I'm not sure if by update you meant episode. If not, I can't find where the number of steps per episode used in the train.py script is defined.

araffin commented 4 years ago

Update means gradient update here. I recommend reading more about PPO and policy gradients in general (we have some links in the doc; the Spinning Up guide is a good start) to understand what it means exactly ;)
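As a rough illustration of the difference between n_steps and episode length, here is a back-of-the-envelope sketch. It assumes a simplified model: every episode lasts exactly `episode_length` steps, and episodes simply continue across rollout boundaries. The function name and all numbers are made up for the example.

```python
def count_updates_and_episodes(total_timesteps, n_steps, episode_length, n_envs=1):
    """Count gradient updates and finished episodes under the simplified model above."""
    # One update consumes n_steps transitions from each environment.
    steps_per_update = n_steps * n_envs
    n_updates = total_timesteps // steps_per_update
    # Each env keeps stepping between updates, so episodes are not reset at rollout boundaries.
    steps_per_env = n_updates * n_steps
    n_episodes = n_envs * (steps_per_env // episode_length)
    return n_updates, n_episodes


# Same environment (200-step episodes), two different n_steps values.
print(count_updates_and_episodes(total_timesteps=10_000, n_steps=128, episode_length=200))
print(count_updates_and_episodes(total_timesteps=10_000, n_steps=256, episode_length=200))
```

Changing n_steps changes how many updates happen, but the length of each episode stays whatever the environment says it is.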

enajx commented 4 years ago

Will do! :) So, the original question: where exactly is the number of steps per episode (aka episode length) used in the training script defined?

araffin commented 4 years ago

where exactly is the number of steps per episode (aka episode length) used in the training script defined?

The number of steps per episode is defined by the environment, not the training script. For instance, for CartPole it is defined here: https://github.com/openai/gym/blob/master/gym/envs/__init__.py#L63
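Conceptually, the environment's time limit works like the sketch below: a wrapper counts steps and ends the episode once a maximum is reached, regardless of what the training script does. This is a simplified, hypothetical stand-in (`EndlessEnv`, `ToyTimeLimit` and `run_episode` are illustrative names, not Gym code), and the value 200 is only an example.

```python
class EndlessEnv:
    """Hypothetical env that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        # Old Gym-style return: (obs, reward, done, info).
        return 0, 1.0, False, {}


class ToyTimeLimit:
    """Simplified stand-in for a time-limit wrapper."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        # Force the end of the episode once the step budget is used up.
        if self._elapsed_steps >= self.max_episode_steps:
            done = True
        return obs, reward, done, info


def run_episode(env):
    """Step with a dummy action until done and return the episode length."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(0)
        steps += 1
    return steps


print(run_episode(ToyTimeLimit(EndlessEnv(), max_episode_steps=200)))  # prints 200
```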

enajx commented 4 years ago

Solved, thank you!