DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Changing Exploration Timesteps #1022

Closed anilkurkcu closed 2 years ago

anilkurkcu commented 2 years ago

How can I set the number of exploration steps for a given algorithm? I assume there is a default number of timesteps for the random exploration phase, and what I would like to do is increase that amount.

araffin commented 2 years ago

Hello, what you are looking for is called learning_starts in the doc (for SAC/TD3/DDPG/TQC/DQN). DQN has additional parameters for the epsilon-greedy exploration; best is to look at the doc in that case.
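
For illustration, a minimal sketch of how these parameters are passed (not part of the original answer; the environment names and hyperparameter values are placeholders, only `learning_starts` and DQN's `exploration_*` arguments come from the SB3 docs):

```python
from stable_baselines3 import SAC, DQN

# Off-policy algorithms (SAC/TD3/DDPG/TQC/DQN): actions are sampled at random
# until `learning_starts` timesteps have been collected in the replay buffer.
sac_model = SAC("MlpPolicy", "Pendulum-v1", learning_starts=10_000)
sac_model.learn(total_timesteps=100_000)

# DQN additionally exposes an epsilon-greedy schedule.
dqn_model = DQN(
    "MlpPolicy",
    "CartPole-v1",
    learning_starts=50_000,
    exploration_fraction=0.2,      # fraction of training over which epsilon is annealed
    exploration_initial_eps=1.0,   # epsilon at the start of training
    exploration_final_eps=0.05,    # epsilon at the end of the schedule
)
dqn_model.learn(total_timesteps=200_000)
```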

anilkurkcu commented 2 years ago

Thank you for your answer. How about for A2C? I could not come across something similar to this in the docs.

araffin commented 2 years ago

> How about for A2C? I could not come across something similar to this in the docs.

You should read more about A2C (we have some links in the doc). A2C is on-policy: it must use its current policy to collect data, so it cannot have a purely random exploration phase (and it does not have a replay buffer).
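
To make the contrast concrete, a minimal sketch (not from the original answer; the environment name and `n_steps` value are illustrative): A2C takes no `learning_starts` argument, and its exploration comes from sampling actions from the current stochastic policy during rollout collection.

```python
from stable_baselines3 import A2C

# No `learning_starts`, no replay buffer: every rollout of `n_steps` transitions
# is collected with the current stochastic policy and used immediately for an update.
model = A2C("MlpPolicy", "CartPole-v1", n_steps=5)
model.learn(total_timesteps=10_000)

# With deterministic=False, actions are still sampled from the policy
# distribution rather than taking its mode.
obs = model.env.reset()
action, _ = model.predict(obs, deterministic=False)
```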

anilkurkcu commented 2 years ago

I see. Then how does this algorithm decide upon the exploration/exploitation tradeoff?