DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Batch Size Selection for a Finite MDP #2024

Open DavidLudl opened 1 month ago

DavidLudl commented 1 month ago

❓ Question

Hello.

I would like to ask: if I have a finite MDP where every episode has the same fixed length of $T$ timesteps, do I then have to choose a batch size of the form $n \times T$ during training? Or is any other batch size also OK?

Thank you for your time,

Best regards,


araffin commented 4 weeks ago

> Then during training, do I have to choose a batch size of the form $n \times T$? Or is any other batch size also OK?

Which algorithm are you using? I guess you are talking about on-policy algorithms (A2C/PPO) and the `n_steps` parameter?

In that case, you can use any number of steps (this parameter can impact performance though).
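For illustration, a minimal sketch (using `CartPole-v1` as a placeholder for a fixed-horizon environment, with arbitrary parameter values) showing that `n_steps` can be set independently of the episode length $T$:

```python
import gymnasium as gym

from stable_baselines3 import PPO

# Placeholder environment; substitute your fixed-horizon MDP here.
env = gym.make("CartPole-v1")

model = PPO(
    "MlpPolicy",
    env,
    n_steps=1000,  # rollout length per env; it need not be a multiple of T
)
# Note: SB3 warns (but still runs) if n_steps * n_envs is not divisible
# by batch_size, since the last minibatch of each epoch gets truncated.
model.learn(total_timesteps=10_000)
```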

DavidLudl commented 4 weeks ago

I am using PPO. The parameter I want to ask about is `batch_size`. Should I set `batch_size` (default 64) in the PPO algorithm to $n \times T$?

araffin commented 4 weeks ago

The minibatch size can be kept as is; it only concerns the gradient step (how the collected rollout is split up for optimization), not how the rollout is collected.
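Concretely, a minimal sketch (again with `CartPole-v1` as a placeholder) keeping `batch_size` at its default while the rollout length is set separately:

```python
import gymnasium as gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # placeholder environment

model = PPO(
    "MlpPolicy",
    env,
    n_steps=2048,   # data collection: rollout buffer size per env (default)
    batch_size=64,  # optimization: minibatch size per gradient step (default)
    n_epochs=10,    # passes over each rollout per update (default)
)
# Each update performs (n_steps * n_envs / batch_size) * n_epochs gradient
# steps: (2048 / 64) * 10 = 320 with a single environment here.
model.learn(total_timesteps=10_000)
```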

DavidLudl commented 4 weeks ago

Thank you, now I understand.