DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Influence of evaluations on model learning #1054

Closed · ghost closed this issue 2 years ago

ghost commented 2 years ago

Question

Do the evaluations influence the training of the SAC agent? From my understanding of the code and the documentation, I would answer the question with no. But my experimental results (described below) suggest otherwise.

Additional context

I built a custom environment and tried different settings. Currently I use the Soft Actor-Critic (SAC) agent. I'm comparing two experiments:

Experiment 1: 500,000 training steps, evaluation every 5,000 steps
Experiment 2: 1,000,000 training steps, evaluation every 10,000 steps
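For reference, a minimal sketch of what this kind of setup typically looks like with SB3's `EvalCallback` (the environment id and hyperparameters below are placeholders, not the custom env from this issue):

```python
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# Separate instances for training and evaluation.
train_env = gym.make("Pendulum-v1")
eval_env = gym.make("Pendulum-v1")

# Experiment 1: 500,000 training steps, evaluation every 5,000 steps.
eval_callback = EvalCallback(eval_env, eval_freq=5_000,
                             n_eval_episodes=5, deterministic=True)

model = SAC("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=500_000, callback=eval_callback)
```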

Experiment 1 showed an increasing reward curve with good results starting at around 450,000 steps, and I wanted to find out how it would continue with more training. My expectation was that the reward curve would keep increasing or plateau at some point. But in the second experiment the reward curve only starts to show good results at around 900,000 steps.

The reward curves of both experiments look identical, just on a different scale of steps. It seems that more training steps did not help, but more frequent evaluations might.

araffin commented 2 years ago

Hello,

From my understanding of the code and the documentation, I would answer the question with no

Yes, it should not (and it does not for the built-in gym envs).

But my experimental results (described below) suggest otherwise.

That is usually a sign that something is wrong with your custom env (see https://github.com/DLR-RM/rl-baselines3-zoo/pull/124#issuecomment-870412632 for instance, which was fixed by https://github.com/qgallouedec/panda-gym/commit/ec7454a9bbd67f0835f9e9eb083cfc5147a01c39).
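One way this symptom can appear is when the training env and the evaluation env share mutable state, so eval rollouts change what the training env sees. A contrived sketch of such a bug (purely illustrative, not the actual bug fixed in the commit linked above):

```python
import gym
import numpy as np

# Module-level state shared by every instance of the env (the bug).
_SHARED_GOAL = np.zeros(3, dtype=np.float32)


class LeakyEnv(gym.Env):
    """Contrived env whose reset() mutates global state.

    When the EvalCallback's env resets, it also moves the goal of the
    training env, so evaluations end up influencing training.
    """

    observation_space = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
    action_space = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def reset(self):
        # In-place mutation of the shared goal, visible to all env instances.
        _SHARED_GOAL[:] = np.random.uniform(-1.0, 1.0, size=3)
        return _SHARED_GOAL.copy()

    def step(self, action):
        reward = -float(np.linalg.norm(action - _SHARED_GOAL))
        return _SHARED_GOAL.copy(), reward, True, {}
```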

Experiment 1 showed an increasing reward curve with good results starting

How many random seeds did you use? (See the RL Tips and Tricks section in the documentation.)
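As a rough illustration, comparing results across several seeds might look like the following (the environment id, number of seeds, and step budget are placeholders):

```python
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

results = {}
for seed in range(3):  # a few independent runs per configuration
    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, seed=seed, verbose=0)
    model.learn(total_timesteps=50_000)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    results[seed] = (mean_reward, std_reward)

print(results)  # compare mean/std across seeds before drawing conclusions
```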