DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Influence of evaluations on model learning #1054

Closed · ghost closed this issue 2 years ago

ghost commented 2 years ago

Question

Do the evaluations influence the training of the SAC agent? From my understanding of the code and the documentation, I would answer the question with no. But my experimental results (described below) suggest otherwise.

Additional context

I built a custom environment and tried different settings. Currently I use the Soft Actor-Critic (SAC) agent. I'm comparing two experiments:

Experiment 1: 500,000 training steps, evaluation every 5,000 steps
Experiment 2: 1,000,000 training steps, evaluation every 10,000 steps
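For reference, a minimal sketch of what this kind of setup typically looks like with SB3's `EvalCallback` (the environment id and hyperparameters below are placeholders, not the custom env from this issue):

```python
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# Separate instances for training and evaluation.
train_env = gym.make("Pendulum-v1")
eval_env = gym.make("Pendulum-v1")

# Experiment 1: 500,000 training steps, evaluation every 5,000 steps.
eval_callback = EvalCallback(eval_env, eval_freq=5_000,
                             n_eval_episodes=5, deterministic=True)

model = SAC("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=500_000, callback=eval_callback)
```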

Experiment 1 showed an increasing reward curve with good results starting at around 450,000 steps, and I wanted to find out how it would continue with more training. My expectation was that the reward curve would keep increasing or plateau at some point. But in the second experiment the reward curve only starts to show good results at around 900,000 steps.

The reward curves of both experiments look identical, just on a different scale of steps. It seems that more training steps did not help, but more frequent evaluations might.

araffin commented 2 years ago

Hello,

From my understanding of the code and the documentation, I would answer the question with no

Yes, it should not (and it does not for the built-in gym envs).

But my experimental results (described below) suggest otherwise.

That is usually a sign that something is wrong with your custom env (see https://github.com/DLR-RM/rl-baselines3-zoo/pull/124#issuecomment-870412632 for instance, which was fixed by https://github.com/qgallouedec/panda-gym/commit/ec7454a9bbd67f0835f9e9eb083cfc5147a01c39).
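One way this symptom can appear is when the training env and the evaluation env share mutable state, so eval rollouts change what the training env sees. A contrived sketch of such a bug (purely illustrative, not the actual bug fixed in the commit linked above):

```python
import gym
import numpy as np

# Module-level state shared by every instance of the env (the bug).
_SHARED_GOAL = np.zeros(3, dtype=np.float32)


class LeakyEnv(gym.Env):
    """Contrived env whose reset() mutates global state.

    When the EvalCallback's env resets, it also moves the goal of the
    training env, so evaluations end up influencing training.
    """

    observation_space = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
    action_space = gym.spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def reset(self):
        # In-place mutation of the shared goal, visible to all env instances.
        _SHARED_GOAL[:] = np.random.uniform(-1.0, 1.0, size=3)
        return _SHARED_GOAL.copy()

    def step(self, action):
        reward = -float(np.linalg.norm(action - _SHARED_GOAL))
        return _SHARED_GOAL.copy(), reward, True, {}
```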

Experiment 1 showed an increasing reward curve with good results starting

How many random seeds did you use? (See the RL Tips and Tricks section in the documentation.)
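As a rough illustration, comparing results across several seeds might look like the following (the environment id, number of seeds, and step budget are placeholders):

```python
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

results = {}
for seed in range(3):  # a few independent runs per configuration
    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, seed=seed, verbose=0)
    model.learn(total_timesteps=50_000)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    results[seed] = (mean_reward, std_reward)

print(results)  # compare mean/std across seeds before drawing conclusions
```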