Hello,
> From my understanding of the code and the documentation, I would answer the question with no

Yes, it should not (and it does not for the built-in gym envs).
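For context, a minimal sketch of the usual evaluation setup (assuming the standard `EvalCallback` is used; the env ids are placeholders): the evaluation rollouts run on a separate env and are never added to SAC's replay buffer, so they do not contribute to the gradient updates.

```python
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# Separate envs for training and evaluation (placeholders for the custom env;
# the env id may differ depending on your gym version)
train_env = gym.make("Pendulum-v1")
eval_env = gym.make("Pendulum-v1")

# Evaluate every 5,000 training steps; results are only logged/saved,
# the eval transitions never enter the replay buffer.
eval_callback = EvalCallback(
    eval_env,
    eval_freq=5_000,
    n_eval_episodes=5,
    deterministic=True,
    best_model_save_path="./logs/",
)

model = SAC("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=500_000, callback=eval_callback)
```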
> But my experimental results (described below) suggest otherwise.

That is usually a sign that something is wrong with your custom env (see https://github.com/DLR-RM/rl-baselines3-zoo/pull/124#issuecomment-870412632 for instance, fixed by https://github.com/qgallouedec/panda-gym/commit/ec7454a9bbd67f0835f9e9eb083cfc5147a01c39).
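If not done already, a quick way to surface such problems is the built-in env checker (a sketch; `CustomEnv` and its import are placeholders for your own class):

```python
from stable_baselines3.common.env_checker import check_env

from my_project.envs import CustomEnv  # placeholder import for your custom env

env = CustomEnv()
# Warns/raises if the env violates the gym API
# (observations outside the declared space, wrong dtypes, bad reset() return, ...)
check_env(env, warn=True)
```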
> Experiment 1 showed an increasing reward curve with good results starting

How many random seeds did you use? (See the RL Tips and Tricks section in the documentation.)
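As a sketch (the env id and step budget are just examples), comparing learning curves over a few random seeds would look roughly like this:

```python
import gym
from stable_baselines3 import SAC

# Train the identical setup with several seeds and compare the resulting curves,
# instead of drawing conclusions from a single run.
for seed in (0, 1, 2):
    env = gym.make("Pendulum-v1")  # replace with your custom env
    model = SAC("MlpPolicy", env, seed=seed, verbose=0)
    model.learn(total_timesteps=500_000)
    model.save(f"sac_seed_{seed}")
```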
Question
Do the evaluations influence the training of the SAC agent? From my understanding of the code and the documentation, I would answer the question with no. But my experimental results (described below) suggest otherwise.
Additional context
I built a custom environment and tried different settings. Currently I use the Soft Actor-Critic (SAC) agent. I'm comparing two experiments:
- Experiment 1: 500,000 training steps, evaluation every 5,000 steps
- Experiment 2: 1,000,000 training steps, evaluation every 10,000 steps
Experiment 1 showed an increasing reward curve with good results starting at 450,000 steps, and I wanted to find out how it would continue with more training. My expectation was that the reward curve would keep increasing or plateau at some point. But in the second experiment the reward curve only starts to reach good results at around 900,000 steps.
The reward curves of both experiments look identical, just on a different scale of steps. It seems that more training steps did not help, but more frequent evaluations might.