A successfully PPO-trained agent needs some re-training steps before it makes good predictions

🐛 Bug

A successfully trained agent needs some re-training steps before it makes good predictions.

I'm using a custom env. I trained the agent until it reached a 100% success rate. At the end of training, the model is saved and then evaluated over 10 episodes, with good results.

When I load the saved model, it performs poorly.

I wrote a standalone script that uses the same parameters as the training run. Right after loading the model, the evaluation is very bad. If I re-train it for 1000 steps (1 or 100 steps is not enough), it then performs well. I also tried evaluating on a different eval_env, but the result was still unsuccessful.

The terminal output after running the script:

Relevant log output / Error message

No response

System Info

OS: Linux-6.8.0-40-generic-x86_64-with-glibc2.35 # 40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2
Python: 3.10.12
Stable-Baselines3: 2.3.2
PyTorch: 2.3.1+cu121
GPU Enabled: True
Numpy: 1.21.5
Cloudpickle: 3.0.0
Gymnasium: 0.29.1

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] I have provided a minimal and working example to reproduce the bug
[X] I have checked my env using the env checker
[X] I've used the markdown code blocks for both code and stack traces.

DLR-RM / stable-baselines3