DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Bug] DDPG+HER saving and loading model: loaded model performs as if it was completely untrained. #533

Closed sph001 closed 3 years ago

sph001 commented 3 years ago

🐛 Bug

If I train a model with DDPG + HER (200,000 steps) and evaluate it over 1000 episodes, I get a success rate of ~95%. If I then save that model, load it into a fresh instance of the same model, and run the exact same evaluation, the success rate is 0%.

I can't find anything in my environment that might interfere with the model, but the loaded model's replay buffer is empty, which suggests the replay buffer is not being saved with the model.

To Reproduce

def evaluate(_model: DDPG, name: str, _dir: str, runs: int):
    # Removed a large amount of code that passively records the evaluation
    # (and defines `record`) but does not change anything.
    mean, std = evaluate_policy(_model, _model.env, runs, True, True, record)

_model = build_model(_env_name, _save_dir)
_model.load(os.path.join(_save_dir, _name + ".zip"), _model.env)
evaluate(_model, _name, _save_dir, 1000)

Expected behavior

I expect that if I save a model, I can call load and restore the model in the exact state it was in when I saved it.

System Info

Python 3.7, SB3 1.1.0, PyTorch 1.8.1+cu111, Gym 0.18.3

araffin commented 3 years ago

Probably a duplicate of https://github.com/hill-a/stable-baselines/issues/30#issuecomment-423694592

Anyway, I would highly recommend using the RL Zoo (cf. doc) to avoid such issues, and using TD3, SAC, or TQC, which usually perform better than DDPG.