Closed iandanforth closed 4 years ago
For reference, in the meantime what I've been doing is reducing `n_timesteps` (in `ppo2.yml` in this case) and then running an outer-loop script like this:
```python
from time import sleep
from subprocess import Popen, PIPE

cmd = ["python", "train.py", "--env", "MuscledAnt-v0", "-i", "logs\\ppo2\\MuscledAnt-v0.pkl"]

def run(command):
    # Pass the argument list directly; shell=True combined with a list is unreliable.
    process = Popen(command, stdout=PIPE)
    while True:
        line = process.stdout.readline()
        # readline() returns b"" only at EOF, so a blank output line
        # no longer terminates the loop early.
        if not line:
            print("Process Done.")
            break
        print(line.rstrip().decode())
    process.wait()

if __name__ == "__main__":
    while True:
        run(cmd)
        sleep(1.0)
```
If the training process dies then at least I will have had a periodically updated checkpoint to which I can return.
This ticket is to consider and discuss adding a mechanism by which intermediate checkpoints are produced during training.
I'm totally for that feature. I would recommend using a callback for that.
This should normally be addressed in https://github.com/hill-a/stable-baselines/issues/348
The callbacks are there, I will add that feature soon.
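As a rough illustration of the callback approach, here is a minimal, self-contained sketch of a periodic-checkpoint callback. The `CheckpointCallback` class, its `save_freq`/`save_path` parameters, and the stub model are assumptions for illustration, not the actual stable-baselines API:

```python
import os

class CheckpointCallback:
    """Hypothetical periodic-checkpoint callback: saves the model every
    `save_freq` calls. Names and interface are illustrative only."""

    def __init__(self, save_freq, save_path):
        self.save_freq = save_freq
        self.save_path = save_path
        self.n_calls = 0

    def __call__(self, locals_, globals_):
        # Functional callbacks of this style receive the training locals;
        # `locals_["self"]` is assumed to be the model here.
        self.n_calls += 1
        if self.n_calls % self.save_freq == 0:
            model = locals_["self"]
            model.save(os.path.join(self.save_path, "checkpoint_%d.pkl" % self.n_calls))
        return True  # returning True keeps training running


# Tiny stub model so the callback can be exercised without the real library.
class StubModel:
    def __init__(self):
        self.saved = []

    def save(self, path):
        self.saved.append(path)


model = StubModel()
callback = CheckpointCallback(save_freq=100, save_path="logs/ppo2")
for _ in range(250):  # simulate 250 training steps
    callback({"self": model}, {})

print(len(model.saved))  # checkpoints written at steps 100 and 200
```

The point of the design is that the training loop stays unchanged; checkpointing frequency and destination live entirely in the callback object.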
I've been using train.py as an entry point for testing new agents and environments.
These agents, environments, and associated code are not necessarily stable, either in the sense of the physics simulation or in terms of code quality.
Thus it is not surprising that, when training for millions of steps, there will be mishaps, for example:
Currently train.py saves the state of the agent after the specified training run completes.
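Until a built-in mechanism exists, the same effect can be approximated in-process by splitting one long run into bounded chunks and saving after each. This is a hedged sketch under stated assumptions: `train_chunk` is a stand-in for a bounded training run, and the pickle format is illustrative, not how train.py actually serializes agents:

```python
import pickle

def train_chunk(state, n_timesteps):
    # Stand-in for one bounded training run; the real trainer would
    # advance the agent rather than a step counter.
    state["steps"] += n_timesteps
    return state

state = {"steps": 0}
total_timesteps = 1_000_000
chunk = 250_000

while state["steps"] < total_timesteps:
    state = train_chunk(state, chunk)
    # Save an intermediate checkpoint after every chunk so a crash
    # loses at most `chunk` steps of progress.
    with open("checkpoint.pkl", "wb") as f:
        pickle.dump(state, f)

with open("checkpoint.pkl", "rb") as f:
    print(pickle.load(f)["steps"])  # 1000000
```

Unlike the subprocess restart loop above, this keeps everything in one process, at the cost of not surviving a hard crash mid-chunk.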