Closed iandanforth closed 4 years ago
For reference, in the meantime what I've been doing is reducing `n_timesteps` (in `ppo2.yml` in this case) and then running an outer-loop script like this:
```python
from time import sleep
from subprocess import Popen, PIPE

cmd = ["python", "train.py", "--env", "MuscledAnt-v0", "-i", "logs\\ppo2\\MuscledAnt-v0.pkl"]

def run(command):
    # Pass the argument list directly; shell=True combined with a list is unreliable.
    process = Popen(command, stdout=PIPE)
    while True:
        line = process.stdout.readline()
        # readline() returns b"" only at EOF, so a blank output line
        # no longer terminates the loop early.
        if not line:
            print("Process Done.")
            break
        print(line.rstrip().decode())
    process.wait()

if __name__ == "__main__":
    while True:
        run(cmd)
        sleep(1.0)
```
If the training process dies then at least I will have had a periodically updated checkpoint to which I can return.
This ticket is to consider and discuss adding a mechanism by which intermediate checkpoints are produced during training.
I'm totally for that feature. I would recommend using a callback for that.
This should normally be addressed in https://github.com/hill-a/stable-baselines/issues/348
The callbacks are there, I will add that feature soon.
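As a rough illustration of the callback approach, here is a minimal, self-contained sketch of a periodic-checkpoint callback. The `CheckpointCallback` class, its `save_freq`/`save_path` parameters, and the stub model are assumptions for illustration, not the actual stable-baselines API:

```python
import os

class CheckpointCallback:
    """Hypothetical periodic-checkpoint callback: saves the model every
    `save_freq` calls. Names and interface are illustrative only."""

    def __init__(self, save_freq, save_path):
        self.save_freq = save_freq
        self.save_path = save_path
        self.n_calls = 0

    def __call__(self, locals_, globals_):
        # Functional callbacks of this style receive the training locals;
        # `locals_["self"]` is assumed to be the model here.
        self.n_calls += 1
        if self.n_calls % self.save_freq == 0:
            model = locals_["self"]
            model.save(os.path.join(self.save_path, "checkpoint_%d.pkl" % self.n_calls))
        return True  # returning True keeps training running


# Tiny stub model so the callback can be exercised without the real library.
class StubModel:
    def __init__(self):
        self.saved = []

    def save(self, path):
        self.saved.append(path)


model = StubModel()
callback = CheckpointCallback(save_freq=100, save_path="logs/ppo2")
for _ in range(250):  # simulate 250 training steps
    callback({"self": model}, {})

print(len(model.saved))  # checkpoints written at steps 100 and 200
```

The point of the design is that the training loop stays unchanged; checkpointing frequency and destination live entirely in the callback object.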
I've been using train.py as an entry point for testing new agents and environments.
These agents, environments, and associated code are not necessarily stable, either in the sense of the physics simulation or in terms of code quality.
Thus it is not surprising that, when training for millions of steps, there will be mishaps, for example:
Currently train.py saves the state of the agent after the specified training run completes.
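Until a built-in mechanism exists, the same effect can be approximated in-process by splitting one long run into bounded chunks and saving after each. This is a hedged sketch under stated assumptions: `train_chunk` is a stand-in for a bounded training run, and the pickle format is illustrative, not how train.py actually serializes agents:

```python
import pickle

def train_chunk(state, n_timesteps):
    # Stand-in for one bounded training run; the real trainer would
    # advance the agent rather than a step counter.
    state["steps"] += n_timesteps
    return state

state = {"steps": 0}
total_timesteps = 1_000_000
chunk = 250_000

while state["steps"] < total_timesteps:
    state = train_chunk(state, chunk)
    # Save an intermediate checkpoint after every chunk so a crash
    # loses at most `chunk` steps of progress.
    with open("checkpoint.pkl", "wb") as f:
        pickle.dump(state, f)

with open("checkpoint.pkl", "rb") as f:
    print(pickle.load(f)["steps"])  # 1000000
```

Unlike the subprocess restart loop above, this keeps everything in one process, at the cost of not surviving a hard crash mid-chunk.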