Closed: sytelus closed this issue 4 years ago
Hello,
> At a minimum, there should be information on the number of training steps
You have that in the hyperparameters file and the config file associated with each trained agent (at least starting with release 1.0 of the zoo). The final performance can be found in benchmark.md; note that the results correspond to only one seed (they are not meant for quantitative comparison).
Yes, a training curve would be a good addition; even better would be a learning curve using a test env periodically (this is planned to be supported with the callback collection), but you would need at least 10 runs per algorithm per environment.
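To illustrate the point about needing many runs: once you have per-seed evaluation returns, aggregating them into a mean curve with an error band is a few lines of numpy. This is a minimal sketch with synthetic data (the array shapes and names are illustrative, not from the zoo):

```python
import numpy as np

# Hypothetical data: evaluation returns for 10 seeds at 5 checkpoints.
# In practice each row would come from periodic rollouts on a test env.
rng = np.random.default_rng(0)
returns = rng.normal(loc=np.linspace(0, 100, 5), scale=10, size=(10, 5))

mean = returns.mean(axis=0)   # learning curve averaged over seeds
std = returns.std(axis=0)     # spread across seeds, for error bands

for step, (m, s) in enumerate(zip(mean, std)):
    print(f"checkpoint {step}: {m:.1f} +/- {s:.1f}")
```

Plotting `mean` with a `mean ± std` shaded band gives the kind of curve Garage and RLlib publish.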
> it seems monitor.csv
monitor.csv can give you the training learning curve, which is only a proxy for the real performance.
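For context, monitor.csv follows the standard Monitor format from OpenAI baselines: one JSON comment line, then per-episode rows with columns r (episode reward), l (episode length), and t (wall-clock time). A rolling mean over r gives the training curve. A minimal sketch using a synthetic file (the episode values are made up for illustration):

```python
import csv

# Synthetic monitor.csv contents: a JSON comment line, then one row
# per finished episode with reward (r), length (l), and time (t).
monitor_csv = """#{"t_start": 0.0, "env_id": "BreakoutNoFrameskip-v4"}
r,l,t
1.0,200,5.1
3.0,180,10.3
2.0,150,15.0
4.0,210,20.7
"""

# Skip the JSON comment line, then parse the CSV body.
lines = monitor_csv.splitlines()
rows = list(csv.DictReader(lines[1:]))
rewards = [float(row["r"]) for row in rows]

# Moving average over the last (up to) 100 episodes.
window = 100
avg = [
    sum(rewards[max(0, i - window + 1): i + 1])
    / len(rewards[max(0, i - window + 1): i + 1])
    for i in range(len(rewards))
]
print(avg[-1])  # average return over the last (up to) 100 episodes
```

The last value of `avg` is exactly the "100-episode average" figure that benchmark tables usually report.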
> Furthermore, these files are not produced at all currently if you run the experiment.
If you don't specify a log folder, nothing is produced, yes.
I think my comment was probably misunderstood. I'm currently trying to train a model for Breakout and reproduce the results. There is nothing in this repo that tells me what I should expect, or how to know whether the training was successful. As it happens, something is possibly broken in OpenAI baselines as well as stable-baselines, so the Breakout training isn't generating curves that convincingly converge.
Also, it looks like in the current codebase there is no call to logger.configure() at all when running training.py. This would explain why no monitor.csv and progress.csv are generated even when a log directory is specified.
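If the missing call is indeed the cause, adding something like the following near the top of the training script should restore progress.csv output. This is a hedged sketch: the signature shown (logger.configure(folder=..., format_strs=...)) is how I recall the stable-baselines logger module, and LOG_DIR is a hypothetical path, so double-check against your installed version:

```python
import os

LOG_DIR = "logs/breakout-dqn"  # hypothetical log folder, not from the repo

try:
    # Assumption: stable-baselines exposes a baselines-style logger module.
    from stable_baselines import logger
except ImportError:
    logger = None  # stable-baselines not installed in this environment

if logger is not None:
    os.makedirs(LOG_DIR, exist_ok=True)
    # "csv" should write progress.csv; "stdout" keeps console output.
    logger.configure(folder=LOG_DIR, format_strs=["csv", "stdout"])
    print("logger configured ->", LOG_DIR)
else:
    print("stable-baselines unavailable; skipping logger setup")
```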
There should be a way to see your results that tells you what to expect when running the training from scratch. At a minimum, there should be information on the number of training steps and the eventual 100-episode average one might expect as a baseline, but much better would be to show the entire training curve. Without this, the baseline is not very meaningful, as one may never know whether they actually replicated the expected result.
A few good RL baseline frameworks do this; for example, here is how other frameworks display their results: Garage, RLlib, Coach. I love the UX that Garage provides, as well as Coach's approach of making the results part of the repo itself.
Currently, there is a benchmark.zip file in the repo, but it seems monitor.csv and progress.csv are not helpful (for example, for DQN progress.csv is empty and monitor.csv only has the last few rows). Furthermore, these files are not produced at all currently if you run the experiment.