Closed: yngtodd closed this issue 5 years ago
Hi,
Thank you for using it, and for asking questions ;) There isn't really any documentation yet, and there probably won't be for some time, but I'm very happy to answer any questions.
There are multiple ways to access the output. During training, all metrics are recorded in MongoDB (for long-term storage) and in Visdom (for visualization). The way I work with it is to use Visdom to visually inspect whether training is doing what it should, and then potentially load the results from MongoDB with Python to process them further.
This is what an example Visdom chart looks like:
Metrics stored in the DB:
> db.metrics.find({'model_name': 'breakout_a2c'})
{ "_id" : ObjectId("5bc5310155cd970dcb4f25f5"), "epoch_idx" : 0, "frames" : 8000, "fps" : 1489, "PMM:episode_rewards" : 1.878787878787879, "P09:episode_rewards" : 3.8000000000000007, "P01:episode_rewards" : 0, "episode_length" : 197.0909090909091, "value_loss" : 0.02023520267427557, "policy_entropy" : 1.3809912633895873, "policy_loss" : 0.006739899950334802, "grad_norm" : 0.09930057158654704, "advantage_norm" : 1.4973756862804293, "explained_variance" : -0.25901189506053923, "model_name" : "breakout_a2c", "run_name" : "breakout_a2c/1" }
{ "_id" : ObjectId("5bcc22ee55cd974f28595b33"), "epoch_idx" : 1, "frames" : 6000, "fps" : 1391, "PMM:episode_rewards" : 1.2142857142857142, "P09:episode_rewards" : 3, "P01:episode_rewards" : 0, "episode_length" : 178.75, "value_loss" : 0.01603925398577303, "policy_entropy" : 1.3837194800376893, "policy_loss" : 0.0020585690875304864, "grad_norm" : 0.12393458011152558, "advantage_norm" : 0.9929843093454838, "explained_variance" : -1.671451603770256, "model_name" : "breakout_a2c", "run_name" : "breakout_a2c/2" }
{ "_id" : ObjectId("5bcc22f355cd974f28595b34"), "epoch_idx" : 2, "frames" : 12000, "fps" : 1311, "PMM:episode_rewards" : 1.2, "P09:episode_rewards" : 3, "P01:episode_rewards" : 0, "episode_length" : 178.06666666666666, "value_loss" : 0.018758475972190353, "policy_entropy" : 1.3858475637435914, "policy_loss" : 0.005217389368335716, "grad_norm" : 0.0655856565352905, "advantage_norm" : 1.2349432955682278, "explained_variance" : 0.013082603216171265, "model_name" : "breakout_a2c", "run_name" : "breakout_a2c/2" }
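Loading those documents into pandas is one way to process them from Python, as mentioned above. A minimal sketch, assuming a local MongoDB with the metrics in a database named `vel` (the database name and connection details are assumptions; adjust them to your setup). The `records` list below just copies a subset of the fields from the documents shown, so the snippet runs without a database:

```python
# Sketch: load training metrics from MongoDB into pandas for analysis.
# The connection details and database name ('vel') are assumptions.
import pandas as pd

# from pymongo import MongoClient
# client = MongoClient('localhost', 27017)
# records = list(client['vel'].metrics.find({'model_name': 'breakout_a2c'}))

# A subset of the documents shown above, as find() would return them:
records = [
    {'epoch_idx': 1, 'frames': 6000, 'PMM:episode_rewards': 1.2142857142857142,
     'run_name': 'breakout_a2c/2'},
    {'epoch_idx': 2, 'frames': 12000, 'PMM:episode_rewards': 1.2,
     'run_name': 'breakout_a2c/2'},
]

df = pd.DataFrame(records).set_index('epoch_idx')
print(df['PMM:episode_rewards'].mean())
```

From there the DataFrame can be plotted or aggregated however you like.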
I have also implemented metrics output in what I call the "OpenAI" format, which is used in their baselines repository, but it is off by default; that's why you see the output/openai directory empty. To enable it, two changes need to be made in the config file.
Setting openai_logging to true in the command definition will log metrics to the CSV file vel/output/openai/breakout_a2c/0/progress.csv:
```yaml
commands:
  train:
    name: vel.rl.commands.rl_train_command
    total_frames: 1.0e6
    batches_per_epoch: 20
    openai_logging: true
```
If you turn on the monitor in the environment, some information about environment episodes will be written to the file vel/output/openai/breakout_a2c/0/0.monitor.csv:
```yaml
env:
  name: vel.rl.env.classic_atari
  game: 'BreakoutNoFrameskip-v4'

env_settings:
  default:
    monitor: true
```
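The monitor file follows the baselines convention: a `#`-prefixed JSON header line, then CSV columns r (reward), l (length), and t (time), so pandas can read it by skipping the first row. The path comes from the message above; the file contents in this sketch are synthetic, so treat the exact header fields as an illustration:

```python
# Sketch: read a baselines-style monitor file with pandas.
# The first line is a '#'-prefixed JSON comment, followed by a CSV
# with columns r (reward), l (length), t (time).
import io
import pandas as pd

# path = 'vel/output/openai/breakout_a2c/0/0.monitor.csv'
# df = pd.read_csv(path, skiprows=1)

# Self-contained demonstration with a synthetic monitor file:
monitor_text = (
    '#{"t_start": 1541340000.0, "env_id": "BreakoutNoFrameskip-v4"}\n'
    'r,l,t\n'
    '1.0,197,24.8\n'
    '3.0,211,46.8\n'
)
df = pd.read_csv(io.StringIO(monitor_text), skiprows=1)
print(df['r'].mean())
```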
When you run the train command, vel.rl.commands.rl_train_command will run, which is a generic reinforcement learning training loop. It trains the model on environment rollouts, stores the metrics in the locations discussed above, and stores model checkpoints in the output/checkpoints directory.
To access checkpointed models, e.g. to record a video, I've used different commands. For example, if you change your command line to
```
vel examples-configs/rl/atari/a2c/breakout_a2c.yaml record
```
the record command will be invoked, which is configured in this bit of the config file:
```yaml
record:
  name: vel.rl.commands.record_movie_command
  takes: 10
  videoname: 'breakout_vid_{:04}.avi'
  frame_history: 4
  sample_args:
    argmax_sampling: true
```
That will run the saved model for 10 environment episodes and store the videos from these runs in the output/videos directory.
There are also some other commands defined, like evaluate:
```yaml
evaluate:
  name: vel.rl.commands.evaluate_env_command
  takes: 100
  frame_history: 4
  sample_args:
    argmax_sampling: true
```
which runs a model checkpoint on an environment 100 times to calculate statistics over rewards and episode lengths, so that models can be compared to each other.
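The summary tables printed later in this thread look like pandas describe() output; here is a small sketch of computing the same statistics yourself from gathered episode records. The records below are made up for illustration, mirroring the shape of the epinfo['episode'] dicts:

```python
# Sketch: summarise episode results the way the evaluate command prints them.
# The two records here are invented illustrations, not real evaluation output.
import pandas as pd

episodes = [
    {'lengths': 10001, 'rewards': 401.0},
    {'lengths': 10001, 'rewards': 421.0},
]
stats = pd.DataFrame(episodes).describe()
print(stats)
```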
There is also a visdom command, which copies saved metrics from MongoDB into Visdom. Any data in Visdom is lost on restart, so I needed a way to repopulate it for viewing:
```yaml
visdom:
  name: vel.commands.vis_store_command
```
I hope I've answered your question; feel free to let me know if anything is unclear ;) I just realized one of my refactorings broke movie recording in the latest master commit; I'll get that fixed soon.
That is great! Thank you for being so thorough. I'll keep working my way around the library, thanks for taking the time to answer my questions!
I think I might be missing something when it comes to evaluating the trained model. After updating my yaml config based on your notes above, I am running into this guy:
```
🎃 vel [yaml] 🦇 vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate
<ModelConfig at examples-configs/rl/atari/a2c/breakout_a2c.yaml>
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 0 -- command evaluate -- device cuda
CUDA Device name Quadro P620
2018/10/31 - 16:00:28
================================================================================
Traceback (most recent call last):
  File "/home/ygx/dev/kassa/anaconda3/bin/vel", line 11, in <module>
    load_entry_point('vel', 'console_scripts', 'vel')()
  File "/home/ygx/dev/nykyinen/vel/vel/launcher.py", line 41, in main
    model_config.run_command(args.command, args.varargs)
  File "/home/ygx/dev/nykyinen/vel/vel/api/model_config.py", line 120, in run_command
    return command_descriptor.run(*varargs)
  File "/home/ygx/dev/nykyinen/vel/vel/rl/commands/evaluate_env_command.py", line 32, in run
    self.storage.resume_learning(model)
AttributeError: 'ClassicStorage' object has no attribute 'resume_learning'
```
Here is the yaml file I am running from: https://github.com/yngtodd/vel/blob/yaml/examples-configs/rl/atari/a2c/breakout_a2c.yaml
Looks like resume_learning is being called from here: https://github.com/MillionIntegrals/vel/blob/99c77ba0def80ed6c45473b1e8db77731e2adfbc/vel/rl/commands/evaluate_env_command.py#L32. Is there something that I need to change for the storage option?
My bad, I was refactoring that part of functionality and didn't update code of the commands. Should be good now if you check out the latest commit.
Great, that did it! So that looks like it evaluates the trained policy 100 times. Is it possible to save those statistics to plot the mean and variance of rewards over the time steps?
Yes, that's entirely possible, but it requires reworking the evaluation command a bit. Currently, in the file evaluate_env_command.py, I roll out environments and gather only the final reward:
```python
def record_take(self, model, env_instance, device, takenumber):
    frames = []

    observation = env_instance.reset()
    frames.append(env_instance.render('rgb_array'))

    print("Evaluating environment...")

    while True:
        observation_array = np.expand_dims(np.array(observation), axis=0)
        observation_tensor = torch.from_numpy(observation_array).to(device)
        actions = model.step(observation_tensor, **self.sample_args)['actions']

        observation, reward, done, epinfo = env_instance.step(actions.item())
        frames.append(env_instance.render('rgb_array'))

        if 'episode' in epinfo:
            # End of an episode
            return epinfo['episode']
```
You'd need to change the logic to gather the reward at each step, aggregate, and then plot. To get more meaningful results, you would probably also want to disable the reward clipping that is applied when the environment is created for training.
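A minimal sketch of that aggregation step, assuming you have modified the loop above to collect each step's reward into a per-episode list. `per_step_stats` is a hypothetical helper, not part of vel; since episodes have different lengths, shorter ones are padded with NaN before computing per-step statistics:

```python
# Sketch: aggregate per-step rewards from several evaluation episodes.
# Assumes record_take was modified to append each step's reward to a list
# and return those lists. Shorter episodes are padded with NaN so the
# per-step mean/variance ignore episodes that have already ended.
import numpy as np

def per_step_stats(reward_traces):
    """Mean and variance of reward at each time step across episodes."""
    longest = max(len(trace) for trace in reward_traces)
    padded = np.full((len(reward_traces), longest), np.nan)
    for i, trace in enumerate(reward_traces):
        padded[i, :len(trace)] = trace
    return np.nanmean(padded, axis=0), np.nanvar(padded, axis=0)

means, variances = per_step_stats([[0.0, 1.0, 1.0], [0.0, 1.0]])
```

The resulting `means` and `variances` arrays could then be plotted, e.g. with matplotlib's `plt.plot` and `plt.fill_between`.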
I have been looking a bit more at the evaluation phase. I have trained a model using breakout_a2c.yaml, and everything looks good:
But when I run vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate, I am getting a constant reward of 0 at every frame:
```
🦃 vel [evaluate] 🍂 vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate
<ModelConfig at examples-configs/rl/atari/a2c/breakout_a2c.yaml>
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 0 -- command evaluate -- device cuda
CUDA Device name Quadro P620
2018/11/04 - 14:50:59
================================================================================
WARN: <class 'vel.openai.baselines.common.atari_wrappers.FireEpisodicLifeEnv'> doesn't implement 'reset' method, which is required for wrappers derived directly from Wrapper. Deprecated default implementation is used.
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 24.866142}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 46.829021}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 68.448406}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 90.048011}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 111.496355}
       lengths  rewards
count      5.0      5.0
mean   10001.0      0.0
std        0.0      0.0
min    10001.0      0.0
25%    10001.0      0.0
50%    10001.0      0.0
75%    10001.0      0.0
max    10001.0      0.0
================================================================================
Done.
2018/11/04 - 14:52:51
================================================================================
```
Hmm.. It works for me.
One thing that could possibly cause you problems: if, after training your model, you started training again and cancelled it straight away, that would delete the previously saved weights. Other than that, it's hard for me to say what the problem could be.
```
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 5 -- command evaluate -- device cuda
CUDA Device name GeForce GTX 1080 Ti
2018/11/08 - 21:11:36
================================================================================
Evaluating environment...
Evaluating environment...
       lengths     rewards
count      2.0    2.000000
mean   10001.0  411.000000
std        0.0   14.142136
min    10001.0  401.000000
25%    10001.0  406.000000
50%    10001.0  411.000000
75%    10001.0  416.000000
max    10001.0  421.000000
================================================================================
Done.
2018/11/08 - 21:12:11
================================================================================
```
Hmm... Maybe I did that. I will retrain and update whether or not I am an idiot. haha
This is really odd. I was careful to retrain the model and evaluate it immediately afterwards. Still, I am getting zero reward across evaluations:
```
Evaluating environment...
Evaluating environment...
       lengths  rewards
count    100.0    100.0
mean   10001.0      0.0
std        0.0      0.0
min    10001.0      0.0
25%    10001.0      0.0
50%    10001.0      0.0
75%    10001.0      0.0
max    10001.0      0.0
================================================================================
Done.
2018/11/11 - 10:43:04
================================================================================
```
Is there a way you would recommend for reloading the model weights and environment from a Python script? Maybe I can dig a bit deeper without using the yaml file.
Let's try to get to the bottom of this ;) First question: after your model is trained, can you find the file output/checkpoints/breakout_a2c/0/checkpoint_00000500.data? The number 500 will probably be quite different; I don't remember how many epochs this particular configuration runs for, but I'd guess around 1300. It may be easier for us to move to the gitter chat I've just set up: https://gitter.im/deep-learning-vel/Lobby
I do have saved checkpoints in that directory:
```
🔥 0 [master] 🍂 pwd
/home/ygx/dev/nykyinen/vel/output/checkpoints/breakout_a2c/0
🔥 0 [master] 🍂 ls
checkpoint_00001375.data  checkpoint_hidden_00001375.data
```
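As a starting point for loading weights from a script, here is a sketch that only locates the newest checkpoint file in a directory laid out as shown above. How the weights are then loaded into a vel model is an assumption (the storage API was being refactored at the time of this thread), so that part is left as a comment:

```python
# Sketch: locate the most recent checkpoint file from a Python script.
# The directory layout follows the listing above; the torch.load line in
# the comment is an assumption about vel's checkpoint format, not a
# documented API.
import glob
import os

def latest_checkpoint(checkpoint_dir):
    """Return the newest checkpoint_*.data file (excluding hidden state)."""
    pattern = os.path.join(checkpoint_dir, 'checkpoint_0*.data')
    candidates = [p for p in glob.glob(pattern)
                  if 'hidden' not in os.path.basename(p)]
    # Zero-padded epoch numbers make lexicographic max the newest file.
    return max(candidates, default=None)

path = latest_checkpoint('output/checkpoints/breakout_a2c/0')
# if path is not None:
#     import torch
#     model.load_state_dict(torch.load(path))
```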
Just ran the most recent changes, and it works like a dream!
```
🔥 vel [master] 🍂 vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate
<ModelConfig at examples-configs/rl/atari/a2c/breakout_a2c.yaml>
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 0 -- command evaluate -- device cuda
CUDA Device name Quadro P620
2018/11/13 - 16:52:51
================================================================================
Storage: <vel.storage.classic.ClassicStorage object at 0x7f025c08b4e0>
Evaluating environment...
Evaluating environment...
Evaluating environment...
Evaluating environment...
Evaluating environment...
       lengths     rewards
count      5.0    5.000000
mean   10001.0  399.000000
std        0.0   34.467376
min    10001.0  341.000000
25%    10001.0  399.000000
50%    10001.0  405.000000
75%    10001.0  425.000000
max    10001.0  425.000000
================================================================================
Done.
2018/11/13 - 16:54:40
================================================================================
```
Thank you for taking the time to help me out!
Thanks for putting together this library!
I have installed the library on a headless server, along with MongoDB and Visdom. Is there a way to view the results after running from the .yaml configs?
I am testing it out with the example. Everything trains fine, but when I look at the logfile at vel/output/openai/breakout_a2c/0/log.txt, it only saves the following:

And the progress.csv at that directory level is empty. When looking at the yaml config at https://github.com/yngtodd/vel/blob/master/examples-configs/rl/atari/a2c/breakout_a2c.yaml#L57, I see that it is saving a video. Where is that stored?
Thanks!