Closed: yngtodd closed this issue 5 years ago
Hi,
Thank you for using it, and for asking questions ;) There isn't really any documentation yet, and there probably won't be for some time, but I'm very happy to answer any questions.
There are multiple ways to access the output. During training, all metrics are recorded in MongoDB (for long-term storage) and in Visdom (for visualization). The way I work with it is to use Visdom to visually inspect whether training is doing what it should, and then potentially load the results from MongoDB with Python to process them further.
This is what an example Visdom chart looks like:
Metrics stored in the DB:
> db.metrics.find({'model_name': 'breakout_a2c'})
{ "_id" : ObjectId("5bc5310155cd970dcb4f25f5"), "epoch_idx" : 0, "frames" : 8000, "fps" : 1489, "PMM:episode_rewards" : 1.878787878787879, "P09:episode_rewards" : 3.8000000000000007, "P01:episode_rewards" : 0, "episode_length" : 197.0909090909091, "value_loss" : 0.02023520267427557, "policy_entropy" : 1.3809912633895873, "policy_loss" : 0.006739899950334802, "grad_norm" : 0.09930057158654704, "advantage_norm" : 1.4973756862804293, "explained_variance" : -0.25901189506053923, "model_name" : "breakout_a2c", "run_name" : "breakout_a2c/1" }
{ "_id" : ObjectId("5bcc22ee55cd974f28595b33"), "epoch_idx" : 1, "frames" : 6000, "fps" : 1391, "PMM:episode_rewards" : 1.2142857142857142, "P09:episode_rewards" : 3, "P01:episode_rewards" : 0, "episode_length" : 178.75, "value_loss" : 0.01603925398577303, "policy_entropy" : 1.3837194800376893, "policy_loss" : 0.0020585690875304864, "grad_norm" : 0.12393458011152558, "advantage_norm" : 0.9929843093454838, "explained_variance" : -1.671451603770256, "model_name" : "breakout_a2c", "run_name" : "breakout_a2c/2" }
{ "_id" : ObjectId("5bcc22f355cd974f28595b34"), "epoch_idx" : 2, "frames" : 12000, "fps" : 1311, "PMM:episode_rewards" : 1.2, "P09:episode_rewards" : 3, "P01:episode_rewards" : 0, "episode_length" : 178.06666666666666, "value_loss" : 0.018758475972190353, "policy_entropy" : 1.3858475637435914, "policy_loss" : 0.005217389368335716, "grad_norm" : 0.0655856565352905, "advantage_norm" : 1.2349432955682278, "explained_variance" : 0.013082603216171265, "model_name" : "breakout_a2c", "run_name" : "breakout_a2c/2" }
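Loading those documents into pandas is one way to process them from Python, as mentioned above. A minimal sketch, assuming a local MongoDB with the metrics in a database named `vel` (the database name and connection details are assumptions; adjust them to your setup). The `records` list below just copies a subset of the fields from the documents shown, so the snippet runs without a database:

```python
# Sketch: load training metrics from MongoDB into pandas for analysis.
# The connection details and database name ('vel') are assumptions.
import pandas as pd

# from pymongo import MongoClient
# client = MongoClient('localhost', 27017)
# records = list(client['vel'].metrics.find({'model_name': 'breakout_a2c'}))

# A subset of the documents shown above, as find() would return them:
records = [
    {'epoch_idx': 1, 'frames': 6000, 'PMM:episode_rewards': 1.2142857142857142,
     'run_name': 'breakout_a2c/2'},
    {'epoch_idx': 2, 'frames': 12000, 'PMM:episode_rewards': 1.2,
     'run_name': 'breakout_a2c/2'},
]

df = pd.DataFrame(records).set_index('epoch_idx')
print(df['PMM:episode_rewards'].mean())
```

From there the DataFrame can be plotted or aggregated however you like.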
I have also implemented metrics output in what I call the "OpenAI" format, which is used in their baselines repository, but it is off by default; that's why you see the output/openai directory empty. To enable it, two changes need to be made in the config file.
Setting openai_logging to true in the command definition will log metrics to the CSV file vel/output/openai/breakout_a2c/0/progress.csv:
```yaml
commands:
  train:
    name: vel.rl.commands.rl_train_command
    total_frames: 1.0e6
    batches_per_epoch: 20
    openai_logging: true
```
If you turn on the monitor in the environment, some information about environment episodes will be written to the file vel/output/openai/breakout_a2c/0/0.monitor.csv:
```yaml
env:
  name: vel.rl.env.classic_atari
  game: 'BreakoutNoFrameskip-v4'

env_settings:
  default:
    monitor: true
```
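The monitor file follows the baselines convention: a `#`-prefixed JSON header line, then CSV columns r (reward), l (length), and t (time), so pandas can read it by skipping the first row. The path comes from the message above; the file contents in this sketch are synthetic, so treat the exact header fields as an illustration:

```python
# Sketch: read a baselines-style monitor file with pandas.
# The first line is a '#'-prefixed JSON comment, followed by a CSV
# with columns r (reward), l (length), t (time).
import io
import pandas as pd

# path = 'vel/output/openai/breakout_a2c/0/0.monitor.csv'
# df = pd.read_csv(path, skiprows=1)

# Self-contained demonstration with a synthetic monitor file:
monitor_text = (
    '#{"t_start": 1541340000.0, "env_id": "BreakoutNoFrameskip-v4"}\n'
    'r,l,t\n'
    '1.0,197,24.8\n'
    '3.0,211,46.8\n'
)
df = pd.read_csv(io.StringIO(monitor_text), skiprows=1)
print(df['r'].mean())
```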
When you run the train command, vel.rl.commands.rl_train_command will run, which is a generic reinforcement learning training loop. It trains the model on environment rollouts, stores the metrics in the locations discussed above, and stores model checkpoints in the output/checkpoints directory.
To access checkpointed models, e.g. to record a video, I've used different commands. For example, if you change your command line to
```
vel examples-configs/rl/atari/a2c/breakout_a2c.yaml record
```
the record command will be invoked, which is configured in this bit of the config file:
```yaml
record:
  name: vel.rl.commands.record_movie_command
  takes: 10
  videoname: 'breakout_vid_{:04}.avi'
  frame_history: 4
  sample_args:
    argmax_sampling: true
```
That will run the saved model for 10 environment episodes and store the videos from these runs in the output/videos directory.
There are also some other commands defined, like evaluate:
```yaml
evaluate:
  name: vel.rl.commands.evaluate_env_command
  takes: 100
  frame_history: 4
  sample_args:
    argmax_sampling: true
```
which runs a model checkpoint on an environment 100 times to calculate statistics over rewards and episode lengths, so that models can be compared to each other.
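The summary tables printed later in this thread look like pandas describe() output; here is a small sketch of computing the same statistics yourself from gathered episode records. The records below are made up for illustration, mirroring the shape of the epinfo['episode'] dicts:

```python
# Sketch: summarise episode results the way the evaluate command prints them.
# The two records here are invented illustrations, not real evaluation output.
import pandas as pd

episodes = [
    {'lengths': 10001, 'rewards': 401.0},
    {'lengths': 10001, 'rewards': 421.0},
]
stats = pd.DataFrame(episodes).describe()
print(stats)
```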
There is also a visdom command, which copies saved metrics from MongoDB into Visdom. Any data in Visdom is lost on restart, so I needed a way to repopulate it for viewing:
```yaml
visdom:
  name: vel.commands.vis_store_command
```
I hope I've answered your question; feel free to let me know if anything is unclear ;) I just realized one of my refactorings broke movie recording in the latest master commit; I'll get that fixed soon.
That is great! Thank you for being so thorough. I'll keep working my way around the library, thanks for taking the time to answer my questions!
I think I might be missing something when it comes to evaluating the trained model. After updating my yaml config based on your notes above, I am running into this guy:
```
🎃 vel [yaml] 🦇 vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate
<ModelConfig at examples-configs/rl/atari/a2c/breakout_a2c.yaml>
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 0 -- command evaluate -- device cuda
CUDA Device name Quadro P620
2018/10/31 - 16:00:28
================================================================================
Traceback (most recent call last):
  File "/home/ygx/dev/kassa/anaconda3/bin/vel", line 11, in <module>
    load_entry_point('vel', 'console_scripts', 'vel')()
  File "/home/ygx/dev/nykyinen/vel/vel/launcher.py", line 41, in main
    model_config.run_command(args.command, args.varargs)
  File "/home/ygx/dev/nykyinen/vel/vel/api/model_config.py", line 120, in run_command
    return command_descriptor.run(*varargs)
  File "/home/ygx/dev/nykyinen/vel/vel/rl/commands/evaluate_env_command.py", line 32, in run
    self.storage.resume_learning(model)
AttributeError: 'ClassicStorage' object has no attribute 'resume_learning'
```
Here is the yaml file I am running from: https://github.com/yngtodd/vel/blob/yaml/examples-configs/rl/atari/a2c/breakout_a2c.yaml
Looks like resume_learning is being called from here: https://github.com/MillionIntegrals/vel/blob/99c77ba0def80ed6c45473b1e8db77731e2adfbc/vel/rl/commands/evaluate_env_command.py#L32. Is there something that I need to change for the storage option?
My bad, I was refactoring that part of functionality and didn't update code of the commands. Should be good now if you check out the latest commit.
Great, that did it! So that looks like it evaluates the trained policy 100 times. Is it possible to save those statistics to plot the mean and variance of rewards over the time steps?
Yes, that's entirely possible, but it requires reworking the evaluation command a bit. Currently, in the file evaluate_env_command.py, I roll out environments and gather only the final reward:
```python
def record_take(self, model, env_instance, device, takenumber):
    frames = []

    observation = env_instance.reset()
    frames.append(env_instance.render('rgb_array'))

    print("Evaluating environment...")

    while True:
        observation_array = np.expand_dims(np.array(observation), axis=0)
        observation_tensor = torch.from_numpy(observation_array).to(device)
        actions = model.step(observation_tensor, **self.sample_args)['actions']

        observation, reward, done, epinfo = env_instance.step(actions.item())
        frames.append(env_instance.render('rgb_array'))

        if 'episode' in epinfo:
            # End of an episode
            return epinfo['episode']
```
You'd need to change the logic to gather the reward at each step, aggregate, and then plot. To get more meaningful results, you would probably also want to disable the reward clipping that is applied when the environment is created for training.
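A minimal sketch of that aggregation step, assuming you have modified the loop above to collect each step's reward into a per-episode list. `per_step_stats` is a hypothetical helper, not part of vel; since episodes have different lengths, shorter ones are padded with NaN before computing per-step statistics:

```python
# Sketch: aggregate per-step rewards from several evaluation episodes.
# Assumes record_take was modified to append each step's reward to a list
# and return those lists. Shorter episodes are padded with NaN so the
# per-step mean/variance ignore episodes that have already ended.
import numpy as np

def per_step_stats(reward_traces):
    """Mean and variance of reward at each time step across episodes."""
    longest = max(len(trace) for trace in reward_traces)
    padded = np.full((len(reward_traces), longest), np.nan)
    for i, trace in enumerate(reward_traces):
        padded[i, :len(trace)] = trace
    return np.nanmean(padded, axis=0), np.nanvar(padded, axis=0)

means, variances = per_step_stats([[0.0, 1.0, 1.0], [0.0, 1.0]])
```

The resulting `means` and `variances` arrays could then be plotted, e.g. with matplotlib's `plt.plot` and `plt.fill_between`.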
I have been looking a bit more at the evaluation phase. I have trained a model using breakout_a2c.yaml, and everything looks good:
But when I run vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate, I am getting a constant reward of 0 at every frame:
```
🦃 vel [evaluate] 🍂 vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate
<ModelConfig at examples-configs/rl/atari/a2c/breakout_a2c.yaml>
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 0 -- command evaluate -- device cuda
CUDA Device name Quadro P620
2018/11/04 - 14:50:59
================================================================================
WARN: <class 'vel.openai.baselines.common.atari_wrappers.FireEpisodicLifeEnv'> doesn't implement 'reset' method, which is required for wrappers derived directly from Wrapper. Deprecated default implementation is used.
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 24.866142}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 46.829021}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 68.448406}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 90.048011}
Evaluating environment...
Epinfo[episode] {'r': 0.0, 'l': 10001, 't': 111.496355}
       lengths  rewards
count      5.0      5.0
mean   10001.0      0.0
std        0.0      0.0
min    10001.0      0.0
25%    10001.0      0.0
50%    10001.0      0.0
75%    10001.0      0.0
max    10001.0      0.0
================================================================================
Done.
2018/11/04 - 14:52:51
================================================================================
```
Hmm.. It works for me.
One thing that could possibly cause you problems: if, after training your model, you started training again and cancelled it straight away, that would delete the previously saved weights. Other than that, it's hard for me to say what the problem could be.
```
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 5 -- command evaluate -- device cuda
CUDA Device name GeForce GTX 1080 Ti
2018/11/08 - 21:11:36
================================================================================
Evaluating environment...
Evaluating environment...
       lengths     rewards
count      2.0    2.000000
mean   10001.0  411.000000
std        0.0   14.142136
min    10001.0  401.000000
25%    10001.0  406.000000
50%    10001.0  411.000000
75%    10001.0  416.000000
max    10001.0  421.000000
================================================================================
Done.
2018/11/08 - 21:12:11
================================================================================
```
Hmm... Maybe I did that. I will retrain and update whether or not I am an idiot. haha
This is really odd. I was careful to retrain the model and evaluate it immediately afterwards. Still, I am getting zero reward across evaluations:
```
Evaluating environment...
Evaluating environment...
       lengths  rewards
count    100.0    100.0
mean   10001.0      0.0
std        0.0      0.0
min    10001.0      0.0
25%    10001.0      0.0
50%    10001.0      0.0
75%    10001.0      0.0
max    10001.0      0.0
================================================================================
Done.
2018/11/11 - 10:43:04
================================================================================
```
Is there a way you would recommend for reloading the model weights and environment from a Python script? Maybe I can dig a bit deeper without using the yaml file.
Let's try to get to the bottom of this ;) First question: after your model is trained, can you find the file output/checkpoints/breakout_a2c/0/checkpoint_00000500.data? The number 500 will probably be quite different; I don't remember how many epochs this particular configuration runs for, but I'd guess around 1300. It may be easier for us to move to the gitter chat I've just set up: https://gitter.im/deep-learning-vel/Lobby
I do have saved checkpoints in that directory:
```
🔥 0 [master] 🍂 pwd
/home/ygx/dev/nykyinen/vel/output/checkpoints/breakout_a2c/0
🔥 0 [master] 🍂 ls
checkpoint_00001375.data  checkpoint_hidden_00001375.data
```
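As a starting point for loading weights from a script, here is a sketch that only locates the newest checkpoint file in a directory laid out as shown above. How the weights are then loaded into a vel model is an assumption (the storage API was being refactored at the time of this thread), so that part is left as a comment:

```python
# Sketch: locate the most recent checkpoint file from a Python script.
# The directory layout follows the listing above; the torch.load line in
# the comment is an assumption about vel's checkpoint format, not a
# documented API.
import glob
import os

def latest_checkpoint(checkpoint_dir):
    """Return the newest checkpoint_*.data file (excluding hidden state)."""
    pattern = os.path.join(checkpoint_dir, 'checkpoint_0*.data')
    candidates = [p for p in glob.glob(pattern)
                  if 'hidden' not in os.path.basename(p)]
    # Zero-padded epoch numbers make lexicographic max the newest file.
    return max(candidates, default=None)

path = latest_checkpoint('output/checkpoints/breakout_a2c/0')
# if path is not None:
#     import torch
#     model.load_state_dict(torch.load(path))
```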
Just ran the most recent changes, and it works like a dream!
```
🔥 vel [master] 🍂 vel examples-configs/rl/atari/a2c/breakout_a2c.yaml evaluate
<ModelConfig at examples-configs/rl/atari/a2c/breakout_a2c.yaml>
================================================================================
Pytorch version: 0.4.1 cuda version 9.0.176 cudnn version 7102
Running model breakout_a2c, run 0 -- command evaluate -- device cuda
CUDA Device name Quadro P620
2018/11/13 - 16:52:51
================================================================================
Storage: <vel.storage.classic.ClassicStorage object at 0x7f025c08b4e0>
Evaluating environment...
Evaluating environment...
Evaluating environment...
Evaluating environment...
Evaluating environment...
       lengths     rewards
count      5.0    5.000000
mean   10001.0  399.000000
std        0.0   34.467376
min    10001.0  341.000000
25%    10001.0  399.000000
50%    10001.0  405.000000
75%    10001.0  425.000000
max    10001.0  425.000000
================================================================================
Done.
2018/11/13 - 16:54:40
================================================================================
```
Thank you for taking the time to help me out!
Thanks for putting together this library!
I have installed the library on a headless server, along with MongoDB and Visdom. Is there a way to view the results after running from the .yaml configs?
I am testing it out with the example. Everything trains fine, but when I look at the logfile at vel/output/openai/breakout_a2c/0/log.txt, it only saves the following:

And the progress.csv at that directory level is empty. When looking at the yaml config at https://github.com/yngtodd/vel/blob/master/examples-configs/rl/atari/a2c/breakout_a2c.yaml#L57, I see that it is saving a video. Where is that stored?
Thanks!