I tried changing the line
config.sim_summaries[name] = _define_simulation(task, config, params, horizon, 1)
to
config.sim_summaries[name] = _define_simulation(task, config, params, horizon, 10)
in scripts/configs.py
and the test scores drop significantly and remain low throughout training. I would guess this is because I'm averaging over multiple episodes, but looking at the TensorBoard simulation GIF, the agents are actually not performing correctly in this case.
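For context, a minimal sketch of what averaging test returns over several episodes would look like, assuming a gym-style environment and a policy callable (both hypothetical placeholders, not the planet API):

```python
import numpy as np

def evaluate(env, policy, num_episodes=10):
    """Average the return over several rollouts to form one test score."""
    returns = []
    for _ in range(num_episodes):
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            action = policy(obs)                      # hypothetical policy interface
            obs, reward, done, _ = env.step(action)   # gym-style step
            episode_return += reward
        returns.append(episode_return)
    # Averaging lowers the variance of the score but should not change its
    # expected value relative to a single-episode evaluation.
    return float(np.mean(returns))
```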
I tried setting log_every=1
instead of log_every=config.test_steps
in line 175 of training/utility.py, but that didn't result in lower scores. However, in this case the test results are higher than the training results (almost max performance at the first test evaluation, whereas training performance is still quite low).
I'm struggling to make sense of this.
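For what it's worth, here is how I read the log_every change (my own illustration of the gating pattern, not the actual code in training/utility.py): with log_every=1 a test summary would be written at every step, while log_every=config.test_steps writes it once per test phase.

```python
# Hypothetical sketch of a log_every gate, not the actual code in training/utility.py.
def logging_steps(total_steps, log_every):
    """Return the steps at which a test summary would be written."""
    return [step for step in range(total_steps) if step % log_every == 0]

print(logging_steps(total_steps=20, log_every=1))  # every step
print(logging_steps(total_steps=20, log_every=5))  # once per "test phase" of 5 steps
```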
Those are good questions I'd like to know the answers to as well. Maybe look at this issue: https://github.com/google-research/planet/issues/10#issuecomment-493103249. I've discussed the topic a bit with @danijar.
Thanks for reaching out. The figures use the data shown on TensorBoard. They show the median and percentiles 5 to 95 over 5 random seeds and a window of 10 episodes. It should be easy to replicate this from the CSV you can download from TensorBoard. There is also a script called fetch_events that can extract CSV files directly from TensorFlow summary files.
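For anyone trying to reproduce those curves, here is a rough sketch of computing the statistics from exported CSVs with pandas/numpy. The filenames, the step/value column names, aligned steps across seeds, and smoothing each seed over the 10-episode window before aggregating over seeds are all assumptions on my part, not something taken from the repo.

```python
import numpy as np
import pandas as pd

WINDOW = 10     # window of 10 episodes
NUM_SEEDS = 5

# Hypothetical filenames/columns: one CSV per seed with "step" and "value" rows.
runs = [pd.read_csv(f"scores_seed_{i}.csv") for i in range(NUM_SEEDS)]

# Smooth each seed over 10 consecutive episodes, then aggregate across seeds.
smoothed = np.stack([run["value"].rolling(WINDOW, min_periods=1).mean().to_numpy()
                     for run in runs])                 # shape: (seeds, episodes)

steps = runs[0]["step"].to_numpy()                     # assumes aligned steps across seeds
median = np.median(smoothed, axis=0)                   # center line of the figure
low, high = np.percentile(smoothed, [5, 95], axis=0)   # shaded band
```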
Hi Danijar,
I have one more question about the "window of 10 episodes". What does that mean? My guess is that you use 5 different random seeds, run each seed once, and take the mean over those 5 experiments? Thanks a lot!
It's the mean and variance aggregated over both 5 seeds and 10 consecutive episodes for each of the seeds.
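In other words (my reading of the above, a sketch assuming the seeds-by-window block is pooled before computing the statistics):

```python
import numpy as np

def windowed_stats(scores, t, window=10):
    """Mean and variance over all seeds and the last `window` episodes up to index t."""
    pooled = scores[:, max(0, t - window + 1):t + 1].ravel()  # seeds x up-to-10 episodes
    return pooled.mean(), pooled.var()

scores = np.random.rand(5, 100)   # placeholder data: 5 seeds, 100 test episodes each
mean_t, var_t = windowed_stats(scores, t=50)
```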
Hey, the models train very well out of the box with this code, but the TensorBoard plots do not correspond to Figure 4 (as far as I can tell, they correspond to the evaluation of a single episode every 5000 steps).
I have a few specific questions:
Thanks a lot (I'm trying to use this model as a comparative experiment in time for the NeurIPS deadline, so a reply within the next couple of days would be greatly appreciated :) )