google-research / planet

Learning Latent Dynamics for Planning from Pixels
https://danijar.com/planet
Apache License 2.0

Are these -nan values normal? #31

Closed maximecb closed 5 years ago

maximecb commented 5 years ago

After a tedious installation process, I just got the training script to run. I am getting some nan values in my console output:

Epoch 1 phase train (phase step 0, global step 0).
step/score/loss/zs_entropy/zs_divergence =  [0, -nan, 11819.4326, 31.6100903, 2]
Recorded episode 20190426T161038-cd160510f8e54c9f8bc1370f3c771617.
step/score/loss/zs_entropy/zs_divergence =  [5000, -nan, 11647.1592, 33.6354, 2.20895743]
Recorded episode 20190426T161151-bf16a01434a841aa9add8cafaf2b41e6.
step/score/loss/zs_entropy/zs_divergence =  [10000, -nan, 11440.2422, 41.3420258, 2.10529876]
Recorded episode 20190426T161304-879f5f9548074ab293e887c1c0639575.
step/score/loss/zs_entropy/zs_divergence =  [15000, -nan, 11402.9209, 42.6845398, 2.01028967]

Training command used:

python3 -m planet.scripts.train  \
    --logdir logs \
    --config default \
    --params '{tasks: [cheetah_run]}'

I was just wondering if this is normal, or if there's something wrong with my installation.

PS, I am also getting many identical warnings from numpy, which should be easy to fix. Mentioning this because it really pollutes the console output:

/home/maxime/.local/lib/python3.6/site-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  'a.item() instead', DeprecationWarning, stacklevel=1)

piojanu commented 5 years ago

Hi!

Please check out issue #10, specifically this answer from @danijar:

Nice! The nan is not a problem at all, it just means that there is no score during iterations where no planning happens. The most important summaries are the "cem" scalar summaries that show the planning performance. If any of the "divergence" scalar summaries is at zero, the divergence_scale is too high. Besides that, you can look at the "openloop" image summaries to see future frames imagined by the agent and at the "cem" image summaries to see the planning policy in the environment.
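
To illustrate why a score can print as nan without anything being broken, here is a minimal sketch of averaging episode returns when no episodes were collected. The function name `mean_score` and the example returns are hypothetical and are not taken from the PlaNet code base; the sketch only assumes that the score is some kind of average over planned episodes.

```python
import math


def mean_score(episode_returns):
    """Average the returns of collected episodes (hypothetical helper).

    With no episodes there is nothing to average, so the result is NaN
    rather than a misleading 0.
    """
    if not episode_returns:
        return float("nan")
    return sum(episode_returns) / len(episode_returns)


# A phase in which no planning happened: no returns were recorded.
no_planning = mean_score([])
# A phase with two planned episodes (made-up returns, for illustration only).
with_planning = mean_score([12.5, 20.0])

print(math.isnan(no_planning))  # True
print(with_planning)            # 16.25
```

Reporting NaN for an empty phase keeps "no data" distinguishable from a genuine score of zero.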

You're also welcome to open a PR that fixes those warnings 😄 Does this answer your concerns, or is there anything more?
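
For anyone picking up that PR, the warning text itself names the replacement: call `.item()` on the array instead of `np.asscalar`. Below is a small sketch of that change on a made-up array; where exactly the call sits in this repository or its dependencies is not shown here. The `filterwarnings` line is only an optional stopgap for cleaning up the console, and it assumes the warning is attributed to `numpy.lib.type_check`, as in the message above.

```python
import warnings

import numpy as np

# Optional stopgap: hide DeprecationWarnings raised from numpy.lib.type_check
# until the calling code is fixed. The module pattern is an assumption based
# on the file path shown in the warning message.
warnings.filterwarnings(
    "ignore",
    category=DeprecationWarning,
    module=r"numpy\.lib\.type_check",
)

value = np.array([3.5])  # made-up single-element array

# Deprecated since NumPy 1.16:
# scalar = np.asscalar(value)

# Replacement suggested by the warning:
scalar = value.item()

print(type(scalar).__name__, scalar)  # float 3.5
```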

maximecb commented 5 years ago

Is there an easy way to visualize the progress of training? I saw that there is code in there to generate gifs. Is that called automatically (if so, where do the gifs end up)? Is there a command-line option I should specify?

danijar commented 5 years ago

Hi @maximecb, thanks for reaching out. As @piojanu pointed out, the nans are expected. The GIFs are generated automatically, as long as ffmpeg is available. They are written as TensorBoard summaries together with all other metrics. Just point TensorBoard at the log directory.
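
As a rough sketch of those two steps, the snippet below checks whether ffmpeg is on the PATH and prints the TensorBoard command for a log directory. The helper name `tensorboard_command` is hypothetical, and `logs` is assumed only because it matches the `--logdir logs` value in the training command above.

```python
import shutil


def tensorboard_command(logdir):
    """Build the command that points TensorBoard at a log directory."""
    return ["tensorboard", "--logdir", logdir]


# The GIF summaries depend on ffmpeg, so check that it can be found first.
if shutil.which("ffmpeg") is None:
    print("ffmpeg not found on PATH; GIF summaries may be missing.")

# Print the command to run in a shell.
print(" ".join(tensorboard_command("logs")))  # tensorboard --logdir logs
```

Once TensorBoard is running, the scalar and image summaries mentioned earlier in the thread should appear alongside the other metrics.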