google-research / planet

Learning Latent Dynamics for Planning from Pixels
https://danijar.com/planet
Apache License 2.0

RuntimeWarning when running sample command #35

Closed robertmoni closed 5 years ago

robertmoni commented 5 years ago

Hello,

Impressive work! I tried to run your code but got stuck right at the start.

When I run the sample command python3 -m planet.scripts.train --logdir /path/to/logdir --config default --params '{tasks: [cheetah_run]}' I get the following output:

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

/usr/lib/python3.6/runpy.py:125: RuntimeWarning: 'planet.scripts.train' found in sys.modules after import of package 'planet.scripts', but prior to execution of 'planet.scripts.train'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))

The execution ends here. What could be the reason?

piojanu commented 5 years ago

Hi!

Please note that those are "just" warnings. Check whether you are hitting the problem described in issue #32:

Hi!

This is "just" a warning, which shouldn't stop further execution (i.e. I see it too and I'm able to train PlaNet just fine). I suspect that a previous run has finished, or that you killed the training (e.g. with Ctrl-C), and you are now trying to rerun it. If that's the case please...

Short answer: remove the old log directory and run it fresh. It should work just fine.

Long answer: PlaNet has some logic dedicated to running multi-node experiments, and this logic uses marker files (DONE, PING and a couple of others) to track progress. If there is a DONE file in the log directory, delete it and try again. It should run now.

This logic is here in code:

https://github.com/google-research/planet/blob/1896bdb34595c58c13c8fba5a6e085ab3a3aa294/planet/training/running.py#L171-L181

and here are those check functions: https://github.com/google-research/planet/blob/1896bdb34595c58c13c8fba5a6e085ab3a3aa294/planet/training/running.py#L215-L247
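
To check a log directory for leftover markers before rerunning, something like the sketch below may help. It is only an illustration and not PlaNet's own code: the function names are hypothetical, and it assumes the markers are plain files named DONE somewhere under the log directory, which is worth confirming against the linked running.py. By default it only lists the files it finds and deletes nothing.

```python
import pathlib


def find_done_markers(logdir, marker_name="DONE"):
    """Return all files named `marker_name` below `logdir`, sorted by path.

    Assumption: markers are regular files called DONE; this helper is
    hypothetical and not part of PlaNet.
    """
    root = pathlib.Path(logdir)
    return sorted(path for path in root.rglob(marker_name) if path.is_file())


def remove_done_markers(logdir, marker_name="DONE", dry_run=True):
    """List marker files and delete them only when dry_run is False."""
    markers = find_done_markers(logdir, marker_name)
    if not dry_run:
        for path in markers:
            path.unlink()
    return markers
```

Calling remove_done_markers('/path/to/logdir') returns the markers it would delete; passing dry_run=False actually removes them. Deleting the whole log directory, as in the short answer, achieves the same thing if the old results are not needed.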

Did it help?

robertmoni commented 5 years ago

The only log directory I found is inside /src/planet/.git/. I deleted it but got the same result: no training.

danijar commented 5 years ago

You can safely ignore the warning. Have you changed the logdir in the command to a path that actually exists on your system?

robertmoni commented 5 years ago

It turned out the problem comes from MuJoCo. I was trying to run PlaNet from a Docker container with MuJoCo installed inside the container. Although MuJoCo works on the host with the 30-day trial license, I get an invalid activation key error when I try to run it from Docker.

As posted in this answer, MuJoCo won't work from Docker with a trial license.
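
For anyone hitting the same silent exit, loading the environment on its own can surface the simulator error without going through the training script. The sketch below assumes the cheetah_run task is backed by the DeepMind Control Suite (the dm_control package) and that a license problem shows up as a Python exception; check_environment is a hypothetical helper, not part of PlaNet.

```python
import importlib


def check_environment(domain="cheetah", task="run"):
    """Try to load a Control Suite task and report any failure as text.

    Assumption: the task comes from dm_control.suite; a missing package or
    a simulator/license error is caught and returned instead of raised.
    """
    try:
        suite = importlib.import_module("dm_control.suite")
        suite.load(domain, task)
    except Exception as error:  # broad on purpose: this is a diagnostic
        return False, "{}: {}".format(type(error).__name__, error)
    return True, "ok"


if __name__ == "__main__":
    ok, message = check_environment()
    print("environment loads:", ok)
    print(message)
```

Running it both on the host and inside the container should show whether the two setups differ.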

danijar commented 5 years ago

Thanks for figuring out the reason :)