cathywu closed this issue 7 years ago
Related issues:
This suggests that the tensorflow version is different between my local environment and on the cluster.
Local:
tensorflow.__version__ == '0.12.1'
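A quick way to make a mismatch like this explicit is to compare the major.minor part of the version string reported in each environment. The sketch below is only an illustration: the helper names `major_minor` and `same_release_series` are hypothetical and not part of rllab or TensorFlow, and it assumes plain dotted version strings.

```python
def major_minor(version):
    """Return (major, minor) as integers from a dotted version string.

    Assumes the first two dot-separated fields are plain integers,
    as in '0.12.1'.
    """
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_release_series(local_version, remote_version):
    """True if both version strings share the same major.minor series."""
    return major_minor(local_version) == major_minor(remote_version)
```

Locally this could be called as same_release_series(tensorflow.__version__, cluster_version), where cluster_version is whatever version string the cluster image reports.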
From @dementrock:
Yes. Actually the cluster uses TF 0.11. However, I have just built new images that use the 1.0 version. To use the new image, edit your config_personal.py file and update the Docker image by setting
DOCKER_IMAGE = "dementrock/rllab3:20170417"
You can test locally that the Docker image works by setting the mode to "local_docker" in run_experiment_lite. You will need to install Docker first. See https://docs.docker.com/docker-for-mac/ if you are using a Mac.
Now testing Docker images locally with DOCKER_IMAGE = "dementrock/rllab3:20170417".
New issue (using examples/cluster_demo.py for fewer confounding factors):
docker run -e "AWS_SECRET_ACCESS_KEY=<redacted>" -e "RLLAB_USE_GPU=False" -e "AWS_ACCESS_KEY_ID=<redacted>" -v /Users/cathywu/.mujoco:/root/.mujoco -v /Users/cathywu/Dropbox/PhD/DeepRL-Traffic/rllabcathywu/data/local/first-exp/first_exp_2017_04_18_00_14_52_0001:/tmp/expt -v /Users/cathywu/Dropbox/PhD/DeepRL-Traffic/rllabcathywu:/root/code/rllab -ti dementrock/rllab3:20170417 /bin/bash -c 'echo "Running in docker"; python /root/code/rllab/scripts/run_experiment_lite.py --args_data 'gAJjY2xvdWRwaWNrbGUuY2xvdWRwaWNrbGUKX2ZpbGxfZnVuY3Rpb24KcQAoY2Nsb3VkcGlja2xlLmNsb3VkcGlja2xlCl9tYWtlX3NrZWxfZnVuYwpxAWNjbG91ZHBpY2tsZS5jbG91ZHBpY2tsZQpfYnVpbHRpbl90eXBlCnECWAgAAABDb2RlVHlwZXEDhXEEUnEFKEsBSwBLBUsSS0NjX2NvZGVjcwplbmNvZGUKcQZYigAAAHQAAHQBAMKDAADCgwEAfQEAdAIAZAEAfAEAagMAZAIAZBAAwoMAAn0CAHQEAGQBAHwBAGoDAMKDAAF9AwB0BQBkBAB8AQBkBQB8AgBkBgB8AwBkBwBkCABkCQBkCgBkCwBkDABkDQBkDgBkDwB8AABkDwAZwoMACH0EAHwEAGoGAMKDAAABZAAAU3EHWAYAAABsYXRpbjFxCIZxCVJxCihOWAgAAABlbnZfc3BlY3ELWAwAAABoaWRkZW5fc2l6ZXNxDEsgWAMAAABlbnZxDVgGAAAAcG9saWN5cQ5YCAAAAGJhc2VsaW5lcQ9YCgAAAGJhdGNoX3NpemVxEE2gD1gPAAAAbWF4X3BhdGhfbGVuZ3RocRFLZFgFAAAAbl9pdHJxEksoWAgAAABkaXNjb3VudHETRz/vrhR64UeuWAkAAABzdGVwX3NpemVxFEsgSyCGcRV0cRYoWAkAAABub3JtYWxpemVxF1gLAAAAQ2FydHBvbGVFbnZxGFgRAAAAR2F1c3NpYW5NTFBQb2xpY3lxGVgEAAAAc3BlY3EaWBUAAABMaW5lYXJGZWF0dXJlQmFzZWxpbmVxG1gEAAAAVFJQT3EcWAUAAAB0cmFpbnEddHEeKFgBAAAAdnEfaA1oDmgPWAQAAABhbGdvcSB0cSFYGAAAAGV4YW1wbGVzL2NsdXN0ZXJfZGVtby5weXEiWAgAAABydW5fdGFza3EjSwpoBlgeAAAAAAEPAgYBCQIJAxICBgEGAQYBBgEGAQYBBgEGAQ0EcSRoCIZxJVJxJikpdHEnUnEoXXEpfXEqh3ErUnEsfXEtKGgcY3JsbGFiLmFsZ29zLnRycG8KVFJQTwpxLmgZY3JsbGFiLnBvbGljaWVzLmdhdXNzaWFuX21scF9wb2xpY3kKR2F1c3NpYW5NTFBQb2xpY3kKcS9oGGNybGxhYi5lbnZzLmJveDJkLmNhcnRwb2xlX2VudgpDYXJ0cG9sZUVudgpxMGgbY3JsbGFiLmJhc2VsaW5lcy5saW5lYXJfZmVhdHVyZV9iYXNlbGluZQpMaW5lYXJGZWF0dXJlQmFzZWxpbmUKcTFoF2NybGxhYi5lbnZzLm5vcm1hbGl6ZWRfZW52Ck5vcm1hbGl6ZWRFbnYKcTJ1Tn1xM3RSLg==' --log_dir '/tmp/expt' --variant_data 
'gAN9cQAoWAQAAABzZWVkcQFLAVgJAAAAc3RlcF9zaXplcQJHP4R64UeuFHtYCAAAAGV4cF9uYW1lcQNYIgAAAGZpcnN0X2V4cF8yMDE3XzA0XzE4XzAwXzE0XzUyXzAwMDFxBHUu' --seed '1' --exp_name 'first_exp_2017_04_18_00_14_52_0001' --snapshot_mode 'last' --n_parallel '1' --use_cloudpickle 'True'; sleep 120'
Running in docker
> /root/code/rllab/scripts/run_experiment_lite.py(8)<module>()
7 ipdb.set_trace()
----> 8 from rllab.misc.ext import is_iterable, set_seed
9 from rllab.misc.instrument import concretize
ipdb> from rllab.misc.ext import is_iterable, set_seed
*** ImportError: No module named 'rllab.misc'
ipdb> import rllab.misc
*** ImportError: No module named 'rllab.misc'
ipdb> import rllab.rllab.misc
ipdb> import rllab.rllab.misc.ext
*** ImportError: No module named 'rllab.misc'
ipdb> sys.path
sys.path: ['', '/root/code/rllab/scripts', '/root/code/rllab3', '/root/code', '/opt/conda/envs/rllab3/lib/python35.zip', '/opt/conda/envs/rllab3/lib/python3.5', '/opt/conda/envs/rllab3/lib/python3.5/plat-linux', '/opt/conda/envs/rllab3/lib/python3.5/lib-dynload', '/opt/conda/envs/rllab3/lib/python3.5/site-packages', '/opt/conda/envs/rllab3/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg', '/opt/conda/envs/rllab3/lib/python3.5/site-packages/torchvision-0.1.8-py3.5.egg', '.', '/opt/conda/envs/rllab3/lib/python3.5/site-packages/IPython/extensions', '/root/.ipython']
Temporary hack in run_experiment_lite.py:
# FIXME(cathywu) HACK: /root/code/rllab is missing from sys.path in the 20170417 docker build
sys.path.append("/root/code/rllab")
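A possible explanation for the session above, offered as a sketch rather than a confirmed diagnosis: with only /root/code on sys.path, Python 3 can treat the repository directory itself as a namespace package, so "rllab" resolves to the checkout and the real package is only reachable as rllab.rllab. The self-contained demo below reproduces that behaviour in a temporary directory. The name demo_proj and the helpers build_layout, can_import, and demo are hypothetical stand-ins, not rllab code.

```python
import importlib
import os
import sys
import tempfile


def build_layout(root):
    """Create root/code/demo_proj/demo_proj/misc/, mimicking a repo checkout.

    The outer demo_proj directory has no __init__.py (like a repo root);
    the inner demo_proj and its misc subpackage are regular packages.
    """
    repo = os.path.join(root, "code", "demo_proj")
    package = os.path.join(repo, "demo_proj")
    misc = os.path.join(package, "misc")
    os.makedirs(misc)
    for directory in (package, misc):
        open(os.path.join(directory, "__init__.py"), "w").close()
    return os.path.dirname(repo), repo


def can_import(name, path_entry):
    """Try importing `name` with `path_entry` temporarily first on sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path.insert(0, path_entry)
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
    finally:
        # Restore the interpreter state so each attempt starts clean.
        sys.path[:] = saved_path
        for module_name in set(sys.modules) - saved_modules:
            del sys.modules[module_name]


def demo():
    """Compare imports with the parent directory vs. the repo root on the path."""
    with tempfile.TemporaryDirectory() as root:
        parent, repo = build_layout(root)
        return {
            "parent: demo_proj.misc": can_import("demo_proj.misc", parent),
            "parent: demo_proj.demo_proj.misc": can_import(
                "demo_proj.demo_proj.misc", parent
            ),
            "repo: demo_proj.misc": can_import("demo_proj.misc", repo),
        }


if __name__ == "__main__":
    for case, ok in demo().items():
        print(case, "->", "ok" if ok else "ImportError")
```

Under this reading, appending the repository root to sys.path works because the inner package then becomes importable under its intended name.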
Status: examples/cluster_demo.py working again, examples/cluster_gym_mujoco_demo.py not working.
New issue: mujoco not installed properly.
Traceback (most recent call last):
File "/root/code/rllab/scripts/run_experiment_lite.py", line 138, in <module>
run_experiment(sys.argv)
File "/root/code/rllab/scripts/run_experiment_lite.py", line 122, in run_experiment
method_call(variant_data)
File "examples/cluster_gym_mujoco_demo.py", line 26, in run_task
File "/root/code/rllab/rllab/envs/gym_env.py", line 68, in __init__
env = gym.envs.make(env_name)
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/gym/envs/registration.py", line 161, in make
return registry.make(id)
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/gym/envs/registration.py", line 119, in make
env = spec.make()
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/gym/envs/registration.py", line 85, in make
cls = load(self._entry_point)
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/gym/envs/registration.py", line 17, in load
result = entry_point.load(False)
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2258, in load
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2264, in resolve
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/gym/envs/mujoco/__init__.py", line 1, in <module>
from gym.envs.mujoco.mujoco_env import MujocoEnv
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/gym/envs/mujoco/mujoco_env.py", line 11, in <module>
import mujoco_py
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/mujoco_py/__init__.py", line 2, in <module>
init_config()
File "/opt/conda/envs/rllab3/lib/python3.5/site-packages/mujoco_py/config.py", line 37, in init_config
raise error.MujocoDependencyError('Found your MuJoCo license key but not binaries. Please put your binaries into ~/.mujoco/mjpro131 or set MUJOCO_PY_MJPRO_PATH. Follow the instructions on https://github.com/openai/mujoco-py for setup.')
mujoco_py.error.MujocoDependencyError: Found your MuJoCo license key but not binaries. Please put your binaries into ~/.mujoco/mjpro131 or set MUJOCO_PY_MJPRO_PATH. Follow the instructions on https://github.com/openai/mujoco-py for setup.
Resolution: the Docker container needs the Linux version of MuJoCo.
Temporary hack in rllab/config.py:
MUJOCO_KEY_PATH = "/Users/cathywu/Dropbox/PhD/DeepRL-Traffic/mujoco_linux" # for docker / ec2
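Before mounting a MuJoCo directory into a Linux container, it can help to check which platform its shared libraries were built for. The sketch below only looks at file suffixes, which is a rough heuristic and an assumption on my part about how the binaries are named. The helper names are hypothetical and not part of rllab or mujoco-py.

```python
import os

# Heuristic mapping from shared-library suffix to platform.
# Versioned names such as libfoo.so.1 are not recognised by this sketch.
_SUFFIX_TO_PLATFORM = {".so": "linux", ".dylib": "macos", ".dll": "windows"}


def shared_library_platforms(directory):
    """Return the set of platforms whose shared libraries appear under directory."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            suffix = os.path.splitext(name)[1]
            if suffix in _SUFFIX_TO_PLATFORM:
                found.add(_SUFFIX_TO_PLATFORM[suffix])
    return found


def usable_in_linux_container(directory):
    """True if at least one Linux-style shared library was found."""
    return "linux" in shared_library_platforms(directory)
```

For example, usable_in_linux_container(MUJOCO_KEY_PATH) could be checked once before launching with mode="local_docker".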
Status: examples/cluster_demo.py and examples/cluster_gym_mujoco_demo.py both working.
Bonus: examples/cluster_walker_tf_comparison.py also works with mode="local_docker".
Resolved by #5.
Not working:
Working:
Also, both versions work locally.
Logs from cluster_demo.py:
Logs from cluster_gym_mujoco_demo.py: