Open NasimShafiee opened 5 years ago
Hmm, did you use the instructions in the README for creating the conda env for this repository? The requirements.txt
should have installed the correct version of multiworld for you: https://github.com/avisingh599/reward-learning-rl/blob/93bb52f75bea850bd01f3c3342539f0231a561f3/requirements.txt#L51
I reinstalled it, and now I get the following error when I run the example. I believe it comes from the multiworld module:
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs
Memory usage on this node: 4.0/8.3 GB
Result logdir: /home/nasim/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-14T19-29-25-2019-05-14T19-29-25
Number of trials: 1 ({'ERROR': 1})
ERROR trials:
Can you try cat /home/nasim/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-14T19-29-25-2019-05-14T19-29-25/51b91bef-algorithm=SAC-seed=2619_2019-05-14_19-29-26zic1act4/error_2019-05-14_19-29-32.txt
and post the output here?
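If there are several trials, the error files can also be gathered in one go. The following is a minimal sketch, assuming only what the path above shows: that each trial directory under the result logdir contains files named error_*.txt. The helper name collect_trial_errors is hypothetical and is not part of Ray Tune or softlearning.

```python
from pathlib import Path


def collect_trial_errors(logdir):
    """Return {file path: file text} for every error_*.txt below logdir.

    Hypothetical helper: it only assumes the error_<timestamp>.txt naming
    that appears in the path quoted in this thread.
    """
    root = Path(logdir).expanduser()
    if not root.is_dir():
        # Nothing to report if the directory does not exist.
        return {}
    return {
        str(path): path.read_text(errors="replace")
        for path in sorted(root.rglob("error_*.txt"))
    }


# Print every trial error found under the default Tune output directory.
for path, text in collect_trial_errors("~/ray_results").items():
    print(f"==> {path}")
    print(text)
```

Passing the specific "Result logdir" printed in the status block instead of ~/ray_results limits the output to a single experiment.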
(softlearning) nasim@nasim-PC:~/reward-learning-rl$ cat /home/nasim/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-14T19-29-25-2019-05-14T19-29-25/51b91bef-algorithm=SAC-seed=2619_2019-05-14_19-29-26zic1act4/error_2019-05-14_19-29-32.txt
Traceback (most recent call last):
File "/home/nasim/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 443, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/nasim/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 315, in fetch_result
result = ray.get(trial_future[0])
File "/home/nasim/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 2193, in get
raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
Interesting, I've never seen this error before.
@hartikainen Any idea what might be going on here?
@NasimShafiee In the meantime, can you try running the following command?
softlearning run_example_debug examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --n_epochs 300 --active_query_frequency 10
It's hard to say from these logs. @NasimShafiee were there any other logs before/after the ones you already posted here? If so, could you copy-paste the full log here?
Thanks for your help! I believe the problem was my PC. I have now switched to another PC, and here is what I did:
1) unset LD_PRELOAD
2) softlearning run_example_debug examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --n_epochs 300 --active_query_frequency 10
Full Log:
/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:
WARNING: Logging before flag parsing goes to stderr.
I0515 14:59:12.378803 140717083629312 __init__.py:34] MuJoCo library version is: 200
Warning: robosuite package not found. Run pip install robosuite to use robosuite environments.
I0515 14:59:12.413537 140717083629312 __init__.py:333] Registering multiworld mujoco gym environments
I0515 14:59:13.682559 140717083629312 __init__.py:14] Registering goal example multiworld mujoco gym environments
2019-05-15 14:59:13,752 INFO tune.py:64 -- Did not find checkpoint file in /home/nasimshafiee/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-15T14-59-13-2019-05-15T14-59-13.
2019-05-15 14:59:13,752 INFO tune.py:211 -- Starting a new experiment.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/1 GPUs
Memory usage on this node: 6.3/33.6 GB
Using seed 9941
2019-05-15 14:59:13.764332: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-15 14:59:13.806908: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2019-05-15 14:59:13.808591: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b74b9c01e0 executing computations on platform Host. Devices:
2019-05-15 14:59:13.808624: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 215, in start_trial
self._start_trial(trial)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 143, in _start_trial
self._train(trial)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 111, in _train
remote = trial.runner.train.remote()
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 124, in remote
return self._remote(args, kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 138, in _remote
num_return_vals=num_return_vals)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 479, in _actor_method_call
method_name)(*copy.deepcopy(args))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
result = self._train()
File "/home/nasimshafiee/reward-learning-rl/examples/development/main.py", line 77, in _train
self._build()
File "/home/nasimshafiee/reward-learning-rl/examples/classifier_rl/main.py", line 30, in _build
get_goal_example_environment_from_variant(variant))
File "/home/nasimshafiee/reward-learning-rl/softlearning/environments/utils.py", line 48, in get_goal_example_environment_from_variant
return GymAdapter(env=gym.make(variant['task']))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 183, in make
return registry.make(id, kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 125, in make
env = spec.make(kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 86, in make
env = self._entry_point(**_kwargs)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/__init__.py", line 324, in create_image_48_sawyer_door_pull_hook_v0
non_presampled_goal_img_is_garbage=True,
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/core/image_env.py", line 75, in __init__
sim = self._wrapped_env.initialize_camera(init_camera)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/mujoco_env.py", line 152, in initialize_camera
viewer = mujoco_py.MjRenderContextOffscreen(sim, device_id=self.device_id)
File "mujoco_py/mjrendercontext.pyx", line 43, in mujoco_py.cymj.MjRenderContext.__init__
File "mujoco_py/mjrendercontext.pyx", line 108, in mujoco_py.cymj.MjRenderContext._setup_opengl_context
File "mujoco_py/opengl_context.pyx", line 128, in mujoco_py.cymj.OffscreenOpenGLContext.__init__
RuntimeError: Failed to initialize OpenGL
2019-05-15 14:59:17,874 INFO ray_trial_executor.py:179 -- Destroying actor for trial ec70dadd-algorithm=VICERAQ-seed=9941. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-05-15 14:59:17,921 WARNING util.py:62 -- The start_trial operation took 4.16402268409729 seconds to complete, which may be a performance bottleneck.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 8/8 CPUs, 0/1 GPUs
Memory usage on this node: 6.4/33.6 GB
Result logdir: /home/nasimshafiee/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-15T14-59-13-2019-05-15T14-59-13
Number of trials: 1 ({'ERROR': 1})
ERROR trials:
Traceback (most recent call last):
File "/home/nasimshafiee/anaconda3/envs/softlearning/bin/softlearning", line 11, in
This looks like the OpenGL issue that many people run into with mujoco-py. I would suggest trying more of the fixes outlined here: https://github.com/openai/mujoco-py/issues/187
As an alternative, you can use our Docker image instead of setting things up locally. The Docker image has everything configured correctly, so as long as you have nvidia-docker and docker-compose installed, it should work out of the box.
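When reporting an OpenGL failure like the one above, it can help to post a short summary of the environment alongside the traceback. The sketch below uses only the Python standard library. Apart from LD_PRELOAD, which is mentioned in this thread, the choice of variables and libraries to report (LD_LIBRARY_PATH, DISPLAY, libGL, libGLEW) is an assumption about what is commonly relevant, not something confirmed here. The function name opengl_environment_report is hypothetical.

```python
import ctypes.util
import os


def opengl_environment_report(environ=None):
    """Summarize settings that may affect offscreen OpenGL rendering.

    Only LD_PRELOAD comes from the thread; the other entries are
    assumptions about what is worth checking.
    """
    env = os.environ if environ is None else environ
    return {
        "LD_PRELOAD": env.get("LD_PRELOAD", ""),
        "LD_LIBRARY_PATH": env.get("LD_LIBRARY_PATH", ""),
        "DISPLAY": env.get("DISPLAY", ""),
        # find_library returns None when the linker cannot locate the library.
        "libGL": ctypes.util.find_library("GL"),
        "libGLEW": ctypes.util.find_library("GLEW"),
    }


# Print the report for the current shell environment.
for key, value in opengl_environment_report().items():
    print(f"{key}: {value!r}")
```

This does not fix anything by itself. It only makes it easier to compare a failing machine against a working one, such as the Docker image.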
I unset LD_PRELOAD, but the error persists!
I would suggest using the docker image supplied with the repository.
Thanks!
Hi, I am using Anaconda to install your work, but when I run "softlearning run_example_local examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --num-samples 5 --n_epochs 300 --active_query_frequency 10", I get the following error:
File "/home/nasim/reward-learning-rl/examples/classifier_rl/utils.py", line 16, in
from multiworld.envs.mujoco import register_goal_example_envs
ModuleNotFoundError: No module named 'multiworld'
I cloned https://github.com/vitchyr/multiworld.git, but it does not work. Which multiworld module did you use?
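A quick way to see whether the active Python environment can find multiworld at all, and from which location, is to ask the import system directly. This is a minimal sketch using only the standard library. The helper name locate_module is hypothetical, and the list of packages checked is simply taken from the names that appear in this thread.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be imported from, or None.

    Hypothetical helper built on importlib.util.find_spec, which looks a
    module up without importing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin:
        return spec.origin
    # Namespace packages have no single origin file; report their first path.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


# Run this inside the activated conda environment.
for name in ("multiworld", "mujoco_py", "ray", "gym"):
    print(f"{name}: {locate_module(name) or 'NOT FOUND'}")
```

If multiworld is reported as not found, cloning the repository alone is not enough: the package also has to be installed into, or placed on the path of, the environment that runs softlearning. Comparing the printed paths also shows whether a package is coming from the conda environment or from somewhere else on the system.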