I tried to reproduce all the instructions above on an Ubuntu 16.04 machine instead of an Ubuntu 18.04 machine. I followed all the steps the same way as above, but this time there is a slightly different error. After compiling PyFlex and exiting Docker, I activate the softgym env and again need to refresh the environment variables (please let me know if this is not what you do):
seita@triton1:~/softgym (master) $ conda activate softgym
(softgym) seita@triton1:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten
Traceback (most recent call last):
File "examples/random_env.py", line 5, in <module>
from softgym.registered_env import env_arg_dict, SOFTGYM_ENVS
ModuleNotFoundError: No module named 'softgym'
(softgym) seita@triton1:~/softgym (master) $ export PYFLEXROOT=${PWD}/PyFlex
(softgym) seita@triton1:~/softgym (master) $ export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
(softgym) seita@triton1:~/softgym (master) $ export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH
Then I run again, but now there is a GLIBC error:
(softgym) seita@triton1:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten
Traceback (most recent call last):
File "examples/random_env.py", line 5, in <module>
from softgym.registered_env import env_arg_dict, SOFTGYM_ENVS
File "/home/seita/softgym/softgym/registered_env.py", line 1, in <module>
from softgym.envs.pour_water import PourWaterPosControlEnv
File "/home/seita/softgym/softgym/envs/pour_water.py", line 4, in <module>
import pyflex
ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /home/seita/softgym/PyFlex/bindings/build/pyflex.cpython-36m-x86_64-linux-gnu.so)
(softgym) seita@triton1:~/softgym (master) $ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu11.2) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
(softgym) seita@triton1:~/softgym (master) $
Other info from that machine (paths, CUDA, etc.):
(softgym) seita@triton1:~/softgym (master) $ echo $LD_LIBRARY_PATH
/home/seita/softgym/PyFlex/external/SDL2-2.0.4/lib/x64:/usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64:/home/seita/.mujoco/mjpro150/bin:/home/seita/.mujoco/mujoco200/bin:/usr/lib/nvidia-410
(softgym) seita@triton1:~/softgym (master) $
(softgym) seita@triton1:~/softgym (master) $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
(softgym) seita@triton1:~/softgym (master) $ nvidia-smi
Fri Feb 12 14:38:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 00000000:02:00.0 Off | N/A |
| 23% 27C P8 8W / 250W | 2MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:03:00.0 On | N/A |
| 23% 31C P8 10W / 250W | 39MiB / 12192MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 31669 G /usr/lib/xorg/Xorg 36MiB |
+-----------------------------------------------------------------------------+
(softgym) seita@triton1:~/softgym (master) $
It's version 2.23 on this machine, whereas the Ubuntu 18.04 machine in my post above has the required 2.27 version. However, according to this:
https://stackoverflow.com/questions/59145051/glibc-2-27-not-found-ubuntu-16-04
2.23 is the correct version for Ubuntu 16.04, and upgrading it can be very tricky and not straightforward. In fact, the answer there even seems to suggest switching to Ubuntu 18.04, which confuses me even more, since I believe Ubuntu 16.04 is what the SoftGym paper normally uses.
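(As a sanity check, one way to confirm which GLIBC versions the compiled module actually requires versus what the system libc provides; the .so path below is the one from the traceback above:)
objdump -T PyFlex/bindings/build/pyflex.cpython-36m-x86_64-linux-gnu.so | grep -o 'GLIBC_2\.[0-9]*' | sort -u
ldd --version | head -n 1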
Does this issue make sense?
For both methods above, I checked what happens when running inside docker rather than outside. It seems to fail on both Ubuntu 18.04:
(softgym) root@a230c50ebe44:/workspace/softgym# python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
(softgym) root@a230c50ebe44:/workspace/softgym#
and Ubuntu 16.04 (for clarity, here I'm showing the pyflex build folder first, then changing directory, then running an example):
(softgym) root@c7225cc8bb31:/workspace/softgym/PyFlex/bindings/build# ls -lh
total 13M
-rw-r--r-- 1 root root 24K Feb 12 22:55 CMakeCache.txt
drwxr-xr-x 5 root root 4.0K Feb 12 22:55 CMakeFiles
-rw-r--r-- 1 root root 24K Feb 12 22:55 Makefile
-rw-r--r-- 1 root root 1.5K Feb 12 22:55 cmake_install.cmake
-rwxr-xr-x 1 root root 13M Feb 12 22:55 pyflex.cpython-36m-x86_64-linux-gnu.so
(softgym) root@c7225cc8bb31:/workspace/softgym/PyFlex/bindings/build# cd ../../..
(softgym) root@c7225cc8bb31:/workspace/softgym#
(softgym) root@c7225cc8bb31:/workspace/softgym#
(softgym) root@c7225cc8bb31:/workspace/softgym# python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
Interestingly, for the 16.04 machine, running inside Docker does not produce the "GLIBC" error that appears when running outside Docker.
The docker containers already seem to have the updated versions of packages, e.g.:
(softgym) root@a230c50ebe44:/workspace/softgym# apt-get install build-essential libgl1-mesa-dev freeglut3-dev libglfw3 libgles2-mesa-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
build-essential is already the newest version (12.4ubuntu1).
freeglut3-dev is already the newest version (2.8.1-3).
libglfw3 is already the newest version (3.2.1-1).
libglfw3 set to manually installed.
libgl1-mesa-dev is already the newest version (20.0.8-0ubuntu1~18.04.1).
libgles2-mesa-dev is already the newest version (20.0.8-0ubuntu1~18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
@DanielTakeshi, for approach 1, I have seen this segmentation fault a long time ago. One solution that worked for me was to re-install NVIDIA driver version 440.64 (the sub-version also matters). Can you try that? Also, I am currently using CUDA 10.2, although I don't think this matters.
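(For reference, one way to pin an exact driver sub-version is NVIDIA's .run installer; the filename below is just the usual naming pattern for that release and is an assumption on my part, not a command from this thread:)
# download the matching installer from nvidia.com first, then:
sudo sh ./NVIDIA-Linux-x86_64-440.64.run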
The other issue with GLIBC makes sense to me: the docker image is built under Ubuntu 18. If you compile inside an Ubuntu 18 docker and try to use the .so object outside on an Ubuntu 16 system, it will give an error. You can either try Ubuntu 16 without the docker, or rebuild the docker image under Ubuntu 16. For rebuilding the docker, you can find the docker recipe here and just change the first line to be Ubuntu 16.
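(A rough sketch of what that rebuild could look like; the FROM line is an assumption about the recipe's base image, and the output tag is made up for illustration:)
# in the Dockerfile, change the base image line, e.g.
#   FROM nvidia/cuda:9.2-devel-ubuntu18.04   ->   FROM nvidia/cuda:9.2-devel-ubuntu16.04
docker build -t softgym-ubuntu16 .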
@Xingyu-Lin Thanks for the suggestions. Here I am trying to install without using Docker at all. This machine has a close driver version, 440, but not the exact sub-version you suggest. Unfortunately I need to check in with other users first since it's a shared lab machine, so I can't just unilaterally change the driver version (and I was hoping not to, since this introduces a very rigid requirement). Here are the NVIDIA details:
seita@triton3:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
seita@triton3:~$ nvidia-smi
Sun Feb 14 07:59:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:01:00.0 Off | N/A |
| 62% 86C P2 103W / 250W | 9356MiB / 12188MiB | 87% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:02:00.0 Off | N/A |
| 51% 82C P2 150W / 250W | 2238MiB / 12196MiB | 49% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1572 G /usr/lib/xorg/Xorg 49MiB |
| 0 4625 C python 4501MiB |
| 0 13683 C ...sup_rbt_train_quat.py -dataset 872objv2 4793MiB |
| 1 16940 C ...sup_rbt_train_quat.py -dataset 872objv2 2225MiB |
+-----------------------------------------------------------------------------+
I follow the steps:
git clone https://github.com/Xingyu-Lin/softgym.git
cd softgym/
conda env create -f environment.yml
conda activate softgym
. ./prepare_1.0.sh
. ./compile_1.0.sh
cd ../../..
All of the above seems to work (though the prepare_1.0.sh script has an unnecessary . activate softgym, since I activate beforehand; it throws an error if I don't activate the env first). Then I get:
(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
Same error. I will try again if I can get the exact NVIDIA driver version ready, so 440.33 should turn into 440.64? Also, does it matter that this is done remotely through SSH connections?
Hmmm, actually I am using the driver of the exact version 440.33.01:
(base) yufei@yufei-OMEN-by-HP-Laptop:~$ nvidia-smi
Mon Feb 15 00:20:22 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960M Off | 00000000:01:00.0 Off | N/A |
| N/A 48C P8 N/A / N/A | 925MiB / 2002MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1140 G /usr/lib/xorg/Xorg 341MiB |
| 0 2213 G compiz 37MiB |
| 0 2285 G fcitx-qimpanel 4MiB |
| 0 2883 G ...quest-channel-token=2700618523479150890 222MiB |
| 0 3953 G ...quest-channel-token=4260863179945314425 44MiB |
| 0 4805 G ...quest-channel-token=1622710122323873953 264MiB |
+-----------------------------------------------------------------------------+
but I am using a different nvcc (version 9.2):
(base) yufei@yufei-OMEN-by-HP-Laptop:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
We ran into this SDL initialization problem a long time ago. I will try to remember more details on how we solved it. But perhaps you could try changing the nvcc version?
I see, thanks for checking. The machine in my last post does not have CUDA 9.2, and I will need to check with other users to make sure I can install it. I did try downgrading to 9.0 (the other version available) and get the same error after the compilation steps:
(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
(softgym) seita@triton3:~/softgym$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
(softgym) seita@triton3:~/softgym$
So we know that for Ubuntu 16.04 with this NVIDIA driver, CUDA 9.0 and 10.0 are not working, and maybe 9.2 will once I can test it. :) This is without any docker compilation at all (since it's Ubuntu 16.04).
A few other suggestions, based on how we debugged this issue with the random_env.py example:
1. export LIBGL_DEBUG=verbose in the shell, which can give you more information about GL during linking.
2. nm -D pyflex.cpython-36m-x86_64-linux-gnu.so | grep SDL_GL to see if all symbols are normal.
3. Trace into the code around the "Unable to initialize SDL" message and see what went wrong there.
Also, you can install multiple CUDA versions on Ubuntu and set the environment variables to properly link to the CUDA version that you need. Hope that will help!
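(On the last point, a sketch of what selecting a CUDA version via environment variables can look like, assuming the usual /usr/local/cuda-X.Y layout; adjust the version and paths to whatever is actually installed:)
export PATH=/usr/local/cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64:$LD_LIBRARY_PATH
nvcc --version   # should now report the selected release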
Earlier today I installed SoftGym on a machine with Ubuntu 16.04 and CUDA 9.2 and NO Docker. It had CUDA 9.0 installed, but I got permission to purge it; a Docker install with CUDA 9.0 apparently is not possible, so I had to upgrade. I guess the only contribution of this post is knowing that the no-Docker install also works with Driver Version 396.37.
This is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37 Driver Version: 396.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 00000000:01:00.0 On | N/A |
| 22% 34C P8 15W / 250W | 205MiB / 12207MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1021 G /usr/lib/xorg/Xorg 145MiB |
| 0 1695 G compiz 56MiB |
+-----------------------------------------------------------------------------+
I upgraded manually to Driver Version 430, but after installing CUDA 9.2 using the .deb package I got downgraded to Driver Version 396.37.
nvcc --version | grep "release" | awk '{print $6}' | cut -c2-
outputs: 9.2.148
To add on to some of your questions from your last post: try to install 9.2 (alone or alongside other versions, as mentioned before) and leave your driver version to be managed by the .deb install package if possible. If you can only do it remotely, I would also worry about sudo reboot, as it is needed after the CUDA install; I am not sure how that works when done remotely.
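(For completeness, a hedged sketch of the .deb route on 16.04; the exact repo package filename and key path come from NVIDIA's download page and are assumptions here:)
sudo dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.148-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-2-local/7fa2af80.pub
sudo apt-get update && sudo apt-get install cuda
sudo reboot   # needed after the install, as noted above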
You mention:
I follow the steps:
git clone https://github.com/Xingyu-Lin/softgym.git
cd softgym/
conda env create -f environment.yml
conda activate softgym
. ./prepare_1.0.sh
. ./compile_1.0.sh
cd ../../..
These were my steps, from history:
conda env create -f environment.yml
conda activate softgym
export PYFLEXROOT=${PWD}/PyFlex
export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH
. ./compile_1.0.sh
cd ~/repos/softgym
python examples/random_env.py --env_name PassWater
The only problem I had was with environment.yml not finding some packages, so I had to install them using pip.
Hope this helps.
So, apparently the headless option was all I needed on my Ubuntu 18.04 machine. I should have read through the random_env.py code carefully and tried that. To be clear, I followed the steps in my first post here, which compile PyFlex in docker. Here are the machine specs:
(softgym) seita@mason:~/softgym (master) $ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
(softgym) seita@mason:~/softgym (master) $ nvidia-smi
Wed Feb 17 09:28:29 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:04:00.0 Off | 0 |
| N/A 32C P0 27W / 250W | 33MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:08:00.0 Off | 0 |
| N/A 34C P0 25W / 250W | 28MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-PCIE... Off | 00000000:09:00.0 Off | 0 |
| N/A 37C P0 28W / 250W | 28MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-PCIE... Off | 00000000:85:00.0 Off | 0 |
| N/A 36C P0 29W / 250W | 28MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-PCIE... Off | 00000000:89:00.0 Off | 0 |
| N/A 33C P0 25W / 250W | 28MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 17055 G /usr/lib/xorg/Xorg 28MiB |
| 1 N/A N/A 17055 G /usr/lib/xorg/Xorg 23MiB |
| 2 N/A N/A 17055 G /usr/lib/xorg/Xorg 23MiB |
| 3 N/A N/A 17055 G /usr/lib/xorg/Xorg 23MiB |
| 4 N/A N/A 17055 G /usr/lib/xorg/Xorg 23MiB |
+-----------------------------------------------------------------------------+
I did the compilation in docker, and the symbols look good when I check outside of docker:
(softgym) seita@mason:~/softgym/PyFlex/bindings/build (master) $ ls -lh
total 13M
-rw-r--r-- 1 root root 24K Feb 17 09:21 CMakeCache.txt
drwxr-xr-x 5 root root 4.0K Feb 17 09:21 CMakeFiles
-rw-r--r-- 1 root root 1.5K Feb 17 09:21 cmake_install.cmake
-rw-r--r-- 1 root root 24K Feb 17 09:21 Makefile
-rwxr-xr-x 1 root root 13M Feb 17 09:21 pyflex.cpython-36m-x86_64-linux-gnu.so
(softgym) seita@mason:~/softgym/PyFlex/bindings/build (master) $ nm -D pyflex.cpython-36m-x86_64-linux-gnu.so | grep SDL_GL
00000000000c1be0 T SDL_GL_BindTexture
00000000000c28d0 T SDL_GL_CreateContext
00000000000c2950 T SDL_GL_DeleteContext
00000000000c28a0 T SDL_GL_ExtensionSupported
00000000000c28c0 T SDL_GL_GetAttribute
00000000000c2900 T SDL_GL_GetCurrentContext
00000000000c28f0 T SDL_GL_GetCurrentWindow
00000000000c2910 T SDL_GL_GetDrawableSize
00000000000c2880 T SDL_GL_GetProcAddress
00000000000c2930 T SDL_GL_GetSwapInterval
00000000000c2870 T SDL_GL_LoadLibrary
00000000000c28e0 T SDL_GL_MakeCurrent
00000000000c2980 T SDL_GL_ResetAttributes
00000000000c28b0 T SDL_GL_SetAttribute
00000000000c2920 T SDL_GL_SetSwapInterval
00000000000c2940 T SDL_GL_SwapWindow
00000000000c1bf0 T SDL_GL_UnbindTexture
00000000000c2890 T SDL_GL_UnloadLibrary
Then I activate the conda env, get the paths set up:
seita@mason:~/softgym/PyFlex/bindings/build (master) $ conda activate softgym
(softgym) seita@mason:~/softgym/PyFlex/bindings/build (master) $ cd ../../..
(softgym) seita@mason:~/softgym (master) $ export PYFLEXROOT=${PWD}/PyFlex
(softgym) seita@mason:~/softgym (master) $ export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
(softgym) seita@mason:~/softgym (master) $ export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH
and run this (I have to make the data/ directory):
(softgym) seita@mason:~/softgym (master) $ mkdir data
(softgym) seita@mason:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten --headless 1
Waiting to generate environment variations. May take 1 minute for each variation...
Compute Device: Tesla V100-PCIE-32GB
Pyflex init done!
config 0: camera params {'default_camera': {'pos': array([-0. , 0.82, 0.82]), 'angle': array([ 0. , -0.78539816, 0. ]), 'width': 720, 'height': 720}}, flatten area: 0.27312500000000006
MoviePy - Building file ./data/ClothFlatten.gif with imageio.
Video generated and save to ./data/ClothFlatten.gif
Here is the GIF:
@Xingyu-Lin I think it may be worthwhile to mention the headless option in more detail in the README, or maybe to add a link to this issue report...
Next, a setup that does not use docker at all. Here is the machine info:
(softgym) seita@triton3:~/softgym$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
seita@triton3:~$ nvidia-smi
Sun Feb 14 07:59:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:01:00.0 Off | N/A |
| 62% 86C P2 103W / 250W | 9356MiB / 12188MiB | 87% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:02:00.0 Off | N/A |
| 51% 82C P2 150W / 250W | 2238MiB / 12196MiB | 49% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1572 G /usr/lib/xorg/Xorg 49MiB |
| 0 4625 C python 4501MiB |
| 0 13683 C ...sup_rbt_train_quat.py -dataset 872objv2 4793MiB |
| 1 16940 C ...sup_rbt_train_quat.py -dataset 872objv2 2225MiB |
+-----------------------------------------------------------------------------+
All I had to do was literally this (steps from earlier):
git clone https://github.com/Xingyu-Lin/softgym.git
cd softgym/
conda env create -f environment.yml
conda activate softgym
. ./prepare_1.0.sh
. ./compile_1.0.sh
cd ../../..
Then make the data/ directory and run:
(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name ClothFold --headless 1
Waiting to generate environment variations. May take 1 minute for each variation...
Compute Device: TITAN Xp
Pyflex init done!
config 0: {'default_camera': {'pos': array([-0. , 0.82, 0.82]), 'angle': array([ 0. , -0.78539816, 0. ]), 'width': 720, 'height': 720}}
MoviePy - Building file ./data/ClothFold.gif with imageio.
Video generated and save to ./data/ClothFold.gif
And here is the video:
@Xingyu-Lin I'll update this post (to avoid cluttering the issue with more posts) with working vs non-working settings.
By "working" I mean it can run the example commands in headless mode.
By the CUDA version, I mean the version from nvcc -V, NOT from the nvidia-smi command; the latter reports the NVIDIA driver version. For example, with the paths pointing at CUDA 10.0 I get:
(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name ClothFold --headless 1
Waiting to generate environment variations. May take 1 minute for each variation...
*** stack smashing detected ***: python terminated
Aborted (core dumped)
I then quickly changed my paths to point to CUDA 9.0 on that machine (instead of 10.0) and the code runs as expected. Interesting...
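(A quick way to see both numbers at once; the grep patterns just pull the relevant lines from each tool's standard output:)
nvcc -V | grep release                  # CUDA toolkit version, i.e. what PyFlex is compiled against
nvidia-smi | grep "Driver Version"      # driver version; the CUDA version it prints is only the maximum supported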
@Xingyu-Lin I think we can probably close this ;) I am going to update https://danieltakeshi.github.io/2021/02/20/softgym/ with additional working setups.
Hello! Here are my attempts at installing SoftGym.
Contents:
- Background
- Creating the Conda Environment
- Entering Docker and Adjusting Paths
- Compiling PyFlex in Docker
- Using the Compiled Code
Background
I have an Ubuntu 18.04 machine and I am attempting to get softgym to work. The machine has CUDA 10.0:
My directory structure is like this: conda environments are installed in /home/seita/miniconda3/ (I use miniconda) and softgym is cloned to /home/seita/softgym/. I am following these instructions in parallel. To start, after cloning a fresh copy of the repository, let's pull the docker image and make sure I have things updated. Here's what the output looks like:
Remark: I have a .bashrc setting that tells me the branch of a repo in parentheses, so it says (master) right after the "softgym" text in the command line.
Creating the Conda Environment
root
user and thus I won't be able to runconda install <xyz>
commands outside of docker. In addition, installing it the way the README suggests will get the package versions set up correctly:Now that environment is saved in
/home/seita/miniconda3/envs/softgym/
Entering Docker and Adjusting Paths
Next, let's go into docker to compile PyFlex. That is the only reason to use Docker. I run this command:
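(The exact command isn't reproduced here; roughly, it's a docker run of the SoftGym image with the two bind mounts described below. A sketch, where the image name and the nvidia-docker wrapper are assumptions on my part:)
nvidia-docker run -it \
  -v /home/seita/softgym:/workspace/softgym \
  -v /home/seita/miniconda3:/home/seita/miniconda3 \
  xingyu/softgym:latest bash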
Explanation: the first -v will mount /home/seita/softgym (i.e., where I cloned softgym) to /workspace/softgym inside the docker. So inside docker, I can change directory to /workspace/softgym and it will look as if I am inside /home/seita/softgym. A similar thing happens with the second mounting command for miniconda. In fact, I'm using the same exact directory before and after the colon, which means the directory structure is the same inside docker. The other commands are just copied from what you have.
Running the command means I am in a docker container as a "root" user. I go to the softgym directory. Next, the current README here just says I need to adjust the PATH variable and then run a prepare_1.0.sh script. This other reference omits the prepare_1.0.sh script by manually performing the same commands, adjusting PYFLEXROOT, PYTHONPATH, and LD_LIBRARY_PATH. Here are the environment variables at the beginning when I enter docker:
Now let's follow the current README and adjust the path, then run the prepare script, which will assign those three environment variables. The prepare script also activates the softgym conda environment that we created earlier outside of docker. We need to adjust the path so that the . activate softgym command will work:
Now we see that we are in the softgym conda environment and, furthermore, that the environment variables are updated:
Now we have to install the pybind11 package as stated in the README [output omitted as it's pretty standard]:
Compiling PyFlex in Docker
Now let's try to compile PyFlex. The current instructions in this repo say to do this:
. ./prepare_1.0.sh && ./compile_1.0.sh
We've already done the first, so let's do the second part to compile (I could have merged the commands together by doing conda install pybind11 outside of docker). This looks like it proceeded well, and outside docker I can see the PyFlex compiled module in softgym/PyFlex/bindings/build/.
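(Roughly, the compile step amounts to a CMake build of the bindings; this is a sketch of the idea, not the literal contents of compile_1.0.sh:)
cd PyFlex/bindings && mkdir -p build && cd build
cmake .. && make -j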
Note: upon re-reading this post, I realize there is a slight typo: I did ./compile_1.0.sh instead of . ./compile_1.0.sh with the leading period. However, I don't think this affects things; I re-did it with the period there and it gave the same output.
Using the Compiled Code
Now we are outside of docker. I refresh via . ~/.bashrc, activate the same conda environment, and attempt to run an example:
Normally, since this is the name of the package, I would expect to be able to do pip install -e . or something, but there is no setup.py
file so that must not be the issue. Following the other linked references at the top, perhaps we have to refresh the environment variables? Here's what I had before:Then I update them, but then that leads to a segmentation fault.
So now I am not sure how to proceed. The libraries seem to be installed:
(using the one-liner solution to check if a package is installed or not)
@Xingyu-Lin @yufeiwang63 If any of you two have time to reproduce this setup I am wondering how you managed to get around the segmentation fault? I know it might take a while but I wonder if any of you are able to start with a "clean" machine and then go through the full installation steps to get it working?