Xingyu-Lin / softgym

SoftGym is a set of benchmark environments for deformable object manipulation.
BSD 3-Clause "New" or "Revised" License

Segmentation Fault, Core Dumped, Using Ubuntu 16.04 or 18.04, inside and outside Docker [steps to reproduce] #9

DanielTakeshi closed this issue 3 years ago

DanielTakeshi commented 3 years ago

Hello! Here are my attempts at installing SoftGym.

Background

I have an Ubuntu 18.04 machine and I am attempting to get softgym to work. The machine has CUDA 10.0:

seita@mason:~ $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

My directory structure is like this: conda environments are installed in /home/seita/miniconda3/ (I use miniconda) and softgym is cloned to /home/seita/softgym/. I am following these instructions in parallel:

To start, after cloning a fresh copy of the repository, let's pull the Docker image and make sure it is up to date. Here's what the output looks like:

seita@mason:~/softgym (master) $ docker pull xingyu/softgym
Using default tag: latest
latest: Pulling from xingyu/softgym
Digest: sha256:29a9f674cf3527e645a237facdfe4b5634c23cd0f1522290e0a523308435ccaa
Status: Image is up to date for xingyu/softgym:latest
docker.io/xingyu/softgym:latest
seita@mason:~/softgym (master) $ docker -v
Docker version 19.03.6, build 369ce74a3c
seita@mason:~/softgym (master) $ nvidia-docker -v
Docker version 19.03.6, build 369ce74a3c
seita@mason:~/softgym (master) $ docker images
REPOSITORY                TAG                             IMAGE ID            CREATED             SIZE
xingyu/softgym            latest                          2cbcd6a50965        3 months ago        2.44GB

Remark: I have a .bashrc setting that tells me the branch of a repo in parentheses, so it says (master) right after the "softgym" text in the command line.

Creating the Conda Environment

First, before entering Docker, I create the conda environment. The instructions in this fork suggest creating the conda environment inside Docker, but then the environment is created by the root user, so I would not be able to run conda install <xyz> commands outside of Docker. In addition, installing it the way the README suggests gets the package versions set up correctly:

seita@mason:~/softgym (master) $ conda env create -f environment.yml
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies.  Conda may not use the correct pip to install your packages, and they may end up in the wrong place.  Please add an explicit pip dependency.  I'm adding one for you, but still nagging you.
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.7.12
  latest version: 4.9.2

Please update conda by running

    $ conda update -n base -c defaults conda

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Ran pip subprocess with arguments:
['/home/seita/miniconda3/envs/softgym/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/seita/softgym/condaenv._19qblqw.requirements.txt']
Pip subprocess output:
Collecting gym==0.14.0
  Using cached gym-0.14.0-py3-none-any.whl
Requirement already satisfied: six in /home/seita/.local/lib/python3.6/site-packages (from gym==0.14.0->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 2)) (1.12.0)
Requirement already satisfied: numpy>=1.10.4 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from gym==0.14.0->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 2)) (1.17.2)
Collecting opencv-python==4.1.1.26
  Using cached opencv_python-4.1.1.26-cp36-cp36m-manylinux1_x86_64.whl (28.7 MB)
Collecting pyquaternion==0.9.5
  Using cached pyquaternion-0.9.5-py3-none-any.whl (14 kB)
Collecting Shapely==1.6.4.post2
  Using cached Shapely-1.6.4.post2-cp36-cp36m-manylinux1_x86_64.whl (1.5 MB)
Collecting sk-video==1.1.10
  Using cached sk_video-1.1.10-py2.py3-none-any.whl (2.3 MB)
Collecting cloudpickle~=1.2.0
  Using cached cloudpickle-1.2.2-py2.py3-none-any.whl (25 kB)
Collecting pyglet<=1.3.2,>=1.2.0
  Using cached pyglet-1.3.2-py2.py3-none-any.whl (1.0 MB)
Collecting gtimer
  Using cached gtimer-1.0.0b5-py3-none-any.whl
Collecting moviepy
  Using cached moviepy-1.0.3-py3-none-any.whl
Requirement already satisfied: imageio<3.0,>=2.5 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (2.6.1)
Requirement already satisfied: decorator<5.0,>=4.0.2 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (4.4.2)
Requirement already satisfied: requests<3.0,>=2.8.1 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (2.25.1)
Requirement already satisfied: pillow in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from imageio<3.0,>=2.5->moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (6.1.0)
Collecting imageio-ffmpeg>=0.2.0
  Using cached imageio_ffmpeg-0.4.3-py3-none-manylinux2010_x86_64.whl (26.9 MB)
Collecting proglog<=1.0.0
  Using cached proglog-0.1.9-py3-none-any.whl
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (1.26.3)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /home/seita/miniconda3/envs/softgym/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy->-r /home/seita/softgym/condaenv._19qblqw.requirements.txt (line 3)) (2.10)
Collecting tqdm<5.0,>=4.11.2
  Using cached tqdm-4.56.2-py2.py3-none-any.whl (72 kB)
Collecting future
  Using cached future-0.18.2-cp36-none-any.whl
Collecting scipy
  Using cached scipy-1.5.4-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)
Installing collected packages: tqdm, future, scipy, pyglet, proglog, imageio-ffmpeg, cloudpickle, sk-video, Shapely, pyquaternion, opencv-python, moviepy, gym, gtimer
Successfully installed Shapely-1.6.4.post2 cloudpickle-1.2.2 future-0.18.2 gtimer-1.0.0b5 gym-0.14.0 imageio-ffmpeg-0.4.3 moviepy-1.0.3 opencv-python-4.1.1.26 proglog-0.1.9 pyglet-1.3.2 pyquaternion-0.9.5 scipy-1.5.4 sk-video-1.1.10 tqdm-4.56.2

#
# To activate this environment, use
#
#     $ conda activate softgym
#
# To deactivate an active environment, use
#
#     $ conda deactivate

seita@mason:~/softgym (master) $ 

The environment is now saved in /home/seita/miniconda3/envs/softgym/.
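Since the point of creating the environment outside Docker is to keep it owned by your own user, a quick sanity check is to look for files in it that belong to someone else (for example, root). The same applies to anything written into the mounted softgym clone from inside the container, such as the PyFlex build directory, because the container runs as root. This is a minimal sketch: `first_foreign_file` is a hypothetical helper name, and it assumes GNU `find` (for `-quit`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first file under a directory that is NOT owned
# by the current user (e.g. something created by root inside Docker).
# Prints nothing if every file belongs to you. Requires GNU find for -quit.
first_foreign_file() {
  find "$1" ! -user "$(id -u)" -print -quit 2>/dev/null
}

# Example: check the conda env created above (path taken from this post).
env_dir="$HOME/miniconda3/envs/softgym"
if [ -d "$env_dir" ]; then
  offender=$(first_foreign_file "$env_dir")
  if [ -n "$offender" ]; then
    echo "Not owned by $(id -un): $offender"
  else
    echo "All files in $env_dir are owned by $(id -un)"
  fi
fi
```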

Entering Docker and Adjusting Paths

Next, let's go into docker to compile PyFlex. That is the only reason to use Docker. I run this command:

nvidia-docker run \
    -v /home/seita/softgym:/workspace/softgym \
    -v /home/seita/miniconda3:/home/seita/miniconda3 \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    --gpus all \
    -e DISPLAY=$DISPLAY \
    -e QT_X11_NO_MITSHM=1 \
    -it xingyu/softgym:latest bash

Explanation: the first -v mounts /home/seita/softgym (i.e., where I cloned softgym) to /workspace/softgym inside the container. So inside Docker, I can change directory to /workspace/softgym and it looks as if I am inside /home/seita/softgym. The second -v does the same for miniconda; there I use the exact same directory before and after the colon, so the directory structure is identical inside Docker. The remaining options are copied from your instructions.
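Because the conda mount uses the same path on both sides of the colon, the two host paths are the only things that change between machines. A small wrapper that takes them as arguments keeps that invariant. This is only a sketch that reassembles the flags from the command above; the function name `softgym_docker` and the `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the nvidia-docker command from this post.
#   $1: host path of the softgym clone   (e.g. /home/seita/softgym)
#   $2: host path of the conda install   (e.g. /home/seita/miniconda3)
# Set DRY_RUN=1 to print the command instead of running it.
softgym_docker() {
  local softgym_dir="$1" conda_dir="$2"
  local cmd=(nvidia-docker run
    -v "${softgym_dir}:/workspace/softgym"   # repo appears at /workspace/softgym
    -v "${conda_dir}:${conda_dir}"           # same path inside and outside
    -v /tmp/.X11-unix:/tmp/.X11-unix
    --gpus all
    -e "DISPLAY=${DISPLAY:-}"
    -e QT_X11_NO_MITSHM=1
    -it xingyu/softgym:latest bash)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s ' "${cmd[@]}"; printf '\n'
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   DRY_RUN=1 softgym_docker /home/seita/softgym /home/seita/miniconda3
#   softgym_docker /home/seita/softgym /home/seita/miniconda3
```

Building the command as an array (rather than a string) keeps paths with spaces intact when it is executed.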

Running the command puts me in a Docker container as the root user. I go to the softgym directory. The current README here says I need to adjust the PATH variable and then run the prepare_1.0.sh script. This other reference skips prepare_1.0.sh and instead performs the same commands manually, setting PYFLEXROOT, PYTHONPATH, and LD_LIBRARY_PATH. Here are the environment variables when I first enter Docker:

root@a230c50ebe44:/workspace# cd softgym/
root@a230c50ebe44:/workspace/softgym# echo $PATH
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@a230c50ebe44:/workspace/softgym# echo $PYFLEXROOT

root@a230c50ebe44:/workspace/softgym# echo $PYTHONPATH

root@a230c50ebe44:/workspace/softgym# echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64

Now let's follow the current README: adjust the path, then run the prepare script, which sets three environment variables. The prepare script also activates the softgym conda environment that we created earlier outside of Docker. We need to adjust the path so that the . activate softgym command works:

root@a230c50ebe44:/workspace/softgym# export PATH="/home/seita/miniconda3/bin:$PATH"
root@a230c50ebe44:/workspace/softgym# . ./prepare_1.0.sh 
(softgym) root@a230c50ebe44:/workspace/softgym# 

Now we see that we are in the softgym conda environment, and furthermore, that the environment variables are updated:

(softgym) root@a230c50ebe44:/workspace/softgym# echo $PATH
/home/seita/miniconda3/envs/softgym/bin:/home/seita/miniconda3/condabin:/home/seita/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
(softgym) root@a230c50ebe44:/workspace/softgym# echo $PYFLEXROOT
/workspace/softgym/PyFlex
(softgym) root@a230c50ebe44:/workspace/softgym# echo $PYTHONPATH
/workspace/softgym/PyFlex/bindings/build:
(softgym) root@a230c50ebe44:/workspace/softgym#  echo $LD_LIBRARY_PATH
/workspace/softgym/PyFlex/external/SDL2-2.0.4/lib/x64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
(softgym) root@a230c50ebe44:/workspace/softgym# 
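The three values echoed above are all derived from the repo location, which is /workspace/softgym inside Docker but /home/seita/softgym on the host, so they cannot simply carry over once you leave the container. The sketch below reconstructs them from the echo output (it is not copied from prepare_1.0.sh, and it leaves out the conda activation); `set_softgym_vars` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Reconstruction of the variables prepare_1.0.sh appears to set, based only on
# the echo output above. The repo root is a parameter because it differs
# inside Docker (/workspace/softgym) and on the host (/home/seita/softgym).
set_softgym_vars() {
  local root="${1:-$PWD}"
  export PYFLEXROOT="${root}/PyFlex"
  # Same form as observed: prepend to whatever was there before.
  export PYTHONPATH="${PYFLEXROOT}/bindings/build:${PYTHONPATH:-}"
  export LD_LIBRARY_PATH="${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:${LD_LIBRARY_PATH:-}"
}

# Inside Docker:  set_softgym_vars /workspace/softgym
# On the host:    set_softgym_vars /home/seita/softgym
```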

Now we have to install the pybind11 package as stated in the README [output omitted as it's pretty standard]:

(softgym) root@a230c50ebe44:/workspace/softgym# conda install pybind11

Compiling PyFlex in Docker

Now let's try to compile PyFlex. The current instructions in this repo say to run . ./prepare_1.0.sh && ./compile_1.0.sh. We've already done the first part, so let's run the second to compile. (I could have chained the two commands if I had run conda install pybind11 outside of Docker.)

(softgym) root@a230c50ebe44:/workspace/softgym# ./compile_1.0.sh 
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found suitable version "9.2", minimum required is "9.0") 
-- Found PythonInterp: /home/seita/miniconda3/envs/softgym/bin/python3.6 (found suitable version "3.6.12", minimum required is "3.6") 
-- Found PythonLibs: /home/seita/miniconda3/envs/softgym/lib/libpython3.6m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/softgym/PyFlex/bindings/build
Scanning dependencies of target pyflex
[  5%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/pfm.cpp.o
[ 10%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/maths.cpp.o
[ 15%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/core.cpp.o
[ 21%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/voxelize.cpp.o
[ 26%] Building CXX object CMakeFiles/pyflex.dir/pyflex.cpp.o
[ 36%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/sdf.cpp.o
[ 36%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/mesh.cpp.o
[ 47%] Building CXX object CMakeFiles/pyflex.dir/imgui.cpp.o
[ 47%] Building CXX object CMakeFiles/pyflex.dir/opengl/imguiRenderGL.cpp.o
[ 57%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/png.cpp.o
[ 57%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/tga.cpp.o
[ 63%] Building CXX object CMakeFiles/pyflex.dir/opengl/shadersGL.cpp.o
[ 68%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/aabbtree.cpp.o
[ 78%] Building CXX object CMakeFiles/pyflex.dir/opengl/shader.cpp.o
[ 78%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/extrude.cpp.o
[ 84%] Building CXX object CMakeFiles/pyflex.dir/shadersDemoContext.cpp.o
[ 89%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/platform.cpp.o
[ 94%] Building CXX object CMakeFiles/pyflex.dir/workspace/softgym/PyFlex/core/perlin.cpp.o
In file included from /usr/include/c++/7/cassert:44:0,
                 from /workspace/softgym/PyFlex/core/quat.h:30,
                 from /workspace/softgym/PyFlex/core/maths.h:241,
                 from /workspace/softgym/PyFlex/core/aabbtree.h:31,
                 from /workspace/softgym/PyFlex/core/aabbtree.cpp:28:
/workspace/softgym/PyFlex/core/aabbtree.cpp: In member function 'void AABBTree::Build()':
/workspace/softgym/PyFlex/core/aabbtree.cpp:136:22: warning: '*' in boolean context, suggest '&&' instead [-Wint-in-bool-context]
     assert(m_numFaces*3);
            ~~~~~~~~~~^~
/workspace/softgym/PyFlex/core/mesh.cpp: In function 'void ExportToObj(const char*, const Mesh&)':
/workspace/softgym/PyFlex/core/mesh.cpp:601:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
     if (!file)
     ^~
/workspace/softgym/PyFlex/core/mesh.cpp:604:2: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
  file << "# positions" << endl;
  ^~~~
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp: In function 'GLuint LoadTexture(const char*)':
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp:212:16: warning: converting to non-pointer type 'GLuint {aka unsigned int}' from NULL [-Wconversion-null]
         return NULL;
                ^~~~
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp: In function 'void InitRenderHeadless(const RenderInitOptions&, int, int)':
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp:3386:25: warning: invalid conversion from 'EGLConfig {aka void*}' to 'void**' [-fpermissive]
  g_eglConfig = configs[0];
                ~~~~~~~~~^
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp:3390:33: warning: invalid conversion from 'EGLContext {aka void*}' to 'void**' [-fpermissive]
  g_eglContext = eglCreateContext(
                 ~~~~~~~~~~~~~~~~^
           g_eglDisplay,
           ~~~~~~~~~~~~~          
           g_eglConfig,
           ~~~~~~~~~~~~           
           EGL_NO_CONTEXT,
           ~~~~~~~~~~~~~~~        
           NULL);
           ~~~~~                  
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp:3398:40: warning: invalid conversion from 'EGLSurface {aka void*}' to 'void**' [-fpermissive]
  g_eglSurface = eglCreatePbufferSurface(g_eglDisplay, g_eglConfig,
                 ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
            eglPBAttribs);
            ~~~~~~~~~~~~~                
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp: At global scope:
/workspace/softgym/PyFlex/bindings/opengl/shadersGL.cpp:248:13: warning: '{anonymous}::g_eglDisplay' defined but not used [-Wunused-variable]
 EGLDisplay* g_eglDisplay;
             ^~~~~~~~~~~~
In file included from /workspace/softgym/PyFlex/bindings/pyflex.cpp:1:0:
/workspace/softgym/PyFlex/bindings/main.cpp: In function 'int GetKeyFromGameControllerButton(SDL_GameControllerButton)':
/workspace/softgym/PyFlex/bindings/main.cpp:84:9: warning: case value '17' not in enumerated type 'SDL_GameControllerButton' [-Wswitch]
         case SDL_CONTROLLER_BUTTON_RIGHT_TRIGGER: {
         ^~~~
In file included from /workspace/softgym/PyFlex/bindings/scenes.h:30:0,
                 from /workspace/softgym/PyFlex/bindings/main.cpp:587,
                 from /workspace/softgym/PyFlex/bindings/pyflex.cpp:1:
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_cloth.h: In member function 'virtual void SoftgymCloth::Initialize(pybind11::array_t<float>, int)':
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_cloth.h:63:14: warning: unused variable 'size' [-Wunused-variable]
          int size = g_buffers->triangles.size();
              ^~~~
In file included from /workspace/softgym/PyFlex/bindings/scenes.h:31:0,
                 from /workspace/softgym/PyFlex/bindings/main.cpp:587,
                 from /workspace/softgym/PyFlex/bindings/pyflex.cpp:1:
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_fluid.h: In member function 'virtual void SoftgymFluid::Initialize(pybind11::array_t<float>, int)':
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_fluid.h:43:9: warning: unused variable 'surfaceTension' [-Wunused-variable]
   float surfaceTension = ptr[4];
         ^~~~~~~~~~~~~~
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_fluid.h:44:9: warning: unused variable 'adhesion' [-Wunused-variable]
   float adhesion = ptr[5];
         ^~~~~~~~
In file included from /workspace/softgym/PyFlex/bindings/scenes.h:32:0,
                 from /workspace/softgym/PyFlex/bindings/main.cpp:587,
                 from /workspace/softgym/PyFlex/bindings/pyflex.cpp:1:
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_softbody.h: In member function 'void SoftgymSoftBody::CreateSoftBody(SoftgymSoftBody::Instance, int)':
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_softbody.h:202:10: warning: unused variable 'createStart' [-Wunused-variable]
   double createStart = GetSeconds();
          ^~~~~~~~~~~
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_softbody.h:222:10: warning: unused variable 'createEnd' [-Wunused-variable]
   double createEnd = GetSeconds();
          ^~~~~~~~~
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_softbody.h:233:10: warning: unused variable 'skinStart' [-Wunused-variable]
   double skinStart = GetSeconds();
          ^~~~~~~~~
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_softbody.h:245:10: warning: unused variable 'skinEnd' [-Wunused-variable]
   double skinEnd = GetSeconds();
          ^~~~~~~
In file included from /workspace/softgym/PyFlex/bindings/scenes.h:33:0,
                 from /workspace/softgym/PyFlex/bindings/main.cpp:587,
                 from /workspace/softgym/PyFlex/bindings/pyflex.cpp:1:
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_rigid_cloth.h: In member function 'virtual void SoftgymRigidCloth::Initialize(pybind11::array_t<float>, int)':
/workspace/softgym/PyFlex/bindings/softgym_scenes/softgym_rigid_cloth.h:72:60: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             for (int i=int(g_mesh->GetNumVertices()*0.6); i<g_mesh->GetNumVertices(); ++i)
                                                           ~^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /workspace/softgym/PyFlex/bindings/scenes.h:80:0,
                 from /workspace/softgym/PyFlex/bindings/main.cpp:587,
                 from /workspace/softgym/PyFlex/bindings/pyflex.cpp:1:
/workspace/softgym/PyFlex/bindings/scenes/shapechannels.h: In member function 'void ShapeChannels::Initialize()':
/workspace/softgym/PyFlex/bindings/scenes/shapechannels.h:26:108: warning: '<<' in boolean context, did you mean '<' ? [-Wint-in-bool-context]
    AddBox(Vec3(0.5f, 0.1f, 0.5f), Vec3(0.0f, 0.5f + i*0.5f, 0.0f), Quat(), false, eNvFlexPhaseShapeChannel0<<i);
                                                                                   ~~~~~~~~~~~~~~~~~~~~~~~~~^~~
[100%] Linking CXX shared module pyflex.cpython-36m-x86_64-linux-gnu.so
[100%] Built target pyflex

This looks like it succeeded, and outside Docker I can see the compiled PyFlex module in softgym/PyFlex/bindings/build/.

Note: re-reading this post, I realize there is a slight typo: I ran ./compile_1.0.sh instead of . ./compile_1.0.sh (with the leading period). I don't think this affects anything, though; I re-ran it with the period and got the same output.
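The two invocations differ only in where the script runs. `./compile_1.0.sh` starts a child process, which inherits exported variables (such as those set by prepare_1.0.sh, assuming it exports them) but cannot pass changes back. `. ./compile_1.0.sh` runs in the current shell, so anything it sets persists. Here is a small self-contained demonstration; the variable names are arbitrary.

```bash
#!/usr/bin/env bash
# A throwaway script that only exports one variable.
tmp_script=$(mktemp)
printf 'export DEMO_VAR=set_by_script\n' > "$tmp_script"

unset DEMO_VAR
bash "$tmp_script"                       # executed: runs in a child process
after_exec="${DEMO_VAR:-<unset>}"        # the child's export is gone

. "$tmp_script"                          # sourced: runs in this shell
after_source="${DEMO_VAR:-<unset>}"      # the export persists

export PARENT_VAR=from_parent            # exported vars flow down to children
child_sees=$(bash -c 'printf %s "${PARENT_VAR:-<unset>}"')

rm -f "$tmp_script"
printf 'after executing: %s\nafter sourcing: %s\nchild sees: %s\n' \
  "$after_exec" "$after_source" "$child_sees"
```

This prints `<unset>`, `set_by_script`, and `from_parent`. For a build script whose job is to write files rather than change the caller's environment, executing and sourcing should therefore behave the same, which is consistent with getting identical output both ways.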

Using the Compiled Code

Now we are outside of Docker. I refresh via . ~/.bashrc, activate the same conda environment, and attempt to run an example:

(softgym) seita@mason:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten
Traceback (most recent call last):
  File "examples/random_env.py", line 5, in <module>
    from softgym.registered_env import env_arg_dict, SOFTGYM_ENVS
ModuleNotFoundError: No module named 'softgym'

Since this is the name of the package, I would normally expect to run pip install -e . or something similar, but there is no setup.py file, so that can't be the fix. Following the other references linked at the top, perhaps we have to refresh the environment variables? Here's what I had before:

(softgym) seita@mason:~/softgym (master) $ echo $PATH
/home/seita/miniconda3/envs/softgym/bin:/home/seita/.local/bin:/home/seita/miniconda3/condabin:/home/seita/blender-2.82a-linux64:/usr/local/cuda/bin:/home/seita/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
(softgym) seita@mason:~/softgym (master) $ echo $PYTHONPATH

(softgym) seita@mason:~/softgym (master) $ echo $PYFLEXROOT

(softgym) seita@mason:~/softgym (master) $ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64::/home/seita/.mujoco/mujoco200/bin
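The `No module named 'softgym'` error is consistent with how Python builds `sys.path`. For `python examples/random_env.py`, the first entry is the script's directory (`examples/`), not the directory you launched from, so the top-level `softgym/` package is not visible unless something else puts the repo root on the path. Note that when PYTHONPATH starts out empty, the PYTHONPATH value produced by the prepare step ends in a trailing `:` (an empty entry, as in the Docker echo output), which Python likely treats as the current directory. That may be what lets `softgym` be found once the exports are re-run from the repo root. A more explicit option is to add the repo root yourself and avoid empty entries. The sketch below does that with `prepend_path`, a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend DIR to the colon-separated variable named VAR
# without creating an empty entry when VAR is unset or empty.
prepend_path() {
  local var="$1" dir="$2"
  local cur="${!var:-}"          # bash indirect expansion: current value of $VAR
  if [ -z "$cur" ]; then
    export "$var=$dir"
  else
    export "$var=$dir:$cur"
  fi
}

# Usage from the repo root on the host (paths as in this post):
#   cd /home/seita/softgym
#   prepend_path PYTHONPATH "$PWD/PyFlex/bindings/build"
#   prepend_path PYTHONPATH "$PWD"     # repo root, so `import softgym` works from anywhere
#   prepend_path LD_LIBRARY_PATH "$PWD/PyFlex/external/SDL2-2.0.4/lib/x64"
```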

Then I update them, but that leads to a segmentation fault:

(softgym) seita@mason:~/softgym (master) $ export PYFLEXROOT=${PWD}/PyFlex
(softgym) seita@mason:~/softgym (master) $ export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
(softgym) seita@mason:~/softgym (master) $ export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH
(softgym) seita@mason:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
(softgym) seita@mason:~/softgym (master) $ 

So now I am not sure how to proceed. The libraries seem to be installed:

(softgym) seita@mason:~/softgym (master) $ (dpkg-query -W -f='${Status}' libglfw3 2>/dev/null | grep -c "ok installed")
1
(softgym) seita@mason:~/softgym (master) $ (dpkg-query -W -f='${Status}' libgles2-mesa-dev 2>/dev/null | grep -c "ok installed")
1
(softgym) seita@mason:~/softgym (master) $ 

(using the one-liner solution to check if a package is installed or not)
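The same one-liner can be looped over a whole package list. The list below is the one passed to apt-get inside the container later in this thread, and `is_installed` is a hypothetical helper name. Note that this only confirms the packages are present, not which OpenGL driver is actually used at runtime.

```bash
#!/usr/bin/env bash
# Succeed if dpkg reports PACKAGE as fully installed ("install ok installed").
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "ok installed"
}

for pkg in build-essential libgl1-mesa-dev freeglut3-dev libglfw3 libgles2-mesa-dev; do
  if is_installed "$pkg"; then
    echo "$pkg: installed"
  else
    echo "$pkg: MISSING"
  fi
done
```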

@Xingyu-Lin @yufeiwang63 If either of you has time to reproduce this setup, I'm wondering how you got around the segmentation fault. I know it might take a while, but could either of you start from a "clean" machine and go through the full installation steps to get it working?

DanielTakeshi commented 3 years ago

Attempt 2: Follow-up on Ubuntu 16.04 Machine and Compiling Docker

I tried to reproduce all the instructions above on an Ubuntu 16.04 machine instead of an Ubuntu 18.04 machine. I followed all the steps the same way, but this time there is a slightly different error. After compiling PyFlex and exiting Docker, I activate the softgym env and again need to refresh the environment variables (please let me know if this is not what you do):

seita@triton1:~/softgym (master) $ conda activate softgym
(softgym) seita@triton1:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten
Traceback (most recent call last):
  File "examples/random_env.py", line 5, in <module>
    from softgym.registered_env import env_arg_dict, SOFTGYM_ENVS
ModuleNotFoundError: No module named 'softgym'
(softgym) seita@triton1:~/softgym (master) $ export PYFLEXROOT=${PWD}/PyFlex
(softgym) seita@triton1:~/softgym (master) $ export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
(softgym) seita@triton1:~/softgym (master) $ export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH

Then I run the example, but now there is a GLIBC error:

(softgym) seita@triton1:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten
Traceback (most recent call last):
  File "examples/random_env.py", line 5, in <module>
    from softgym.registered_env import env_arg_dict, SOFTGYM_ENVS
  File "/home/seita/softgym/softgym/registered_env.py", line 1, in <module>
    from softgym.envs.pour_water import PourWaterPosControlEnv
  File "/home/seita/softgym/softgym/envs/pour_water.py", line 4, in <module>
    import pyflex
ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /home/seita/softgym/PyFlex/bindings/build/pyflex.cpython-36m-x86_64-linux-gnu.so)
(softgym) seita@triton1:~/softgym (master) $ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu11.2) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
(softgym) seita@triton1:~/softgym (master) $ 

Other details from that machine (paths, CUDA, etc.):

(softgym) seita@triton1:~/softgym (master) $ echo $LD_LIBRARY_PATH
/home/seita/softgym/PyFlex/external/SDL2-2.0.4/lib/x64:/usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64:/home/seita/.mujoco/mjpro150/bin:/home/seita/.mujoco/mujoco200/bin:/usr/lib/nvidia-410
(softgym) seita@triton1:~/softgym (master) $ 
(softgym) seita@triton1:~/softgym (master) $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
(softgym) seita@triton1:~/softgym (master) $ nvidia-smi
Fri Feb 12 14:38:41 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   27C    P8     8W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:03:00.0  On |                  N/A |
| 23%   31C    P8    10W / 250W |     39MiB / 12192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1     31669      G   /usr/lib/xorg/Xorg                            36MiB |
+-----------------------------------------------------------------------------+
(softgym) seita@triton1:~/softgym (master) $ 

The GLIBC version on this machine is 2.23, whereas the Ubuntu 18.04 machine from my post above has the required 2.27. However, according to this:

https://stackoverflow.com/questions/59145051/glibc-2-27-not-found-ubuntu-16-04

2.23 is the expected version on Ubuntu 16.04, and upgrading it is tricky and not straightforward. In fact, the answer there even seems to suggest switching to Ubuntu 18.04, which confuses me even more, since I believe Ubuntu 16.04 is what the SoftGym paper normally uses.

Does this issue make sense?
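The error says the module references a `GLIBC_2.27` symbol version, while `ldd --version` reports 2.23 on this host, and the dynamic loader refuses to load a library that needs a newer glibc than the one installed. The sketch below checks this mismatch directly. It assumes GNU binutils (`objdump -T`) and coreutils (`sort -V`), and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Read `objdump -T` output on stdin and print the highest GLIBC_x.y it references.
max_glibc_required() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -uV | tail -n 1
}

# Succeed if version $1 >= version $2 (version-aware comparison).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on the host (path from the error message above):
#   so=/home/seita/softgym/PyFlex/bindings/build/pyflex.cpython-36m-x86_64-linux-gnu.so
#   need=$(objdump -T "$so" | max_glibc_required)
#   have=$(ldd --version | head -n 1 | awk '{print $NF}')
#   version_ge "$have" "$need" || echo "host glibc $have < required $need"
```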

DanielTakeshi commented 3 years ago

Attempt 3: Running inside Docker (instead of Outside)

For both setups above, I also tried running inside Docker rather than outside. It seems to fail on both Ubuntu 18.04:

(softgym) root@a230c50ebe44:/workspace/softgym# python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
(softgym) root@a230c50ebe44:/workspace/softgym# 

and Ubuntu 16.04 (for clarity, here I first show the PyFlex build folder, then change directory, then run an example):

(softgym) root@c7225cc8bb31:/workspace/softgym/PyFlex/bindings/build# ls -lh 
total 13M
-rw-r--r-- 1 root root  24K Feb 12 22:55 CMakeCache.txt
drwxr-xr-x 5 root root 4.0K Feb 12 22:55 CMakeFiles
-rw-r--r-- 1 root root  24K Feb 12 22:55 Makefile
-rw-r--r-- 1 root root 1.5K Feb 12 22:55 cmake_install.cmake
-rwxr-xr-x 1 root root  13M Feb 12 22:55 pyflex.cpython-36m-x86_64-linux-gnu.so
(softgym) root@c7225cc8bb31:/workspace/softgym/PyFlex/bindings/build# cd ../../..
(softgym) root@c7225cc8bb31:/workspace/softgym# 
(softgym) root@c7225cc8bb31:/workspace/softgym# 
(softgym) root@c7225cc8bb31:/workspace/softgym# python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)

Interestingly, for the 16.04 machine, running inside Docker will not produce the "GLIBC" error that appears when running outside docker.

The docker containers already seem to have the updated versions of packages, e.g.:

(softgym) root@a230c50ebe44:/workspace/softgym# apt-get install build-essential libgl1-mesa-dev freeglut3-dev libglfw3 libgles2-mesa-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
build-essential is already the newest version (12.4ubuntu1).
freeglut3-dev is already the newest version (2.8.1-3).
libglfw3 is already the newest version (3.2.1-1).
libglfw3 set to manually installed.
libgl1-mesa-dev is already the newest version (20.0.8-0ubuntu1~18.04.1).
libgles2-mesa-dev is already the newest version (20.0.8-0ubuntu1~18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
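Every failing run prints `Unable to initialize SDL` and `Could not initialize GL extensions` before the crash, which suggests the process could not open a window or GL context. One thing worth ruling out, especially inside the container, is whether `DISPLAY` points at an X server reachable through the mounted `/tmp/.X11-unix` socket. The following is a hedged diagnostic sketch with hypothetical helper names. It only checks that the socket exists, not that the X server accepts the connection, which may additionally require something like `xhost +local:` on the host.

```bash
#!/usr/bin/env bash
# Map a local DISPLAY value like ":0" or ":1.0" to the X server's Unix socket path.
display_socket() {
  local d="${1#*:}"                  # drop everything up to the colon -> "1.0"
  echo "/tmp/.X11-unix/X${d%%.*}"    # drop the screen number         -> X1
}

check_x_display() {
  if [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is not set"; return 1
  fi
  local sock
  sock=$(display_socket "$DISPLAY")
  if [ ! -S "$sock" ]; then
    echo "No X socket at $sock (forwarded displays such as localhost:10.0 use TCP instead)"
    return 1
  fi
  echo "DISPLAY=$DISPLAY, socket $sock present"
}

# Run on the host and inside the container, then compare:
#   check_x_display
```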
Xingyu-Lin commented 3 years ago

@DanielTakeshi For approach 1, I saw this segmentation fault a long time ago. One solution that worked for me was to re-install NVIDIA driver version 440.64 (the sub-version also matters). Can you try that? Also, I am currently using CUDA 10.2, although I don't think this matters.
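If the sub-version matters, matching the 440 branch alone is not enough. For example, the 440.33.01 driver reported on triton3 later in this thread shares the branch with 440.64 but is not the same version. The sketch below makes that distinction explicit. `classify_driver` is a hypothetical helper; the `nvidia-smi --query-gpu=driver_version` query it relies on is a standard nvidia-smi option.

```bash
#!/usr/bin/env bash
# Compare an installed driver version with a target such as 440.64.
#   exact       -> identical version string
#   same-branch -> same leading number (e.g. 440) but a different sub-version
#   different   -> different branch
classify_driver() {
  local want="$1" have="$2"
  if [ "$want" = "$have" ]; then
    echo exact
  elif [ "${want%%.*}" = "${have%%.*}" ]; then
    echo same-branch
  else
    echo different
  fi
}

# On the machine itself:
#   have=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   classify_driver 440.64 "$have"
```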

Xingyu-Lin commented 3 years ago

The other issue, with GLIBC, makes sense to me: the Docker image is built under Ubuntu 18. If you compile inside an Ubuntu 18 Docker container and try to use the .so object outside it, on an Ubuntu 16 system, it will give an error. You can either try Ubuntu 16 without Docker, or rebuild the Docker image under Ubuntu 16. To rebuild the image, you can find the Docker recipe here; just change the first line to Ubuntu 16.
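One way to confirm this kind of mismatch is to compare the newest GLIBC symbol version that the compiled .so references against the host's glibc (2.23 on stock Ubuntu 16.04, 2.27 on 18.04). This is a sketch that assumes GNU binutils (objdump), GNU coreutils (sort -V) and getconf are available. The function names are hypothetical helpers, not part of SoftGym.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of SoftGym) for spotting a GLIBC mismatch.

# Read `objdump -T` output on stdin and print the highest GLIBC_x.y version it references.
max_glibc_from_objdump() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Print the glibc version of the current host, e.g. "2.23" on Ubuntu 16.04.
host_glibc() {
  getconf GNU_LIBC_VERSION | awk '{print $2}'
}

# Succeed if the host glibc version ($2) is at least the required version ($1).
glibc_ok() {
  local need=$1 have=$2
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}
```

Typical usage, run outside the container from the repository root: need=$(objdump -T PyFlex/bindings/build/pyflex.cpython-36m-x86_64-linux-gnu.so | max_glibc_from_objdump), then glibc_ok "$need" "$(host_glibc)" || echo "rebuild for this OS". If the check fails, the binary has to be rebuilt on, or for, the older system.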

DanielTakeshi commented 3 years ago

Attempt 4: Ubuntu 16.04, install pyflex entirely outside of Docker

@Xingyu-Lin Thanks for the suggestions. Here I am trying to install without using Docker at all. This machine has a close driver version, 440, but a different sub-version from the one you suggest. Unfortunately, because it's a shared lab machine, I need to check with the other users before changing the driver version; I can't do it unilaterally (and was hoping to avoid it, since it introduces a very rigid requirement). Here are the NVIDIA details:

seita@triton3:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
seita@triton3:~$ nvidia-smi
Sun Feb 14 07:59:41 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:01:00.0 Off |                  N/A |
| 62%   86C    P2   103W / 250W |   9356MiB / 12188MiB |     87%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:02:00.0 Off |                  N/A |
| 51%   82C    P2   150W / 250W |   2238MiB / 12196MiB |     49%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1572      G   /usr/lib/xorg/Xorg                            49MiB |
|    0      4625      C   python                                      4501MiB |
|    0     13683      C   ...sup_rbt_train_quat.py -dataset 872objv2  4793MiB |
|    1     16940      C   ...sup_rbt_train_quat.py -dataset 872objv2  2225MiB |
+-----------------------------------------------------------------------------+

I follow the steps:

git clone https://github.com/Xingyu-Lin/softgym.git
cd softgym/
conda env create -f environment.yml
conda activate softgym
. ./prepare_1.0.sh
. ./compile_1.0.sh
cd ../../..

All of the above seems to work (though the prepare_1.0.sh script contains an unnecessary . activate softgym; I activate the environment beforehand anyway, because the script throws an error if the env is not already active). Then I get:

(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)

Same error. I will try again if I can get the exact NVIDIA driver version installed. So 440.33 should become 440.64? Also, does it matter that I am doing this remotely over SSH?

yufeiwang63 commented 3 years ago

Hmm, actually I am using exactly driver version 440.33.01:

(base) yufei@yufei-OMEN-by-HP-Laptop:~$ nvidia-smi
Mon Feb 15 00:20:22 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P8    N/A /  N/A |    925MiB /  2002MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1140      G   /usr/lib/xorg/Xorg                           341MiB |
|    0      2213      G   compiz                                        37MiB |
|    0      2285      G   fcitx-qimpanel                                 4MiB |
|    0      2883      G   ...quest-channel-token=2700618523479150890   222MiB |
|    0      3953      G   ...quest-channel-token=4260863179945314425    44MiB |
|    0      4805      G   ...quest-channel-token=1622710122323873953   264MiB |
+-----------------------------------------------------------------------------+

but I am using a different nvcc (version 9.2):

(base) yufei@yufei-OMEN-by-HP-Laptop:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

We ran into this SDL initialization problem a long time ago. I will try to remember more details on how we solved it. In the meantime, perhaps you could try changing the nvcc version?

DanielTakeshi commented 3 years ago

I see, thanks for checking. The machine in my last post does not have CUDA 9.2, and I will need to check with the other users to make sure I can install it. I did try downgrading to 9.0 (the other version available) and got the same error after the compilation steps:

(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name PassWater
Waiting to generate environment variations. May take 1 minute for each variation...
Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
(softgym) seita@triton3:~/softgym$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
(softgym) seita@triton3:~/softgym$ 

So we know that for Ubuntu 16.04 with this NVIDIA driver, CUDA 9.0 and 10.0 do not work, and maybe 9.2 will once I can test it. :) This is without any Docker compilation at all (since it's Ubuntu 16.04).

Xingyu-Lin commented 3 years ago

A few other suggestions based on how we debugged this issue:

  1. If you are running on a cluster, you don't have a display environment. Did you set headless to true when running the random_env.py example?
  2. Run export LIBGL_DEBUG=verbose in the shell, which can give you more information about GL during linking.
  3. Inspect the compiled library to see what happened: run nm -D pyflex.cpython-36m-x86_64-linux-gnu.so | grep SDL_GL to check that all the symbols look normal.
  4. Search the C++ code for the error message Unable to initialize SDL to see what went wrong there.
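Suggestions 1 and 3 can be scripted. This is a minimal sketch assuming a bash shell and the build path used elsewhere in this thread. headless_flag and count_sdl_gl_symbols are hypothetical helpers, not part of SoftGym. Over a plain SSH session without X forwarding, DISPLAY is usually unset, which is exactly the case where --headless 1 is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of SoftGym) for the debugging steps above.

# Print "1" when no X display is available (e.g. a plain SSH session), else "0".
headless_flag() {
  if [ -z "${DISPLAY:-}" ]; then echo 1; else echo 0; fi
}

# Read `nm -D` output on stdin and count the defined (type T) SDL_GL_* symbols.
count_sdl_gl_symbols() {
  awk '$2 == "T" && $3 ~ /^SDL_GL_/ { n++ } END { print n + 0 }'
}
```

For example, nm -D PyFlex/bindings/build/pyflex.cpython-36m-x86_64-linux-gnu.so | count_sdl_gl_symbols should print a non-zero count (the working build later in this thread lists 18 such symbols). Then LIBGL_DEBUG=verbose python examples/random_env.py --env_name PassWater --headless "$(headless_flag)" combines suggestions 1 and 2, assuming the script also accepts 0 for a windowed run.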

Also, you can install multiple CUDA versions on Ubuntu and set the environment variables to link to the CUDA version that you need. Hope that helps!
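Switching between side-by-side CUDA installs mostly comes down to CUDA_HOME, PATH and LD_LIBRARY_PATH. This is a minimal sketch assuming the default /usr/local/cuda-X.Y install layout; use_cuda and the CUDA_PREFIX override are hypothetical, and other install locations would need a different prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SoftGym): point the current shell at one CUDA toolkit.
# Assumes toolkits live under ${CUDA_PREFIX:-/usr/local}/cuda-<version>.
use_cuda() {
  local version=$1
  local root="${CUDA_PREFIX:-/usr/local}/cuda-${version}"
  if [ ! -x "${root}/bin/nvcc" ]; then
    echo "use_cuda: no nvcc found under ${root}" >&2
    return 1
  fi
  export CUDA_HOME="$root"
  # Prepend so this toolkit wins over any other nvcc or CUDA libraries on the paths.
  export PATH="${root}/bin:${PATH}"
  export LD_LIBRARY_PATH="${root}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}
```

The function has to be sourced into the interactive shell, not run as a separate script, so that the exports persist. Then, for example, use_cuda 9.0 followed by nvcc -V confirms the switch before the compile step is rerun.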

FranBesq commented 3 years ago

Earlier today I installed SoftGym on a machine with Ubuntu 16.04 and CUDA 9.2, with NO Docker. It had CUDA 9.0 installed, but I got permission to purge it; a Docker install with CUDA 9.0 apparently is not possible, so I had to upgrade. I guess the only contribution of this post is showing that the no-Docker install also works with driver version 396.37.

This is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:01:00.0  On |                  N/A |
| 22%   34C    P8    15W / 250W |    205MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1021      G   /usr/lib/xorg/Xorg                           145MiB |
|    0      1695      G   compiz                                        56MiB |
+-----------------------------------------------------------------------------+

I had manually upgraded to driver version 430, but installing CUDA 9.2 from the .deb package downgraded the driver to version 396.37.

nvcc --version | grep "release" | awk '{print $6}' | cut -c2- outputs: 9.2.148

To follow up on some of the questions in your last post:

P.S. About the installation itself:

You mention:

I follow the steps:


git clone https://github.com/Xingyu-Lin/softgym.git
cd softgym/
conda env create -f environment.yml
conda activate softgym
. ./prepare_1.0.sh
. ./compile_1.0.sh
cd ../../..

These were my steps, from my shell history:

 conda env create -f environment.yml
 conda activate softgym
 export PYFLEXROOT=${PWD}/PyFlex
 export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
 export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH
 . ./compile_1.0.sh
 cd ~/repos/softgym
 python examples/random_env.py --env_name PassWater
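The three export lines in these steps have to be repeated in every new shell before pyflex can be imported. This is a small sketch that wraps them, assuming PyFlex has already been compiled and the function is given the softgym repository root. softgym_env is a hypothetical helper name, not part of SoftGym.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SoftGym): set the PyFlex paths used in this thread.
# Usage: softgym_env [repo_root]   (defaults to the current directory)
softgym_env() {
  local root
  root=$(cd "${1:-.}" && pwd) || return 1
  export PYFLEXROOT="${root}/PyFlex"
  export PYTHONPATH="${PYFLEXROOT}/bindings/build${PYTHONPATH:+:${PYTHONPATH}}"
  export LD_LIBRARY_PATH="${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
  # random_env.py writes its GIF to ./data, which may not exist in a fresh clone.
  mkdir -p "${root}/data"
}
```

After sourcing it, softgym_env ~/repos/softgym && python -c "import pyflex" is a quick check that the compiled module and its SDL2 library are found before running the examples.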

The only problem I had was that environment.yml could not find some packages, so I had to install them with pip.

Hope this helps.

DanielTakeshi commented 3 years ago

Attempt 5 (Success, on Ubuntu 18.04, compiling in Docker, run outside docker, CUDA 10.0)

So, apparently the headless option was all I needed on my Ubuntu 18.04 machine. I should have read through the random_env.py code carefully and tried that. To be clear, I followed the steps in my first post here, which compile inside Docker. Here are the machine specs:

(softgym) seita@mason:~/softgym (master) $ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
(softgym) seita@mason:~/softgym (master) $ nvidia-smi
Wed Feb 17 09:28:29 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
| N/A   32C    P0    27W / 250W |     33MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   34C    P0    25W / 250W |     28MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  Off  | 00000000:09:00.0 Off |                    0 |
| N/A   37C    P0    28W / 250W |     28MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE...  Off  | 00000000:85:00.0 Off |                    0 |
| N/A   36C    P0    29W / 250W |     28MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-PCIE...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   33C    P0    25W / 250W |     28MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     17055      G   /usr/lib/xorg/Xorg                 28MiB |
|    1   N/A  N/A     17055      G   /usr/lib/xorg/Xorg                 23MiB |
|    2   N/A  N/A     17055      G   /usr/lib/xorg/Xorg                 23MiB |
|    3   N/A  N/A     17055      G   /usr/lib/xorg/Xorg                 23MiB |
|    4   N/A  N/A     17055      G   /usr/lib/xorg/Xorg                 23MiB |
+-----------------------------------------------------------------------------+

I did the compilation in docker, and the symbols look good when I check outside of docker:

(softgym) seita@mason:~/softgym/PyFlex/bindings/build (master) $ ls -lh 
total 13M
-rw-r--r-- 1 root root  24K Feb 17 09:21 CMakeCache.txt
drwxr-xr-x 5 root root 4.0K Feb 17 09:21 CMakeFiles
-rw-r--r-- 1 root root 1.5K Feb 17 09:21 cmake_install.cmake
-rw-r--r-- 1 root root  24K Feb 17 09:21 Makefile
-rwxr-xr-x 1 root root  13M Feb 17 09:21 pyflex.cpython-36m-x86_64-linux-gnu.so
(softgym) seita@mason:~/softgym/PyFlex/bindings/build (master) $ nm -D pyflex.cpython-36m-x86_64-linux-gnu.so | grep SDL_GL
00000000000c1be0 T SDL_GL_BindTexture
00000000000c28d0 T SDL_GL_CreateContext
00000000000c2950 T SDL_GL_DeleteContext
00000000000c28a0 T SDL_GL_ExtensionSupported
00000000000c28c0 T SDL_GL_GetAttribute
00000000000c2900 T SDL_GL_GetCurrentContext
00000000000c28f0 T SDL_GL_GetCurrentWindow
00000000000c2910 T SDL_GL_GetDrawableSize
00000000000c2880 T SDL_GL_GetProcAddress
00000000000c2930 T SDL_GL_GetSwapInterval
00000000000c2870 T SDL_GL_LoadLibrary
00000000000c28e0 T SDL_GL_MakeCurrent
00000000000c2980 T SDL_GL_ResetAttributes
00000000000c28b0 T SDL_GL_SetAttribute
00000000000c2920 T SDL_GL_SetSwapInterval
00000000000c2940 T SDL_GL_SwapWindow
00000000000c1bf0 T SDL_GL_UnbindTexture
00000000000c2890 T SDL_GL_UnloadLibrary

Then I activate the conda env and set up the paths:

seita@mason:~/softgym/PyFlex/bindings/build (master) $ conda activate softgym
(softgym) seita@mason:~/softgym/PyFlex/bindings/build (master) $ cd ../../..
(softgym) seita@mason:~/softgym (master) $ export PYFLEXROOT=${PWD}/PyFlex
(softgym) seita@mason:~/softgym (master) $ export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
(softgym) seita@mason:~/softgym (master) $ export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH

and run this (I had to create the data/ directory first):

(softgym) seita@mason:~/softgym (master) $ mkdir data
(softgym) seita@mason:~/softgym (master) $ python examples/random_env.py --env_name ClothFlatten  --headless 1
Waiting to generate environment variations. May take 1 minute for each variation...
Compute Device: Tesla V100-PCIE-32GB

Pyflex init done!
config 0: camera params {'default_camera': {'pos': array([-0.  ,  0.82,  0.82]), 'angle': array([ 0.        , -0.78539816,  0.        ]), 'width': 720, 'height': 720}}, flatten area: 0.27312500000000006
MoviePy - Building file ./data/ClothFlatten.gif with imageio.
Video generated and save to ./data/ClothFlatten.gif                

Here is the GIF:

ClothFlatten

@Xingyu-Lin I think it may be worth mentioning the headless option in more detail in the README, or perhaps linking to this issue report.

Attempt 6: Ubuntu 16.04, install entirely outside of Docker, CUDA 9.0, driver 440.33

This time I don't use Docker at all. Here is the machine info:

(softgym) seita@triton3:~/softgym$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

seita@triton3:~$ nvidia-smi
Sun Feb 14 07:59:41 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:01:00.0 Off |                  N/A |
| 62%   86C    P2   103W / 250W |   9356MiB / 12188MiB |     87%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:02:00.0 Off |                  N/A |
| 51%   82C    P2   150W / 250W |   2238MiB / 12196MiB |     49%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1572      G   /usr/lib/xorg/Xorg                            49MiB |
|    0      4625      C   python                                      4501MiB |
|    0     13683      C   ...sup_rbt_train_quat.py -dataset 872objv2  4793MiB |
|    1     16940      C   ...sup_rbt_train_quat.py -dataset 872objv2  2225MiB |
+-----------------------------------------------------------------------------+

All I had to do was exactly this (the same steps as before):

git clone https://github.com/Xingyu-Lin/softgym.git
cd softgym/
conda env create -f environment.yml
conda activate softgym
. ./prepare_1.0.sh
. ./compile_1.0.sh
cd ../../..

Then make the data/ directory and run:

(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name ClothFold  --headless 1
Waiting to generate environment variations. May take 1 minute for each variation...
Compute Device: TITAN Xp

Pyflex init done!
config 0: {'default_camera': {'pos': array([-0.  ,  0.82,  0.82]), 'angle': array([ 0.        , -0.78539816,  0.        ]), 'width': 720, 'height': 720}}
MoviePy - Building file ./data/ClothFold.gif with imageio.
Video generated and save to ./data/ClothFold.gif

And here is the video:

ClothFold

DanielTakeshi commented 3 years ago

@Xingyu-Lin I'll update this post (to avoid cluttering the issue with more posts) with working vs non-working settings.

All Working Settings Thus Far

By "working" I mean it can run the example commands in headless mode. By "CUDA version" I mean the version reported by nvcc -V, NOT by nvidia-smi; I use the latter only for the NVIDIA driver version.
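The two numbers are easy to mix up, because the "CUDA Version" in the nvidia-smi header is the highest CUDA version the driver supports, not the installed toolkit (e.g. 10.2 next to nvcc 10.0 on the 16.04 machine above). This is a small sketch that reports each one separately. It reuses the idea of the awk/cut one-liner posted earlier in this thread, and the function names are hypothetical. nvidia-smi --query-gpu=driver_version --format=csv,noheader is a standard nvidia-smi query.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of SoftGym) for reporting the two versions separately.

# Read `nvcc -V` output on stdin and print the toolkit version, e.g. "10.0.130".
cuda_toolkit_version() {
  awk '/release/ { v = $NF; sub(/^V/, "", v); print v }'
}

# Print the driver version of the first GPU, e.g. "440.33.01".
nvidia_driver_version() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}
```

For a settings report, nvcc -V | cuda_toolkit_version gives the toolkit column and nvidia_driver_version gives the driver column.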

All Non-Working Settings Thus Far

(softgym) seita@triton3:~/softgym$ python examples/random_env.py --env_name ClothFold  --headless 1
Waiting to generate environment variations. May take 1 minute for each variation...
*** stack smashing detected ***: python terminated
Aborted (core dumped)

I then quickly changed my paths to point to CUDA 9.0 on that machine (instead of 10.0), and the code ran as expected. Interesting...

DanielTakeshi commented 3 years ago

@Xingyu-Lin I think we can probably close this ;) I am going to update https://danieltakeshi.github.io/2021/02/20/softgym/ with additional working setups.