Xingyu-Lin / softgym

SoftGym is a set of benchmark environments for deformable object manipulation.
BSD 3-Clause "New" or "Revised" License
270 stars, 61 forks

Segmentation Fault (Core Dumped) #5

Open DhananjayAshok opened 3 years ago

DhananjayAshok commented 3 years ago

Hi there,

When I try to run SoftGym, the PyFleX compilation works just fine, but when I run the following line from the example Python file:

env = normalize(SOFTGYM_ENVS[args.env_name](**env_kwargs))

I get the error:

Unable to initialize SDLCould not initialize GL extensions
Reshaping
Segmentation fault (core dumped)

Do you have any idea what could be causing this?

System specifications: Ubuntu 18.04, CUDA 9.1

OpenGL applications work fine on my system, for example glxinfo and glxgears work as expected.

Xingyu-Lin commented 3 years ago

We have not tested the compilation steps on Ubuntu 18. Are you using the Docker image?

liduanken commented 3 years ago

I have encountered the same problem. BTW I am using a docker.

DhananjayAshok commented 3 years ago

I am not using Docker; I used the other installation method. I am doing this whole process on a compute cluster and, for admin reasons, cannot use Docker.

Xingyu-Lin commented 3 years ago

If you are using a cluster, then you probably do not have a display environment for GL applications. Can you try running softgym with the headless option on?

FranBesq commented 3 years ago

I had this problem when running the example from inside the Docker container (on Ubuntu 18 and CUDA 11).

I solved it by executing the example outside the container, but you need to set the env variables again if they are not already in .bashrc:

conda activate softgym
export PYFLEXROOT=${PWD}/PyFlex
export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH

python examples/random_env.py --env_name PourWaterAmount
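
Before launching the example, it can help to confirm that the three variables exported above are actually visible to the Python process. The following is a minimal sketch, not part of softgym; the helper name `check_softgym_env` is hypothetical, and it only checks that the variables are set, not that their contents are correct.

```python
import os

# Variables exported in the commands above.
REQUIRED_VARS = ("PYFLEXROOT", "PYTHONPATH", "LD_LIBRARY_PATH")


def check_softgym_env(environ=None):
    """Return a dict mapping each required variable to a short status string.

    Hypothetical helper: it does not validate the values beyond checking
    that PYFLEXROOT points at an existing directory.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in REQUIRED_VARS:
        value = environ.get(name, "")
        if not value:
            report[name] = "missing"
        elif name == "PYFLEXROOT" and not os.path.isdir(value):
            report[name] = "set, but directory not found"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_softgym_env().items():
        print(f"{name}: {status}")
```

Running this in the same shell used for `python examples/random_env.py` shows whether conda activation or a new terminal dropped any of the exports.
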
DhananjayAshok commented 3 years ago

On the cluster I have been using xvfb, so the display environment should be working properly (other GL applications such as glxgears work as expected). However, I just ran it again in headless mode and got a very similar error. I also tried the solution provided by FranBesq and got the same issue.

Waiting to generate environment variations. May take 1 minute for each variation...
eglInitialize() failedeglChooseConfig() failedfailed to find suitable EGLConfigeglCreateContext() failedeglCreatePbufferSurface() failedeglQueyContext(EGL_RENDER_BUFFER) failedCould not initialize GL extensions
Segmentation fault (core dumped)
yufeiwang63 commented 3 years ago

Looks like this is an EGL error. Are you sure you have all the correct EGL libraries installed on the cluster? Maybe try this: apt-get install libglfw3 libgles2-mesa-dev
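
One way to check whether the dynamic loader can see the EGL and GL libraries at all is Python's standard `ctypes.util.find_library`. This is only a sketch: the list of library names is an assumption about what the renderer needs, and finding a library does not prove that EGL initialization will succeed.

```python
from ctypes.util import find_library

# Short library names to look up; this list is an assumption, not taken
# from the softgym or PyFlex build files.
CANDIDATE_LIBS = ("EGL", "GL", "GLESv2")


def locate_gl_libraries(names=CANDIDATE_LIBS):
    """Map each short library name to what the loader finds (None if nothing)."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_gl_libraries().items():
        print(f"lib{name}: {found or 'NOT FOUND'}")
```

A `NOT FOUND` entry suggests a missing package or a library directory that the loader does not search.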

DhananjayAshok commented 3 years ago

Yup, these libraries are all installed.

liduanken commented 3 years ago

I still get: Could not initialize GL extensions. I am on CUDA 11 and Ubuntu 18.04. I wonder if you have any ideas. Thank you.

FranBesq commented 3 years ago

Have you tried @yufeiwang63's answer? If it didn't work, here are some things I would try, although I don't want to send you on a wild goose chase.

* I described the steps I followed to install on my [fork](https://github.com/FranBesq/softgym/blob/master/docker/docker.md).

* @Xingyu-Lin links [this article](https://medium.com/@benjamin.botto/opengl-and-cuda-applications-in-docker-af0eece000f1) in his docker.md, which may be helpful.

* I call random_env.py through the Python interpreter directly, although you may have to make some minor changes to random_env.py.
  I don't see how this could help with OpenGL, but it helped me with imports (conda messed up some env variables). Your installation may have a problem locating the GL libraries; again, check the article above for additional help with this.
python
import examples.random_env as rand_env
rand_env.main()

liduanken commented 3 years ago

Hello, thanks for your answer. I tried your solution and successfully compiled outside the container, but it still did not work. I also tried Ubuntu 16.04 with NVIDIA driver 440.33.01 and CUDA 10.2, but still got 'Could not initialize GL extensions.' (I actually doubt whether the authors successfully compiled on CUDA 9.2: I tried it before, and some libraries apparently do not match, producing the error 'undefined symbol: cudaSetupArgument'.) I really do not know how to get the authors' softgym to compile and run; I have already spent at least 30 hours on it with nothing to show for it. Could you suggest some alternative ideas? Thanks for your answer!

yufeiwang63 commented 3 years ago

Hi LiDuanAtGlasgow, I am sorry that the compilation has caused you so much trouble. We ourselves also spent a lot of time getting the system running correctly in the early stages of this project. We were indeed able to compile the project with NVIDIA driver 440.33.01 and CUDA version 9.1 or 9.2. See the screenshot below. [image]

Also, can you check your $LD_LIBRARY_PATH to make sure it looks similar to mine?
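
To compare `$LD_LIBRARY_PATH` between machines, it may be easier to list each entry with a note on whether the directory exists and which EGL libraries it contains. The sketch below assumes nothing about softgym itself; the function name and the `libEGL*` pattern are illustrative choices.

```python
import glob
import os


def inspect_library_path(value=None, pattern="libEGL*"):
    """Describe each LD_LIBRARY_PATH entry.

    Returns a list of (entry, exists, matching_file_names) tuples.
    The default pattern is only an illustration; pass another glob
    pattern to look for different libraries.
    """
    if value is None:
        value = os.environ.get("LD_LIBRARY_PATH", "")
    rows = []
    for entry in value.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        exists = os.path.isdir(entry)
        matches = sorted(glob.glob(os.path.join(entry, pattern))) if exists else []
        rows.append((entry, exists, [os.path.basename(m) for m in matches]))
    return rows


if __name__ == "__main__":
    for entry, exists, names in inspect_library_path():
        state = "exists" if exists else "MISSING"
        print(f"{entry} [{state}] {', '.join(names) or '-'}")
```

Entries marked `MISSING` point at directories that do not exist on this machine, for example a driver folder from a different driver version.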

DanielTakeshi commented 3 years ago

Hi @yufeiwang63 and @Xingyu-Lin , I am also running into the same problem that @LiDuanAtGlasgow has been running into. I am using Ubuntu 18.04 and the provided Docker.

I can produce a detailed issue report, but before doing that, I am interested in knowing the workflow that you two use to run softgym. Just to be clear, did you need to follow the instructions in this fork linked above? Is this the workflow that you generally follow? And when you run your python commands, are you using the usual command line shell or are you inside a docker environment?

FranBesq commented 3 years ago

The purpose of the container is to compile PyFlex, as far as I understand. I followed steps similar to the PyFlex docker.md when creating the fork and got it to work this way. Again, I'm obviously not going to speak on behalf of the authors, but I think it is worth a try.

Xingyu-Lin commented 3 years ago

We generally do not use Docker on our local desktops; we only use it for launching experiments on computing clusters. On our local desktops, we follow the instructions here: https://github.com/Xingyu-Lin/softgym/blob/master/README.md. The purpose of the Docker image was to make compilation easier for more people. What @FranBesq said is correct: Docker is only used for compiling Flex and PyFlex. Once the compilation is done, softgym can be run in a normal Python environment.

DanielTakeshi commented 3 years ago

Hi @Xingyu-Lin @FranBesq here is my more detailed minimum working example: https://github.com/Xingyu-Lin/softgym/issues/9

(In a separate issue report)

rehaanahmad2013 commented 3 years ago

Hey @Xingyu-Lin what do you have in your /usr/lib/nvidia-440 folder? I do not have a folder like that in /usr/lib, and I suspect that could be my issue.

Xingyu-Lin commented 3 years ago

Hi @rehaanahmad2013, here is my ls result:

alternate-install-present
alt_ld.so.conf
bin
ld.so.conf
libEGL_nvidia.so.0
libEGL_nvidia.so.440.64.00
libEGL.so
libEGL.so.1
libEGL.so.1.1.0
libEGL.so.440.64.00
libGLdispatch.so.0
libGLESv1_CM_nvidia.so.1
libGLESv1_CM_nvidia.so.440.64.00
libGLESv1_CM.so
libGLESv1_CM.so.1
libGLESv1_CM.so.1.2.0
libGLESv2_nvidia.so.2
libGLESv2_nvidia.so.440.64.00
libGLESv2.so
libGLESv2.so.2
libGLESv2.so.2.1.0
libGL.so
libGL.so.1
libGL.so.1.7.0
libGLX_indirect.so.0
libGLX_nvidia.so.0
libGLX_nvidia.so.440.64.00
libGLX.so
libGLX.so.0
libnvcuvid.so
libnvcuvid.so.1
libnvcuvid.so.440.64.00
libnvidia-allocator.so
libnvidia-allocator.so.1
libnvidia-allocator.so.440.64.00
libnvidia-cbl.so.440.64.00
libnvidia-cfg.so
libnvidia-cfg.so.1
libnvidia-cfg.so.440.64.00
libnvidia-compiler.so
libnvidia-compiler.so.1
libnvidia-compiler.so.440.64.00
libnvidia-eglcore.so.440.64.00
libnvidia-egl-wayland.so.1
libnvidia-egl-wayland.so.1.1.4
libnvidia-encode.so
libnvidia-encode.so.1
libnvidia-encode.so.440.64.00
libnvidia-fatbinaryloader.so.440.64.00
libnvidia-fbc.so
libnvidia-fbc.so.1
libnvidia-fbc.so.440.64.00
libnvidia-glcore.so.440.64.00
libnvidia-glsi.so.440.64.00
libnvidia-glvkspirv.so.440.64.00
libnvidia-ifr.so
libnvidia-ifr.so.1
libnvidia-ifr.so.440.64.00
libnvidia-ml.so
libnvidia-ml.so.1
libnvidia-ml.so.440.64.00
libnvidia-opticalflow.so
libnvidia-opticalflow.so.1
libnvidia-opticalflow.so.440.64.00
libnvidia-ptxjitcompiler.so
libnvidia-ptxjitcompiler.so.1
libnvidia-ptxjitcompiler.so.440.64.00
libnvidia-rtcore.so.440.64.00
libnvidia-tls.so.440.64.00
libnvoptix.so.1
libnvoptix.so.440.64.00
libOpenGL.so
libOpenGL.so.0
tls
vdpau
xorg
ShiguangSun commented 3 years ago

Hello, I have the same problem on an Ubuntu 16.04 server. My CUDA version is 9.2 and the NVIDIA driver version is 460.73.01. When I ran random_env.py with headless 0, it showed:

Could not initialize GL extensions
Reshaping
Segmentation fault (core dumped)

and with headless 1, it showed:

eglGetDisplay() failedeglInitialize() failedeglChooseConfig() failedeglCreateContext() failedeglCreatePbufferSurface() failedeglMakeCurrent() failedeglQueyContext(EGL_RENDER_BUFFER) failedCould not initialize GL extensions
Segmentation fault (core dumped)

I tried the methods above, but none of them worked. Could the NVIDIA driver version be the cause?

Xingyu-Lin commented 3 years ago

If you are on an Ubuntu server, it's very likely that you don't have a display environment. Does it work with headless set to 1?

ShiguangSun commented 3 years ago

No, I tried both 1 and 0, and neither worked.

Xingyu-Lin commented 3 years ago

The driver version does make a difference. We got it working with NVIDIA driver 440.33.01 and CUDA version 9.1 or 9.2, although others have also got it working with some other driver versions.
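
To report the installed driver version alongside an error log, `nvidia-smi` can be queried from Python. This is a sketch under the assumption that `nvidia-smi` is on the PATH; the helper name is hypothetical, and it returns `None` rather than raising when the tool is unavailable.

```python
import shutil
import subprocess


def nvidia_driver_version():
    """Return the driver version string reported by nvidia-smi, or None."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # nvidia-smi is not on PATH
    try:
        proc = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None
    lines = proc.stdout.strip().splitlines()
    # With several GPUs there is one line per GPU; report the first.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print("NVIDIA driver:", nvidia_driver_version() or "unknown")
```

The result can then be compared with the versions mentioned in this thread, such as 440.33.01.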

ShiguangSun commented 3 years ago

OK, thanks, I'll try.

ShiguangSun commented 3 years ago

Hi, when I ran . ./compile_1.0.sh, there were some warnings:

/softgym/PyFlex/bindings/opengl/shadersGL.cpp:3386:25: warning: invalid conversion from ‘EGLConfig {aka void*}’ to ‘void’ [-fpermissive]
g_eglConfig = configs[0]
/PyFlex/bindings/opengl/shadersGL.cpp:3390:33: warning : invalid conversion from ‘EGLContext {aka void*}’ to ‘void*’ [-fpermissive]
g_eglContext = eglCreateContext(
^
/PyFlex/bindings/opengl/shadersGL.cpp:3398:40: warning : invalid conversion from ‘EGLSurface {aka void}’ to ‘void’ [-fpermissive]
g_eglSurface = eglCreatePbufferSurface(g_eglDisplay, g_eglConfig,

Is this the reason why I couldn't run softgym?

ShiguangSun commented 3 years ago

Hi, have you solved this problem?

rehaanahmad2013 commented 3 years ago

@DhananjayAshok Have you been able to solve this problem? Others have had the segmentation fault but I'm also experiencing the exact same output error with "eglInitialize()..." etc. I also used a non-docker approach for admin reasons.

TriBall3 commented 2 years ago

Have you solved the problem yet?

TriBall3 commented 2 years ago

Have you solved the problem yet?

zcswdt commented 1 year ago

Hello, I also encountered this issue. I ran echo $LD_LIBRARY_PATH and the output is as follows: [image]

zcswdt commented 1 year ago

Hello, I have encountered the same problem as you. Have you resolved it?

bilkitty commented 7 months ago

Late to the party, but if anyone is stuck on this, you can run the examples in headless mode, e.g., python examples/random_env.py --headless 1 --env_name PassWater

This worked in the prebuilt Docker image, which I set up on Ubuntu 20.04. The build script uses CUDA 9.2.
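
The headless invocation above can also be wrapped in a small script that reports how the process ended, which helps distinguish a segmentation fault from an ordinary Python error. This is a sketch only: it assumes it is run from the softgym repository root, and the helper names are hypothetical.

```python
import signal
import subprocess
import sys


def headless_command(env_name="PassWater", script="examples/random_env.py"):
    """Build the headless command line quoted above for the current interpreter."""
    return [sys.executable, script, "--headless", "1", "--env_name", env_name]


def run_headless(env_name="PassWater"):
    """Run the example and return (return_code, stderr_text)."""
    proc = subprocess.run(headless_command(env_name), capture_output=True, text=True)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_headless()
    if code < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        print("terminated by", signal.Signals(-code).name)
    elif code != 0:
        print("exited with code", code)
    else:
        print("finished normally")
    if err:
        print(err)
```

A result of `terminated by SIGSEGV` corresponds to the "Segmentation fault (core dumped)" message reported in this thread.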

karinoon commented 6 months ago

Hi, have you solved this problem?