facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License

Runtime Segmentation fault (core dumped) #182

SenZHANG-GitHub closed this issue 5 years ago

SenZHANG-GitHub commented 5 years ago

Steps to reproduce

  1. CentOS release 6.9 (Linux version 2.6.32-696.16.1.el6.x86_64, gcc version 4.4.7 20120313 (Red Hat 4.4.7-18))
  2. Compiler: gcc 7.2.0
  3. Compilation status: success
  4. Python version: 3.6.5

Observed Results

"python src/habitat-sim/examples/example.py --scene ~/data/habitat/scene_datasets/habitat-test-scenes/skokloster-castle.glb" leads to:

image

Possible Reasons

  1. Incorrect EGL version? I locally installed the following two packages with "rpm2cpio XX | cpio -idv": mesa-libEGL-11.0.7-4.el6.x86_64.rpm and mesa-libEGL-devel-11.0.7-4.el6.x86_64.rpm. I then replaced the header files with the EGL 1.5 headers to get the necessary #defines, as suggested in https://github.com/facebookresearch/habitat-sim/issues/145#issuecomment-521318477

Is this version of the EGL .so file (v11.0.7, for CentOS 6) recent enough for habitat?

"ldd libEGL.so.1.0.0" gives:

    linux-vdso.so.1 => (0x00007ffd3c318000)
    /usr/local/gcc/7.2.0/lib64/libstdc++.so.6 (0x00002b3c37694000)
    /usr/local/gcc/7.2.0/lib64/libgcc_s.so.1 (0x00002b3c37a2b000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00002b3c37c56000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b3c37e75000)
    libX11-xcb.so.1 => /usr/lib64/libX11-xcb.so.1 (0x00002b3c38093000)
    libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002b3c38294000)
    libxcb-dri2.so.0 => /usr/lib64/libxcb-dri2.so.0 (0x00002b3c385d1000)
    libxcb-xfixes.so.0 => /usr/lib64/libxcb-xfixes.so.0 (0x00002b3c387d6000)
    libxcb-render.so.0 => /usr/lib64/libxcb-render.so.0 (0x00002b3c389dd000)
    libxcb-shape.so.0 => /usr/lib64/libxcb-shape.so.0 (0x00002b3c38be9000)
    libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00002b3c38ded000)
    libgbm.so.1 => /usr/lib64/libgbm.so.1 (0x00002b3c39012000)
    libm.so.6 => /lib64/libm.so.6 (0x00002b3c3921d000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002b3c394a2000)
    libdrm.so.2 => /usr/lib64/libdrm.so.2 (0x00002b3c396a6000)
    libexpat.so.1 => /lib64/libexpat.so.1 (0x00002b3c398b3000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b3c39adc000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003af4a00000)
    libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002b3c39e70000)
    librt.so.1 => /lib64/librt.so.1 (0x00002b3c3a074000)

  2. Incorrect CUDA or GPU version? I loaded CUDA 7.5 / 9.2 / 10.0 with a Tesla V100, but it still does not work.

  3. Other missing or incorrect dependencies?
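
For anyone who wants to check an ldd listing like the one above mechanically, here is a minimal Python sketch. It assumes the usual one-entry-per-line ldd output; the function names parse_ldd and missing_libraries are made up for illustration and are not part of habitat-sim.

```python
import re

# Matches "name => path (0xaddr)" lines. The path is empty for linux-vdso
# and is the literal text "not found" for unresolved libraries.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.*?)\s*(?:\(0x[0-9a-fA-F]+\))?\s*$")


def parse_ldd(output):
    """Map each 'name => path' entry of ldd output to its path (None if not found)."""
    resolved = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if not match:
            # Lines without "=>" (e.g. the dynamic loader itself) are skipped.
            continue
        name, path = match.groups()
        resolved[name] = None if path == "not found" else path
    return resolved


def missing_libraries(resolved):
    """Return the sorted names of libraries that ldd could not resolve."""
    return sorted(name for name, path in resolved.items() if path is None)
```

Feeding it the text printed by "ldd libEGL.so.1.0.0" and calling missing_libraries on the result lists any dependency the loader cannot find.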

erikwijmans commented 5 years ago

CUDA 9.2 and 10.0 on a V100 should work. In order for EGL to work correctly, habitat-sim needs to pick up Nvidia's version of EGL at runtime. If it doesn't by default, it is likely that your system has a non-standard Nvidia driver install and you need to add the path to Nvidia's libEGL to LD_LIBRARY_PATH.
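
A rough Python sketch of the LD_LIBRARY_PATH suggestion, for launching the example from a script instead of editing ~/.bashrc. The helper name env_with_library_dir and the directory in the usage comment are placeholders, not real paths from this thread.

```python
import os


def env_with_library_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Keep the existing entries, but drop empty ones and any earlier copy of lib_dir.
    others = [p for p in existing.split(":") if p and p != lib_dir]
    env["LD_LIBRARY_PATH"] = ":".join([lib_dir] + others)
    return env


# Hypothetical usage (the directory is a placeholder for wherever Nvidia's
# libEGL lives on your machine):
#   import subprocess, sys
#   subprocess.run([sys.executable, "examples/example.py", "--scene", scene],
#                  env=env_with_library_dir("/path/to/nvidia/lib64"))
```

The dynamic loader reads LD_LIBRARY_PATH when a process starts, which is why the sketch passes the variable to a child process rather than changing it inside an already running interpreter.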

SenZHANG-GitHub commented 5 years ago

That hit the point, thanks! I had actually noticed this answer, https://github.com/facebookresearch/habitat-sim/issues/18#issuecomment-483930596, which is also the right answer here. But I misunderstood it: I thought libglvnd was one of EGL's dependencies, and tried to fix the problem by updating EGL to the latest Mesa version and installing libglvnd. I now realize that mesa-egl and libglvnd-egl are two separate distributions. "python example/example.py --scene ...." seems to be working now:

image
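
To confirm which libEGL a running process actually picked up, one option on Linux is to look at /proc/self/maps. The sketch below only parses that text; the function name mapped_egl_libraries is made up here, and when exactly habitat-sim maps its EGL library is an assumption you would need to check on your own setup.

```python
def mapped_egl_libraries(maps_text):
    """Return the sorted, de-duplicated paths of mapped files whose name contains 'libEGL'."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if "libEGL" in path.rsplit("/", 1)[-1]:
            paths.add(path)
    return sorted(paths)


# Hypothetical usage, in the same Python process once the simulator is running:
#   with open("/proc/self/maps") as f:
#       print(mapped_egl_libraries(f.read()))
```

If the printed paths point at a Mesa install rather than the driver's own EGL, the runtime lookup is still resolving the wrong library.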

For those who might be interested, the final solution is straightforward (on CentOS 6 without root):

(1) Locally install (via rpm2cpio):

    libglvnd-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm
    libglvnd-core-devel-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm
    libglvnd-devel-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm
    libglvnd-egl-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm
    libglvnd-gles-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm
    libglvnd-glx-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm
    libglvnd-opengl-1.0.1-0.1.git5baa1e5.el6.x86_64.rpm

Also install the Mesa packages to get the header files needed for compilation:

    mesa-libEGL-18.0.5-4.el7_6.x86_64.rpm
    mesa-libEGL-devel-18.0.5-4.el7_6.x86_64.rpm
    mesa-libGL-18.0.5-4.el7_6.x86_64.rpm
    mesa-libGL-devel-18.0.5-4.el7_6.x86_64.rpm

(2) Add "-DEGL_LIBRARY=/path/to/libglvnd/libEGL.so" and "-DEGL_INCLUDE_DIR=/path/to/include" to setup.py

(3) Modify the relevant environment variables (gcc version should be >= 7.1.0):

image

WARNING: playing with LD_PRELOAD in ~/.bashrc can be dangerous (unexpected missing dependencies can cause trouble). Export the necessary LD_LIBRARY_PATH at the same time, and keep a few shells open in tmux or other terminals so that you can repair the setup whenever needed.

(4) "python setup.py install --headless"
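
Step (1) can be scripted. The following is only a sketch of the "rpm2cpio XX | cpio -idv" approach mentioned earlier in this thread; the helper names are invented for illustration, and it assumes rpm2cpio and cpio are on PATH and that the RPM paths are absolute (the command changes directory before unpacking).

```python
import shlex
import subprocess


def rpm_extract_command(rpm_path, dest_dir):
    """Build a shell command that unpacks one RPM into dest_dir without root."""
    return "cd {} && rpm2cpio {} | cpio -idv".format(
        shlex.quote(dest_dir), shlex.quote(rpm_path)
    )


def extract_rpms(rpm_paths, dest_dir):
    """Unpack each RPM in turn; raises CalledProcessError if a command fails."""
    for rpm_path in rpm_paths:
        subprocess.run(rpm_extract_command(rpm_path, dest_dir), shell=True, check=True)
```

The files land under dest_dir with their packaged layout (for example usr/lib64 and usr/include), so the library and include paths used in steps (2) and (3) have to point inside that tree.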

erikwijmans commented 5 years ago

Just for future reference,

(2) Add "-DEGL_LIBRARY=/path/to/libglvnd/libEGL.so" and "-DEGL_INCLUDE_DIR=/path/to/include" to setup.py

is likely unneeded, as using whatever EGL gets picked up at compile time is totally fine; it's just at run-time that the correct EGL needs to be picked up.

SenZHANG-GitHub commented 5 years ago

Yes, the version of libEGL.so does not matter for compilation. Adding the path of whichever libEGL.so is used to LD_LIBRARY_PATH should also work (I added step (2) because the server I use has no system-wide EGL installed, so I had to install it and link to it locally). But the EGL header files should be up to date so that they include some variables needed for compilation.
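
A quick way to check whether a local EGL header is recent enough is to look for a version guard macro. The sketch below assumes the Khronos-style egl.h, which defines EGL_VERSION_1_5 in EGL 1.5 headers; which specific definitions habitat-sim needs is not stated in this thread, so treat the macro name as an example. The function name defines_macro is made up for illustration.

```python
import re


def defines_macro(header_text, macro):
    """True if header_text contains a '#define <macro>' preprocessor line."""
    pattern = r"^[ \t]*#[ \t]*define[ \t]+" + re.escape(macro) + r"\b"
    return re.search(pattern, header_text, re.MULTILINE) is not None


# Hypothetical usage (the include path is a placeholder):
#   with open("/path/to/include/EGL/egl.h") as f:
#       print(defines_macro(f.read(), "EGL_VERSION_1_5"))
```

The same helper can be pointed at EGL/eglext.h with the name of whichever definition the compiler reports as missing.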

erikwijmans commented 5 years ago

Ah yes, they should be :)