autowarefoundation / autoware

Autoware - the world's leading open-source software project for autonomous driving
https://www.autoware.org/
Apache License 2.0

`[ros2run]: Segmentation fault` on all Qt based application with latest `humble-latest-cuda` image #4092

Closed VRichardJP closed 6 months ago

VRichardJP commented 6 months ago

Description

I just rebuilt the Autoware Docker image from the latest codebase (bf95c380db6debdf07fb9b6854036df567e98903):

$ ./docker/build.sh --no-prebuilt

Since then I have not been able to run any Qt-based application (rviz2, rqt, turtlesim...) from a Docker container. The application always crashes with a segfault:

$ xhost +local:docker
$ docker run --runtime nvidia -e DISPLAY -v ~/.Xauthority:/root/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix --rm -it ghcr.io/autowarefoundation/autoware-universe:humble-latest-cuda /bin/bash
root@8708db16fe0f:/autoware# ros2 run rviz2 rviz2
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
[ros2run]: Segmentation fault

Other GUI applications (e.g. xterm) work without issue.

When I run the programs with gdb, I see this:

root@590fae079b65:/autoware# sudo apt update && sudo apt install gdb
[...]
root@590fae079b65:/autoware# ros2 run --prefix="gdb -ex=r --args" rviz2 rviz2
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/ros/humble/lib/rviz2/rviz2...
(No debugging symbols found in /opt/ros/humble/lib/rviz2/rviz2)
Starting program: /opt/ros/humble/lib/rviz2/rviz2 
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fac3c6d0640 (LWP 713)]
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
[New Thread 0x7fac3bebb640 (LWP 714)]
[New Thread 0x7fac3b6ba640 (LWP 715)]
[New Thread 0x7fac3aeb9640 (LWP 716)]
[New Thread 0x7fac3a6b8640 (LWP 717)]
[New Thread 0x7fac39eb7640 (LWP 718)]
[New Thread 0x7fac395b5640 (LWP 719)]
[New Thread 0x7fac38cb3640 (LWP 720)]
[New Thread 0x7fac1bfff640 (LWP 721)]
[New Thread 0x7fac1b7fe640 (LWP 722)]
[New Thread 0x7fac1affd640 (LWP 723)]
[Detaching after fork from child process 724]

Thread 1 "rviz2" received signal SIGSEGV, Segmentation fault.
0x000055d0754bf270 in ?? ()
(gdb) bt
#0  0x000055d0754bf270 in ?? ()
#1  0x00007fac18dba3e3 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#2  0x00007fac1875e26f in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#3  0x00007fac1875537f in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#4  0x00007fac1875e886 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#5  0x00007fac3805510b in ?? () from /lib/x86_64-linux-gnu/libGLX_indirect.so.0
#6  0x00007fac3805a0db in ?? () from /lib/x86_64-linux-gnu/libGLX_indirect.so.0
#7  0x00007fac3805b157 in ?? () from /lib/x86_64-linux-gnu/libGLX_indirect.so.0
#8  0x00007fac38057cdc in ?? () from /lib/x86_64-linux-gnu/libGLX_indirect.so.0
#9  0x00007fac382697d3 in ?? ()
   from /usr/lib/x86_64-linux-gnu/qt5/plugins/xcbglintegrations/libqxcb-glx-integration.so
#10 0x00007fac3cc46025 in QXcbWindow::create() () from /lib/x86_64-linux-gnu/libQt5XcbQpa.so.5
#11 0x00007fac3cc32636 in QXcbIntegration::createPlatformWindow(QWindow*) const ()
   from /lib/x86_64-linux-gnu/libQt5XcbQpa.so.5
#12 0x00007fac419abb21 in QWindowPrivate::create(bool, unsigned long long) ()
   from /lib/x86_64-linux-gnu/libQt5Gui.so.5
#13 0x00007fac4314b4f5 in QWidgetPrivate::create() () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#14 0x00007fac4314bb1e in QWidget::create(unsigned long long, bool, bool) ()
   from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#15 0x00007fac43158ebe in QWidgetPrivate::setVisible(bool) () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#16 0x00007fac437aec99 in rviz_common::VisualizationFrame::initialize(std::weak_ptr<rviz_common::ros_integration::RosNodeAbstractionIface>, QString const&) () from /opt/ros/humble/lib/librviz_common.so
#17 0x00007fac437befae in rviz_common::VisualizerApp::init(int, char**) ()
   from /opt/ros/humble/lib/librviz_common.so
#18 0x000055d073eed8e5 in ?? ()
#19 0x00007fac423efd90 in __libc_start_call_main (main=main@entry=0x55d073eed430, argc=argc@entry=1, 
    argv=argv@entry=0x7ffc5b753bf8) at ../sysdeps/nptl/libc_start_call_main.h:58
#20 0x00007fac423efe40 in __libc_start_main_impl (main=0x55d073eed430, argc=1, argv=0x7ffc5b753bf8, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc5b753be8)
    at ../csu/libc-start.c:392
#21 0x000055d073eedc15 in ?? ()
(gdb) 

Somehow, there is no problem when I use rocker. For example:

rocker --nvidia --x11 --user --volume $HOME/autoware -- ghcr.io/autowarefoundation/autoware-universe:humble-latest-cuda

But my setup is based on a VS Code devcontainer and docker-compose files, so I would rather fix the "normal" docker way.

Expected behavior

No crash.

Actual behavior

Qt based apps crash.

Steps to reproduce

$ xhost +local:docker
$ docker run --runtime nvidia -e DISPLAY -v ~/.Xauthority:/root/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix --rm -it ghcr.io/autowarefoundation/autoware-universe:humble-latest-cuda /bin/bash

Versions

Possible causes

No response

Additional context

No response

VRichardJP commented 6 months ago

I had missed the NVIDIA environment variables NVIDIA_DRIVER_CAPABILITIES (and NVIDIA_VISIBLE_DEVICES?). With both set to all, rviz and the other applications now work.

That is curious though, because I think I never set these variables before and my setup was working just fine. Anyway, it works now.
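
For anyone hitting the same crash, here is a minimal sketch of the adjusted command, wrapped in a shell function. It is the `docker run` invocation from the steps to reproduce with the two variables added, both set to `all` as described above. The function name `run_autoware_gui` and the `IMAGE` variable are only illustrative, and it is an assumption on my side that both variables are needed; NVIDIA_DRIVER_CAPABILITIES alone may be enough.

```shell
# Image used in the steps to reproduce.
IMAGE=ghcr.io/autowarefoundation/autoware-universe:humble-latest-cuda

# Hypothetical helper: same docker run as in the report, plus the two
# NVIDIA variables. Any arguments are passed through as the container command.
run_autoware_gui() {
    docker run --runtime nvidia \
        -e DISPLAY \
        -e NVIDIA_DRIVER_CAPABILITIES=all \
        -e NVIDIA_VISIBLE_DEVICES=all \
        -v "$HOME/.Xauthority:/root/.Xauthority" \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        --rm -it "$IMAGE" "$@"
}
```

Usage would be `xhost +local:docker` followed by `run_autoware_gui /bin/bash`, then `ros2 run rviz2 rviz2` inside the container. The backtrace above goes through `swrast_dri.so` and `libGLX_indirect.so.0`, i.e. Mesa's software rendering path, which fits the NVIDIA graphics libraries not having been made available to the container without these variables.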