carla-simulator / carla

Open-source simulator for autonomous driving research.
http://carla.org
MIT License

RenderOffScreen with Carla Docker Image - Vulkan Driver Problems on Ubuntu 22.04 with NVIDIA GPU #8079

Open Mariusmarten opened 1 month ago

Mariusmarten commented 1 month ago

After running ./CarlaUE4.sh -RenderOffScreen, or executing the binary directly via /home/carla/CarlaUE4/Binaries/Linux/CarlaUE4-Linux-Shipping CarlaUE4 -prefernvidia -RenderOffScreen, Carla exits without printing any errors.


CARLA version: tested with the Docker images carlasim/carla:0.9.14 and carlasim/carla:0.9.15

Platform/OS: Docker (Ubuntu 18 inside the container, Ubuntu 22.04 on the host)

Problem: Carla does not start up. It does start on other nodes, and it also starts when the -nullrhi flag (which disables GPU rendering) is used.

Expectation: Carla should start up normally.

Steps to reproduce: I followed the steps described here: https://carla.readthedocs.io/en/latest/build_docker/. I pulled an image, built the image, and ran it.

Other info: The problem might be caused by a recent upgrade of the host system from Ubuntu 18 to 22; before that, the container was working fine. I have no control over the host's OS version, so I need to work with Ubuntu 22. The host machine has NVIDIA driver version 550.90.07 and CUDA version 12.4. vulkaninfo gives an error, but nvidia-smi works without problems.


Questions: How can I figure out what is wrong with the GPU setup? Since the version is 0.9.12+, does Vulkan still need to be accessible from within the container? What could be the problem here?

Potentially related: "CarlaUE4.sh cannot launch" and https://github.com/carla-simulator/carla/issues/7324

EDIT: This issue seems related, but its fix did not work for me: https://github.com/carla-simulator/carla/issues/6234

BBArikL commented 1 month ago

Try creating a /home directory inside the docker container. I think I had the same issue and doing that fixed it for me.

Mariusmarten commented 1 month ago

@BBArikL, thank you for the quick response. Did you perhaps use a different container setup? In the default Carla containers, the home directory already exists because it holds the CarlaUE4 installation, e.g. the binaries in /home/carla/CarlaUE4/Binaries/Linux, which are created during the build.

BBArikL commented 1 month ago

My bad, you are right. Even if the xdg-user-dirs failure is a red herring (#3514, #4193), installing it does help to get that error out of the way so you can properly diagnose why Carla isn't working. But yes, Carla switched from OpenGL to Vulkan in the latest versions. Personally, I used an NVIDIA RTX 3070 not long ago with the same Docker build and had to jump through some hoops to get it working properly, but I did not have to mess with Vulkan drivers. Also, what command are you using to start the Docker container? The exact one provided in the docs?

Mariusmarten commented 1 month ago

Yes, xdg-user-dirs is not the problem here.

Since I am using 0.9.14 (or 0.9.15), I thought no additional steps were needed, given this note in the docs:

Starting from version 0.9.12, CARLA runs on Unreal Engine 4.26 which introduced support for off-screen rendering. In previous versions of CARLA, off-screen rendering depended upon the graphics API you were using.

I execute the CarlaUE4-Linux-Shipping binary directly instead of launching it via CarlaUE4.sh. With the command /home/carla/CarlaUE4/Binaries/Linux/CarlaUE4-Linux-Shipping CarlaUE4 -nullrhi -prefernvidia -RenderOffScreen, Carla starts and keeps running 'normally', but without any graphics. So with rendering disabled, the process looks fine.

The problem thus seems to be related to the Vulkan drivers not being properly mounted into the container.
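One way to test this hypothesis is to check, first on the host and then inside the container, whether the NVIDIA Vulkan files are actually present. The helper below is a hypothetical sketch (check_vulkan_files is not part of CARLA or Docker). It assumes the ICD manifest is named nvidia_icd.json and lives in usr/share/vulkan/icd.d or etc/vulkan/icd.d, and it checks only libnvidia-gpucomp, the library identified later in this thread. The root directory is a parameter so the same function can inspect / or any other filesystem tree.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of CARLA or Docker). It reports NVIDIA
# Vulkan files missing under ROOT; file names and locations are assumptions.

# Usage: check_vulkan_files ROOT DRIVER_VERSION
# Prints one "missing: ..." line per absent file and returns 1 if any is missing.
check_vulkan_files() {
  local root="${1%/}" version="$2" status=0

  # The NVIDIA ICD manifest is commonly installed in one of these two
  # directories (assumption; packaging differs between distributions).
  if [[ ! -f "$root/usr/share/vulkan/icd.d/nvidia_icd.json" ]] \
    && [[ ! -f "$root/etc/vulkan/icd.d/nvidia_icd.json" ]]; then
    echo "missing: nvidia_icd.json (checked /usr/share/vulkan/icd.d and /etc/vulkan/icd.d)"
    status=1
  fi

  # Versioned driver library whose absence broke Vulkan in this issue.
  local lib="$root/usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.$version"
  if [[ ! -e "$lib" ]]; then
    echo "missing: ${lib#"$root"}"
    status=1
  fi

  return "$status"
}
```

For example, run check_vulkan_files / 550.90.07 on the host and then inside the container; a file that is reported missing only inside the container points to a mount problem rather than a host driver problem.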

Mariusmarten commented 1 month ago

In my case, Vulkan failed because the host library libnvidia-gpucomp.so.550.90.07 was not mounted into the container. This mount should potentially be added to the documented run command. Here is my full solution:

Pull the latest Docker image (an earlier version likely also works, but I have not verified this):

docker pull carlasim/carla:0.9.15

Run the following command. It is important to mount both the /tmp/.X11-unix folder and the specific NVIDIA library from /usr/lib/x86_64-linux-gnu so that the Carla container can use the host's Vulkan driver. Explicitly mounting the library from x86_64-linux-gnu is what fixed my problem.

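# -d runs the container in the background, so docker exec works from the same terminal
# device=1 selects the GPU with index 1; adjust it to your machine
docker run -d --name carla \
  --gpus '"device=1"' \
  -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.550.90.07:/usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.550.90.07 \
  -v /usr/share/vulkan/icd.d:/usr/share/vulkan/icd.d \
  carlasim/carla:0.9.15 \
  sleep infinity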
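The mount above hard-codes the host driver version (550.90.07), so it has to be edited after every driver update. A minimal sketch for deriving it instead, assuming the host's nvidia-smi supports --query-gpu=driver_version and that the library lives in /usr/lib/x86_64-linux-gnu as on Ubuntu. The function names are hypothetical, and -d is an addition that runs the container detached.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CARLA): build the libnvidia-gpucomp bind
# mount from the host driver version instead of hard-coding 550.90.07.

# Print the driver version of the first GPU, e.g. "550.90.07".
host_driver_version() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1 | tr -d '[:space:]'
}

# Usage: gpucomp_volume VERSION [LIBDIR]
# Print "HOST_PATH:CONTAINER_PATH", mounting the library at the same path.
gpucomp_volume() {
  local version="$1" libdir="${2:-/usr/lib/x86_64-linux-gnu}"
  local lib="$libdir/libnvidia-gpucomp.so.$version"
  printf '%s:%s\n' "$lib" "$lib"
}

# Same container as the command above, with the version-specific mount.
start_carla_container() {
  local version
  version="$(host_driver_version)"
  if [[ -z "$version" ]]; then
    echo "could not read the driver version from nvidia-smi" >&2
    return 1
  fi
  docker run -d --name carla \
    --gpus '"device=1"' \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    -v "$(gpucomp_volume "$version")" \
    -v /usr/share/vulkan/icd.d:/usr/share/vulkan/icd.d \
    carlasim/carla:0.9.15 \
    sleep infinity
}
```

Source the file on the host and call start_carla_container; the remaining docker exec steps stay the same.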

Enter the container as root, install vulkan-utils, and run vulkaninfo --summary to verify that Vulkan can be used from within the container. Important: vulkaninfo --summary should first run without errors on the host machine; if it does not, your problem lies elsewhere.

docker exec -u root -it carla /bin/bash

apt update && apt install -y vulkan-utils

vulkaninfo --summary

Leave the container again and re-enter it as the default non-root user:

exit

docker exec -it carla /bin/bash

Make the Carla binary executable and run it. CarlaUE4 should now keep running; you can verify this with watch -n 1 nvidia-smi, which should show the GPU under load.

chmod +x "/home/carla/CarlaUE4/Binaries/Linux/CarlaUE4-Linux-Shipping"

/home/carla/CarlaUE4/Binaries/Linux/CarlaUE4-Linux-Shipping CarlaUE4 -carla-server -RenderOffScreen
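Besides watching nvidia-smi, one can check that the server actually accepts client connections. This sketch assumes CARLA's default RPC port 2000 and relies on bash's /dev/tcp redirection plus the coreutils timeout command; wait_for_port is a hypothetical helper, not a CARLA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a TCP port accepts connections.

# Usage: wait_for_port HOST PORT [MAX_ATTEMPTS]
# Returns 0 as soon as a connection succeeds, 1 after MAX_ATTEMPTS failures
# (one attempt per second, default 60).
wait_for_port() {
  local host="$1" port="$2" max="${3:-60}" i
  for ((i = 0; i < max; i++)); do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection;
    # timeout guards against hangs on unreachable hosts.
    if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, from a second docker exec shell in the container: wait_for_port localhost 2000 120 && echo "Carla is accepting connections". The port is not published by the run command above, so checking from the host would additionally need a -p mapping.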