carla-simulator / carla

Open-source simulator for autonomous driving research.
http://carla.org
MIT License

Launching CARLA Simulator in Singularity #4256

fireofearth opened this issue 3 years ago (status: Open)

fireofearth commented 3 years ago

I'm trying to run the CARLA Simulator on an HPC cluster (Compute Canada's Cedar, to be exact; specifications if needed: https://docs.computecanada.ca/wiki/Cedar), which is a headless server. Docker is not available, so I'm using Singularity.

CARLA exits immediately with very little indication of where the error came from. Has anyone successfully launched CARLA Simulator as a Singularity image at all? If so, how did you do it?

I've seen lots of GitHub issues about running CARLA in Docker, but information seems sparse when it comes to Singularity.

Program versions

Singularity version: 3.7
CARLA image: carlasim/carla:0.9.11 (from https://hub.docker.com/r/carlasim/carla/tags)
nvidia-smi output:

nvidia-smi
Thu May 27 12:58:10 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   30C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:04:00.0

What I've tried

First I built the SIF file:

singularity build carla-0.9.11.sif docker://carlasim/carla:0.9.11

I've tried running CARLA from an interactive shell:

singularity shell --nv carla-0.9.11.sif
singularity> nvidia-smi # I've checked that GPU is visible
singularity> cd /home/carla
singularity> ./CarlaUE4.sh

I've also tried singularity> ./CarlaUE4.sh -opengl. Both give:

chmod: changing permissions of '/home/carla/CarlaUE4/Binaries/Linux/CarlaUE4-Linux-Shipping': Read-only file system
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
sh: 1: xdg-user-dir: not found

And then CARLA exits with code 1. I've also tried (with and without the -opengl flag):

singularity exec --nv carla-0.9.11.sif /bin/bash /home/carla/CarlaUE4.sh

which gives the same output and exit code 1. I've also played around with the suggestions in https://github.com/carla-simulator/carla/issues/1290, which unfortunately didn't help much. I have yet to test other versions of CARLA; I'll post updates if I find anything.

qhaas commented 3 years ago

We are using Singularity to run CARLA server on a TOP500 supercomputer at Oak Ridge National Lab.

Know that the chmod warning and xdg-user-dir warning can be red herrings.

Since you are on an HPC system, you will want it to run Xorg-less / headless. You were correct to try -opengl, since Vulkan doesn't support Xorg-less / headless operation without an upgrade to UE 4.25+.

Here is how it can be run headless with your Docker image of choice. I'm showing a stand-alone x86-64 system for this example, since Summit is a ppc64le rabbit hole. Note that I include nvidia-container-cli here because I didn't want to chase down paths to dependencies and add them to Singularity configuration files as part of Singularity's dependency discovery process:

$ cat /etc/redhat-release 
CentOS Linux release 8.3.2011
$ nvidia-smi | grep Version | awk '{print $6}'
465.19.01
$ nvidia-container-cli --version | head -1
version: 1.4.0
$ nvidia-smi -L | head -1
GPU 0: NVIDIA GeForce GTX TITAN Black
...
$ singularity --version
singularity version 3.7.3-1.el8
$ uname -m
x86_64
$ singularity build carla-0.9.11.sif docker://carlasim/carla:0.9.11
...
$ SINGULARITYENV_SDL_VIDEODRIVER=offscreen singularity exec --nv -e carla-0.9.11.sif /home/carla/CarlaUE4.sh -opengl
...

Give it about 20 seconds to start up, and ignore any ALSA warnings since sound is irrelevant.

From a separate shell on the same system, you can verify that port 2000 is listening and that CARLA is using the GPU:

$ lsof -nP -iTCP -sTCP:LISTEN | grep CarlaUE4
CarlaUE4- ... TCP *:2000 (LISTEN)
CarlaUE4- ... TCP *:2001 (LISTEN)
$ nvidia-smi | grep Carla
... C+G   ...x/CarlaUE4-Linux-Shipping...

Once you have that working, consider leveraging singularity instance for headless HPC container needs.
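
A minimal sketch of that pattern (the instance name carla0 is arbitrary, and this is not verbatim from our setup):

# start a named, persistent container instance with GPU support
$ singularity instance start --nv carla-0.9.11.sif carla0
# launch the CARLA server inside it, with the same flags as above
$ SINGULARITYENV_SDL_VIDEODRIVER=offscreen singularity exec -e instance://carla0 /home/carla/CarlaUE4.sh -opengl &
# later: list running instances and tear down
$ singularity instance list
$ singularity instance stop carla0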

fireofearth commented 3 years ago

Thanks for the suggestion. Unfortunately trying this didn't work:

SINGULARITYENV_SDL_VIDEODRIVER=offscreen singularity exec \
    --nv -B /localscratch:/tmp \
    -B $(pwd)/CarlaUE4/Saved:/home/carla/CarlaUE4/Saved,$(pwd)/Engine/Saved:/home/carla/Engine/Saved \
    -e carla-0.9.11.sif /home/carla/CarlaUE4.sh -opengl

I don't have nvidia-container-cli, but the IT staff I wrote to said the --nv setting should work as is. I think the challenge here is the lack of error messages / transparency from the CARLA software.
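
For anyone else digging: since UE4 writes its log under Saved/Logs, the bind mounts above should at least expose the log on the host after a failed run (a guess on my part; the path follows UE4's convention):

$ cat $(pwd)/CarlaUE4/Saved/Logs/CarlaUE4.log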

hh0rva1h commented 3 years ago

Exactly the same problem as OP here. @fireofearth Any update so far?

qhaas commented 3 years ago

Note the following warning when running nvidia-smi

WARNING: infoROM is corrupted at gpu 0000:04:00.0

That doesn't bode well for the firmware on the GPU based on this issue:

Corrupted means the inforom did not pass some sort of sanity check (e.g. checksum). Therefore the GPU driver won't use or trust its contents.

Regarding --nv: that 'should' be enough to get Vulkan, CUDA, and OpenGL working, assuming the Singularity configuration file is 'mounting' all the dependencies from the host's driver properly, since NVIDIA moves dependencies around between the CUDA framework and the driver depending on the version. A word of warning about the following:

the IT staff I wrote to said the --nv setting should work as is

In my experience, when HPC IT admins say that, what they mean is that compute (i.e. CUDA) will work; OpenGL rendering is a different beast and often isn't considered on 'big iron' systems.
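
If in doubt, you can list exactly which host driver libraries --nv bound into the container. A quick sketch (in Singularity 3.x, the libraries from nvliblist.conf land under /.singularity.d/libs):

# list the host driver libraries that --nv bind-mounted into the container
$ singularity exec --nv -e carla-0.9.11.sif ls /.singularity.d/libs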

adscib commented 2 years ago

We had CARLA running on Cedar at some point, but unfortunately after a SLURM update it stopped working. After a lot of digging, we have come to suspect that the issue is with how Singularity binds different GPUs, which seems to have been reworked in a recent PR. It looks like this is going to be released in Singularity 3.9, so maybe when that comes out we'll be able to run CARLA on Cedar again. In the meantime, we were able to run CARLA in Singularity on HPC clusters that use PBS instead of SLURM; the general shape of our SLURM submission is sketched below.
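
A sketch only (module name, resource requests, and image path are cluster-specific assumptions):

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00

module load singularity   # module name varies per cluster
SINGULARITYENV_SDL_VIDEODRIVER=offscreen \
    singularity exec --nv -e carla-0.9.11.sif /home/carla/CarlaUE4.sh -opengl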

pranavAL commented 2 years ago

Has this problem been solved? I am also trying to set up CARLA on the Compute Canada cluster (Cedar), but I'm facing the same issue.

reiverjohn commented 2 years ago

@qhaas This was really helpful. I am setting this up for a user on our HPC and I can run CARLA headless as described. But I have a rather naive question (as I haven't used this software and would like to test it): once CARLA is running on the headless node, how does one go about connecting a graphical interface to port 2000? I have tried port forwarding via ssh and connecting via a browser, but that definitely does not seem to work. For example, after getting CARLA running, open a separate local shell and use:

ssh -N -L2000:carlanode:2000 username@hpc-login.some.edu

Then try to connect in the browser to http://localhost:2000. nmap from the login node shows that port 2000 is open. I've used this technique successfully for Jupyter notebooks.

I assume there is some other way?
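
My best guess so far: port 2000 speaks CARLA's own RPC protocol rather than HTTP, so a browser can't talk to it, and the graphical view has to come from a CARLA client instead. A sketch of what I'd expect to work through the tunnel (assuming the matching CARLA Python client and its example scripts are available on the local machine):

# manual_control.py ships with CARLA releases under PythonAPI/examples
$ python3 PythonAPI/examples/manual_control.py --host localhost --port 2000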

Hither1 commented 2 years ago

Hi, it seems that this command (/home/carla/CarlaUE4.sh) assumes CARLA is already installed on the local machine?

RaviBeagle commented 1 year ago

As I understand it, it is possible to run the latest CARLA in Singularity using -carla-server and -RenderOffScreen without any problems?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

qhaas commented 1 year ago

Since OpenGL is no longer available in recent versions of Unreal Engine, and thus CARLA, it's time for an update...

We build just like always:

singularity build carla-0914.sif docker://carlasim/carla:0.9.14

To run headless / CLI only:

singularity run --nv -e carla-0914.sif /home/carla/CarlaUE4.sh -RenderOffScreen -nosound

abol-karimi commented 4 months ago

I tried:

singularity pull docker://carlasim/carla:0.9.15
singularity run --nv -e carla_0.9.15.sif /home/carla/CarlaUE4.sh -RenderOffScreen -nosound

but the script exits with code 1:

INFO:    Setting 'NVIDIA_VISIBLE_DEVICES=all' to emulate legacy GPU binding.
INFO:    Setting --writable-tmpfs (required by nvidia-container-cli)
chmod: changing permissions of '/home/carla/CarlaUE4/Binaries/Linux/CarlaUE4-Linux-Shipping': Operation not permitted
4.26.2-0+++UE4+Release-4.26 522 0
Disabling core dumps.
sh: 1: xdg-user-dir: not found

No luck with Carla 0.9.14 either.

qhaas commented 4 months ago

The sh: 1: xdg-user-dir: not found message is a harmless red herring. If singularity doesn't return and appears to hang, then it is likely running in headless / CLI mode and listening for connections from the CARLA client.
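
A quick way to confirm it is actually up, from a second shell; a sketch, assuming a carla Python package matching the server version (e.g. pip install carla==0.9.15) is available on the host:

# prints the server version if the headless server is healthy
$ python3 -c 'import carla; c = carla.Client("localhost", 2000); c.set_timeout(10.0); print(c.get_server_version())'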

The singularity run --nv -e carla_0.9.15.sif /home/carla/CarlaUE4.sh -RenderOffScreen -nosound will:

  1. Enable NVIDIA GPU support (--nv); a different argument is needed to enable Intel / AMD GPUs
  2. Use the container's environment variables instead of the host's (-e)
  3. Start CARLA without a GUI and with no sound (-RenderOffScreen -nosound)

If you wish to run CARLA with a GUI on an nVidia GPU, the command is (note the lack of -e): singularity exec --nv carla_0.9.15.sif /home/carla/CarlaUE4.sh

This assumes singularity / apptainer is configured properly with your GPU, which can be verified (for nvidia) with singularity run --nv -e carla_0.9.15.sif nvidia-smi -L

abol-karimi commented 4 months ago

The problem is that singularity immediately returns. The output of singularity run --nv -e carla_0.9.15.sif nvidia-smi -L is:

GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-978777c2-22b9-c0a6-65c2-6164938beca6)
GPU 1: NVIDIA A100-PCIE-40GB (UUID: GPU-1618e4dd-e246-7a0f-2873-43a694f27743)
  MIG 2g.10gb     Device  0: (UUID: MIG-444a2664-9fae-59e9-a85f-4fe324bd65a1)
GPU 2: NVIDIA A100-PCIE-40GB (UUID: GPU-9b367d38-2748-b606-ca02-04e8a765bdae)

I also tried --net --network=none to make sure the loopback network is available inside the container, but still no luck.

kanavalau commented 4 weeks ago

@abol-karimi did you find a solution? I seem to have the exact same problem

abol-karimi commented 3 weeks ago

My problem was due to the Apptainer version and Vulkan: https://github.com/carla-simulator/carla/issues/6374#issuecomment-2159170451