Open fireofearth opened 3 years ago
We are using Singularity to run CARLA server on a TOP500 supercomputer at Oak Ridge National Lab.
Know that the chmod warning and xdg-user-dir warning can be red herrings.
Since you are on an HPC, you will want it to be xorg-less / headless. You were correct in trying opengl, since vulkan doesn't support xorg-less / headless without an upgrade to UE 4.25+.
Here is how it can be run headless with your Docker image of choice. I'm showing a stand-alone x86-64 system for this example since Summit is a ppc64le rabbit hole. Note that I include nvidia-container-cli here since I didn't want to chase down paths to dependencies and add them to Singularity configuration files as part of Singularity's dependency discovery process:
$ cat /etc/redhat-release
CentOS Linux release 8.3.2011
$ nvidia-smi | grep Version | awk '{print $6}'
465.19.01
$ nvidia-container-cli --version | head -1
version: 1.4.0
$ nvidia-smi -L | head -1
GPU 0: NVIDIA GeForce GTX TITAN Black
...
$ singularity --version
singularity version 3.7.3-1.el8
$ uname -m
x86_64
$ singularity build carla-0.9.11.sif docker://carlasim/carla:0.9.11
...
$ SINGULARITYENV_SDL_VIDEODRIVER=offscreen singularity exec --nv -e carla-0.9.11.sif /home/carla/CarlaUE4.sh -opengl
...
Give it about 20 seconds to start up and ignore any ALSA warnings, since sound is irrelevant.
From a separate shell on the same system, you can verify port 2000 is listening and carla is using the GPU:
$ lsof -nP -iTCP -sTCP:LISTEN | grep CarlaUE4
CarlaUE4- ... TCP *:2000 (LISTEN)
CarlaUE4- ... TCP *:2001 (LISTEN)
$ nvidia-smi | grep Carla
... C+G ...x/CarlaUE4-Linux-Shipping...
Once you have that working, consider leveraging singularity instance for headless HPC container needs.
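A minimal sketch of what the `singularity instance` route could look like, assuming the image built above. The instance name `carla`, the log file name, and the helper function names are illustrative assumptions, and this has not been tested on a real cluster; in particular, whether the GPU libraries bound at `instance start` are picked up by a later `exec` may depend on the Singularity version.

```shell
#!/usr/bin/env bash
# Sketch only: the instance name "carla", the log file name and the
# function names are illustrative assumptions, not taken from the thread.
SIF="${SIF:-carla-0.9.11.sif}"
INSTANCE="${INSTANCE:-carla}"

# Start a background instance with NVIDIA support and a clean environment.
carla_instance_start() {
    singularity instance start --nv -e "$SIF" "$INSTANCE"
}

# Launch the CARLA server inside the running instance, in the background,
# with SDL told to render offscreen as in the exec example above.
carla_server_launch() {
    SINGULARITYENV_SDL_VIDEODRIVER=offscreen \
        singularity exec -e "instance://$INSTANCE" \
        /home/carla/CarlaUE4.sh -opengl > carla-server.log 2>&1 &
}

# Stop the instance, which also ends the server running inside it.
carla_instance_stop() {
    singularity instance stop "$INSTANCE"
}
```

After sourcing the file, the intended order is `carla_instance_start`, then `carla_server_launch`, and `carla_instance_stop` when the job is done; `singularity instance list` shows whether the instance is still up.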
Thanks for the suggestion. Unfortunately trying this didn't work:
SINGULARITYENV_SDL_VIDEODRIVER=offscreen singularity exec \
--nv -B /localscratch:/tmp \
-B $(pwd)/CarlaUE4/Saved:/home/carla/CarlaUE4/Saved,$(pwd)/Engine/Saved:/home/carla/Engine/Saved \
-e carla-0.9.11.sif /home/carla/CarlaUE4.sh -opengl
I don't have nvidia-container-cli, but the IT staff I wrote to said the --nv setting should work as is.
I think the challenge here is the lack of an error message / transparency from the CARLA software.
Exactly the same problem as OP here. @fireofearth Any update so far?
Note the following warning when running nvidia-smi
WARNING: infoROM is corrupted at gpu 0000:04:00.0
That doesn't bode well for the firmware on the GPU based on this issue:
Corrupted means the inforom did not pass some sort of sanity check (e.g. checksum). Therefore the GPU driver won't use or trust its contents.
Regarding --nv, that 'should' be enough to get Vulkan, CUDA, and OpenGL working, assuming the Singularity configuration file is 'mounting' all the dependencies from the host's driver properly, since NVIDIA moves dependencies around between the CUDA framework and the driver depending on the version. A word of warning about the following:
the IT staff I wrote to said the --nv setting should work as is
In my experience, when HPC IT admins say that, what they mean is that compute (i.e. CUDA) will work. OpenGL rendering can be a different beast and isn't often considered on 'big iron' systems.
We've had CARLA running on Cedar at some point, but unfortunately after some SLURM update it stopped. After a lot of digging, we have come to suspect that the issue is with how Singularity binds different GPUs, which seems to have been reworked in a recent PR. It looks like this is going to be released in Singularity 3.9, so maybe when that comes out we'll be able to run CARLA on Cedar again. In the meantime, we were able to run CARLA in Singularity on HPC clusters that use PBS instead of SLURM.
Is the problem solved? I am also trying to set up CARLA on the Compute Canada cluster (Cedar), but I am facing the same issue.
@qhaas This was really helpful. I am setting this up for a user on our HPC and I can run CARLA headless as described. But I have a rather naive question (as I haven't used this software and would like to test it). Once CARLA is running on the headless node, how does one go about connecting a graphical interface to port 2000? I have tried port forwarding via ssh and connecting via a browser, and that definitely does not seem to work. For example, after getting CARLA running, open a separate local shell and use:
ssh -N -L2000:carlanode:2000 username@hpc-login.some.edu
Then try to connect in the browser to http://localhost:2000. nmap from the login node shows that port 2000 is open. I've used this technique for Jupyter notebooks successfully.
I assume there is some other way?
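A hedged note on the question above: as far as I know, port 2000 carries CARLA's own RPC traffic rather than HTTP, which would explain why a browser cannot connect, and the usual way in is the CARLA Python API (`carla.Client`). The sketch below assumes the hostnames from the question and forwards both listening ports seen earlier in the thread (2000 and 2001); the helper name `carla_tunnel_args` is hypothetical, and whether two ports are enough may depend on the CARLA version.

```shell
#!/usr/bin/env bash
# Sketch only. Hostnames come from the question above; the helper name
# carla_tunnel_args is a hypothetical convenience, not part of CARLA.

# Print ssh -L arguments for the CARLA RPC port and the port after it,
# matching the two listeners (2000 and 2001) shown earlier in the thread.
carla_tunnel_args() {
    local node="$1" port="${2:-2000}"
    echo "-L ${port}:${node}:${port} -L $((port + 1)):${node}:$((port + 1))"
}

# On the local machine, open the tunnel (left as a comment so that
# sourcing this file has no side effects):
#   ssh -N $(carla_tunnel_args carlanode 2000) username@hpc-login.some.edu
#
# Then, in another local shell where a CARLA Python package matching the
# server version is installed, ask the server for its version:
#   python3 -c 'import carla; c = carla.Client("localhost", 2000); c.set_timeout(10.0); print(c.get_server_version())'
```

If the version string comes back, the tunnel and server are reachable, and the example client scripts shipped with CARLA can be pointed at localhost in the same way.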
/home/carla/CarlaUE4.sh
Hi, it seems that this command assumes CARLA is already installed on the local machine?
As I understand it, it is possible to run the latest CARLA in Singularity using -carla-server and -RenderOffScreen without any problems?
Since OpenGL is no longer available in recent versions of Unreal Engine, and thus CARLA, time for an update...
We build just like always: singularity build carla-0914.sif docker://carlasim/carla:0.9.14
To run headless / CLI only: singularity run --nv -e carla-0914.sif /home/carla/CarlaUE4.sh -RenderOffScreen -nosound
I tried singularity pull docker://carlasim/carla:0.9.15
and
singularity run --nv -e carla_0.9.15.sif /home/carla/CarlaUE4.sh -RenderOffScreen -nosound
but the script exits with code 1:
INFO: Setting 'NVIDIA_VISIBLE_DEVICES=all' to emulate legacy GPU binding.
INFO: Setting --writable-tmpfs (required by nvidia-container-cli)
chmod: changing permissions of '/home/carla/CarlaUE4/Binaries/Linux/CarlaUE4-Linux-Shipping': Operation not permitted
4.26.2-0+++UE4+Release-4.26 522 0
Disabling core dumps.
sh: 1: xdg-user-dir: not found
No luck with Carla 0.9.14 either.
The sh: 1: xdg-user-dir: not found message is a harmless red herring. If singularity doesn't return and appears to hang, then it is possible it is running in headless / CLI mode and listening for connections from the CARLA client.
The command singularity run --nv -e carla_0.9.15.sif /home/carla/CarlaUE4.sh -RenderOffScreen -nosound will:
- enable NVIDIA GPU support (--nv); a different argument is needed to enable Intel / AMD GPUs
- start from a clean environment (-e)
- run CARLA offscreen and without sound (-RenderOffScreen -nosound)
If you wish to run CARLA with a GUI on an NVIDIA GPU, the command is (note the lack of -e): singularity exec --nv carla_0.9.15.sif /home/carla/CarlaUE4.sh
This assumes singularity / apptainer is configured properly with your GPU, which can be verified (for nvidia) with singularity run --nv -e carla_0.9.15.sif nvidia-smi -L
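The "appears to hang" versus "returns immediately" distinction mentioned above can be checked with a small script. This is a sketch, assuming the image name from this comment; the function names are hypothetical and the default 20-second wait is borrowed from the start-up time mentioned earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: tell "server is up and waiting" apart from "server exited early".
# Function names are hypothetical; the image name is the one used above.

# Return 0 if the process with the given PID is still alive.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Start CARLA in the background, wait a while, then report what happened.
# Optional first argument: seconds to wait (default 20).
check_carla_start() {
    singularity run --nv -e carla_0.9.15.sif \
        /home/carla/CarlaUE4.sh -RenderOffScreen -nosound &
    local pid=$!
    sleep "${1:-20}"
    if is_alive "$pid"; then
        echo "CARLA still running (PID $pid); it is probably waiting for clients"
    else
        # Collect the exit status of the finished background job.
        wait "$pid"
        echo "CARLA exited with code $?"
    fi
}
```

If the script reports that CARLA is still running, the lsof check shown earlier in the thread can confirm that ports 2000 and 2001 are listening.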
The problem is that singularity immediately returns. The output of singularity run --nv -e carla_0.9.15.sif nvidia-smi -L is:
GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-978777c2-22b9-c0a6-65c2-6164938beca6)
GPU 1: NVIDIA A100-PCIE-40GB (UUID: GPU-1618e4dd-e246-7a0f-2873-43a694f27743)
MIG 2g.10gb Device 0: (UUID: MIG-444a2664-9fae-59e9-a85f-4fe324bd65a1)
GPU 2: NVIDIA A100-PCIE-40GB (UUID: GPU-9b367d38-2748-b606-ca02-04e8a765bdae)
I also tried with --net --network=none to make sure the loopback network is available inside the container, but still no luck.
@abol-karimi did you find a solution? I seem to have the exact same problem
My problem was due to Apptainer version and Vulkan: https://github.com/carla-simulator/carla/issues/6374#issuecomment-2159170451
I'm trying to run CARLA Simulator on an HPC cluster (Compute Canada, to be exact; specifications if needed: https://docs.computecanada.ca/wiki/Cedar), which is a headless server. Docker is not available, so I'm using Singularity.
CARLA exits immediately with very little indication of where the error came from. Has anyone successfully launched CARLA Simulator as a Singularity image at all? If so, how did you do it?
I've seen lots of GitHub issues about running CARLA in Docker, but information seems sparse regarding using Singularity.
Program versions
Singularity version: Singularity 3.7
CARLA image: carlasim/carla:0.9.11 (from https://hub.docker.com/r/carlasim/carla/tags)
nvidia-smi output:
What I've tried
First I build the SIF file:
singularity build carla-0.9.11.sif docker://carlasim/carla:0.9.11
I've tried running CARLA from an interactive shell:
I've also tried singularity> ./CarlaUE4.sh -opengl. Both give:
And then CARLA exits with code 1. I've also tried (with and without the -opengl flag):
singularity exec --nv carla-0.9.11.sif /bin/bash /home/carla/CarlaUE4.sh
which gives the same output and exit code 1. I've also played around with the suggestions in https://github.com/carla-simulator/carla/issues/1290 which unfortunately didn't help much. I have yet to test running on different versions of CARLA. Will post updates if I find anything.