Initializing SDL: No available video device

Snafuh commented 1 year ago

Hi

First of all, thanks a lot for supporting this project, Adam.

I'm having issues getting an example project running. My GPU is not detected correctly, and therefore SDL initialization fails:

```
[2023.01.22-17.10.45:459][  0]LogHAL: Warning: Splash screen image not found.
[2023.01.22-17.10.45:459][  0]LogInit: Initializing SDL.
[2023.01.22-17.10.45:463][  0]LogInit: Warning: Could not initialize SDL: No available video device
[2023.01.22-17.10.45:463][  0]LogInit: Error: FLinuxApplication::CreateLinuxApplication() : InitSDL() failed, cannot create application instance.
```

I'm running on a Windows host (my local dev setup). I can run the NVIDIA benchmark demo containers without problems, so I think my host Docker and driver setup is correct.

nvidia-smi displays my GPU correctly.

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 527.92.01    Driver Version: 528.02       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P8    N/A /  N/A |    223MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        21      G   /Xwayland                       N/A      |
|    0   N/A  N/A        21      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+
```

Some more CUDA diagnostics from the same Docker container:

```python
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7fefde78e4e0>
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce GTX 1050 Ti'
```

So PyTorch can find the GPU correctly as well.

My Dockerfile:

```dockerfile
FROM ghcr.io/epicgames/unreal-engine:dev-slim-5.1 AS builder
# Copy the source code for our dummy Unreal project
COPY --chown=ue4:ue4 ./DummyProject /tmp/project

# Build and package our Unreal project
WORKDIR /tmp/project
#RUN ue5 package

RUN /home/ue4/UnrealEngine/Engine/Build/BatchFiles/RunUAT.sh \
    BuildCookRun \
    -utf8output \
    -platform=Linux \
    -clientconfig=development \
    -serverconfig=development \
    -project=/tmp/project/DummyProject.uproject \
    -noP4 -nodebuginfo -allmaps \
    -cook -build -stage -prereqs -pak -archive \
    -archivedirectory=/tmp/project/Packaged

# Copy the packaged files into a runtime container image that doesn't include any Unreal Engine components
FROM ghcr.io/epicgames/unreal-engine:runtime-pixel-streaming
COPY --from=builder --chown=ue4:ue4 /tmp/project/Packaged/Linux /home/ue4/project

# Enable the NVIDIA driver capabilities required by the NVENC API
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES},video

ENV XDG_RUNTIME_DIR /tmp/xdg/

# Create a symbolic link to the path where libnvidia-encode.so.1 will be mounted, since UE4 seems to ignore LD_LIBRARY_PATH
RUN ln -s /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 /home/ue4/project/DummyProject/Binaries/Linux/libnvidia-encode.so.1

# Set the packaged project as the container's entrypoint
ENTRYPOINT ["/home/ue4/project/DummyProject.sh", "-RenderOffscreen"]
```

Any ideas what further diagnostics I could run, besides nvidia-smi, to check whether the GPU is correctly exposed in the container?

Cheers, Kim
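
One further check beyond nvidia-smi is whether a Vulkan driver is visible at all, since Unreal Engine on Linux renders through Vulkan. Below is a minimal Python sketch. It lists the Vulkan ICD (driver) manifests in common loader search directories and reports the library and API version each one declares. It assumes the container's driver manifest is mounted into one of these directories; `find_vulkan_icds` and `ICD_DIRS` are hypothetical names. If the `vulkan-tools` package is installed, `vulkaninfo` gives a much fuller picture.

```python
import json
from pathlib import Path

# Common directories where the Vulkan loader looks for ICD (driver) manifests on Linux.
# Assumption: a GPU-enabled container has its driver's manifest mounted into one of these.
ICD_DIRS = [
    Path("/etc/vulkan/icd.d"),
    Path("/usr/local/share/vulkan/icd.d"),
    Path("/usr/share/vulkan/icd.d"),
]


def find_vulkan_icds(dirs=ICD_DIRS):
    """Return (manifest, library_path, api_version) for each readable ICD manifest."""
    found = []
    for directory in dirs:
        if not directory.is_dir():
            continue
        for manifest in sorted(directory.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, ValueError):
                continue  # unreadable or malformed manifest: skip it
            icd = data.get("ICD", {}) if isinstance(data, dict) else {}
            found.append((str(manifest), icd.get("library_path", "?"), icd.get("api_version", "?")))
    return found


if __name__ == "__main__":
    icds = find_vulkan_icds()
    if not icds:
        print("No Vulkan ICD manifests found: the Vulkan loader will probably not find a GPU driver.")
    for manifest, library, api_version in icds:
        print(f"{manifest}: {library} (Vulkan {api_version})")
```

A manifest being present only shows that a driver is advertised. It does not prove that the driver can actually talk to the GPU.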

adamrehn commented 1 year ago

Hi @Snafuh, so the issue is not that the GPU isn't exposed correctly to the container, but rather that WSL2 doesn't fully support the Vulkan graphics API yet. The way that WSL2 GPU access works is by making use of WDDM GPU Paravirtualization (GPU-PV), which exposes the DirectX kernel interface and accompanying DirectX 12 graphics + compute APIs to the Linux virtual machine. Other APIs then need to be supported by providing translation layers that convert API calls to their DirectX counterparts. At the moment, translation layers are provided for OpenCL and OpenGL and for NVIDIA CUDA (the latter of which communicates with the DirectX kernel interface directly rather than attempting to translate CUDA API calls to DirectX 12 compute API calls).
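
To tell whether a container is running on top of this GPU-PV path, a rough heuristic can help. It is an assumption, not an official check: WSL2 kernels report "microsoft" in `/proc/version`, and GPU-PV exposes the DirectX kernel interface as the `/dev/dxg` device node. The minimal sketch below encodes that heuristic. It assumes `/dev/dxg` is passed through into the container, and the function names are hypothetical.

```python
from pathlib import Path


def looks_like_wsl2_gpu_pv(proc_version: str, dxg_exists: bool) -> bool:
    """Heuristic: a WSL2 kernel plus the /dev/dxg device node suggests GPU access via GPU-PV."""
    return "microsoft" in proc_version.lower() and dxg_exists


def detect() -> bool:
    """Apply the heuristic to the current machine or container."""
    try:
        proc_version = Path("/proc/version").read_text()
    except OSError:
        proc_version = ""
    return looks_like_wsl2_gpu_pv(proc_version, Path("/dev/dxg").exists())


if __name__ == "__main__":
    if detect():
        print("WSL2 GPU-PV detected: GPU access goes through the DirectX kernel interface.")
    else:
        print("No WSL2 GPU-PV device found.")
```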

Microsoft is working on a Vulkan-to-DirectX translation layer called dzn ("Dozen"), which recently reached the milestone of supporting Vulkan 1.0. It looks like Microsoft is now working on Vulkan 1.1 support, so hopefully they'll continue until Dozen supports Vulkan 1.3, which is what I believe is required in order to run Unreal Engine 5.1. (Looking back through the release notes, Vulkan 1.2 support should be sufficient to run Unreal Engine 4.26, 4.27 and 5.0. For engine releases older than that, you can just use OpenGL instead.)
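
The version estimates in the paragraph above fit in a small lookup table. This minimal Python sketch assumes `(major, minor)` tuples for both engine and Vulkan versions. The names `MIN_VULKAN`, `required_vulkan` and `wsl2_can_run` are hypothetical. The 5.1 entry reflects the belief stated above, not an official requirement, and the default for releases after 5.1 is purely an assumption.

```python
# Minimum Vulkan API version per engine release, following the estimates above.
MIN_VULKAN = {
    (4, 26): (1, 2),
    (4, 27): (1, 2),
    (5, 0): (1, 2),
    (5, 1): (1, 3),
}


def required_vulkan(engine):
    """Minimum (major, minor) Vulkan version for a (major, minor) engine release,
    or None for releases older than 4.26, which can use OpenGL instead."""
    if engine < (4, 26):
        return None
    # Hypothetical assumption: releases after 5.1 need at least Vulkan 1.3 too.
    return MIN_VULKAN.get(engine, (1, 3))


def wsl2_can_run(engine, dozen_vulkan):
    """True if, by these estimates, the engine release could run given Dozen's Vulkan version."""
    required = required_vulkan(engine)
    return required is None or dozen_vulkan >= required
```

For example, `wsl2_can_run((5, 0), (1, 2))` is `True`. By contrast, `wsl2_can_run((5, 1), (1, 2))` is `False`, because 5.1 is estimated to need Vulkan 1.3.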

In the meantime, there are two options available:

1. If you're targeting Unreal Engine 4.25 or older, you can use the OpenGL graphics API by specifying the -opengl4 command-line flag. Unfortunately, OpenGL support was removed in Unreal Engine 4.26, so this option will not work for newer versions of the Unreal Engine.
2. If you're targeting Unreal Engine 4.26 or newer, you will need to run your GPU accelerated Linux containers on a Linux host system. This could be a bare metal installation (e.g. on another machine or dual-booting with Windows on your current machine), a GPU-enabled virtual machine instance running in the cloud, or a Linux virtual machine that has access to a physical GPU by means of PCI passthrough.
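
The choice between the two options can be sketched as a helper that extends a container's entrypoint arguments. This is a minimal Python sketch assuming `(major, minor)` engine version tuples. `plan_launch` is a hypothetical name, and the entrypoint list simply mirrors the one in the Dockerfile earlier in the thread.

```python
def plan_launch(engine, on_wsl2):
    """Pick one of the two options for a (major, minor) engine release.

    Returns (extra_args, note); extra_args is None when the combination
    is not expected to work at all.
    """
    if engine <= (4, 25):
        # OpenGL still exists in these releases, and WSL2 ships an OpenGL translation layer.
        return ["-opengl4"], "use OpenGL via -opengl4"
    if on_wsl2:
        return None, "run on a Linux host (bare metal, cloud VM, or VM with PCI passthrough)"
    return [], "Vulkan on a Linux host, no extra flags needed"


# Example: entrypoint arguments for a UE 4.25 project running under WSL2.
extra, note = plan_launch((4, 25), on_wsl2=True)
entrypoint = ["/home/ue4/project/DummyProject.sh", "-RenderOffscreen"] + (extra or [])
```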

(Note that PCI passthrough will require either a Linux host system with IOMMU support and a hypervisor such as KVM, or a Windows Server host system with both IOMMU and SR-IOV support and the Hyper-V hypervisor. Client versions of Windows such as Windows 10 or Windows 11 do not support Hyper-V Discrete Device Assignment (DDA), and most consumer GPUs do not support SR-IOV. A Linux host system is therefore usually the most practical option, at which point you can just run the container directly on the host without needing a VM.)
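
On a Linux host, one quick way to check for IOMMU support is to look under `/sys/kernel/iommu_groups`. That directory is only populated when the IOMMU is enabled. The minimal Python sketch below lists each IOMMU group and finds the group a given PCI address belongs to. `iommu_groups` and `group_of` are hypothetical names, and the PCI address in the comment is only illustrative. A GPU and its audio function typically share a group and need to be passed through together.

```python
from pathlib import Path


def iommu_groups(root=Path("/sys/kernel/iommu_groups")):
    """Map IOMMU group number -> sorted PCI addresses of the devices in that group.

    An empty dict usually means the IOMMU is disabled or unsupported,
    in which case PCI passthrough will not work.
    """
    groups = {}
    if not root.is_dir():
        return groups
    for group in root.iterdir():
        devices = group / "devices"
        if group.name.isdigit() and devices.is_dir():
            groups[int(group.name)] = sorted(d.name for d in devices.iterdir())
    return groups


def group_of(groups, pci_address):
    """Return the IOMMU group containing pci_address (e.g. "0000:01:00.0"), or None."""
    for number, devices in groups.items():
        if pci_address in devices:
            return number
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    if not groups:
        print("No IOMMU groups found: enable the IOMMU before attempting PCI passthrough.")
    for number in sorted(groups):
        print(number, " ".join(groups[number]))
```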

I'm going to pin this issue for the time being, since this is a fairly common question that I hear from members of the community and improved visibility of the answer may help save some folks from wasting time trying to get UE working in WSL. I'll post updates here as Microsoft continues to improve Dozen, and I'll close the issue once the latest releases of the Unreal Engine are able to run under WSL2.

adamrehn commented 1 year ago

Update: Dozen now supports Vulkan 1.1. That was pretty quick; we'll see how long Vulkan 1.2 takes!

adamrehn commented 1 year ago

And now Vulkan 1.2 support has been implemented as well. Looks like Microsoft is pushing hard on this!

Snafuh commented 1 year ago

Wow, thanks a lot for that very detailed answer. I did some further research after posting this issue and came to the same conclusion: WSL2 doesn't support this yet. Good to see Microsoft pushing WSL2's capabilities further.

Running the container on a Linux host seems like the most stable and easiest way forward. Building on Windows worked perfectly fine, though, so containers can still offer great value for us Windows developers.

vinkovsky commented 7 months ago

@Snafuh Hello! Did you manage to launch UE via WSL?

adamrehn / ue4-runtime

Initializing SDL: No available video device #13