f0cker / crackq

CrackQ: A Python Hashcat cracking queue system
MIT License

NVIDIA 'runtime' images don't have necessary CUDA components #40

Closed: hkelley closed this issue 10 months ago

hkelley commented 1 year ago

**Describe the bug**
When using the `runtime` flavor of NVIDIA base images (https://github.com/f0cker/crackq/blob/675a5b62191cd999b3f3a5304138ef021800e156/docker/nvidia/ubuntu/Dockerfile#L1C2-L1C2), hashcat does not recognize NVIDIA T4 GPUs, even though `nvidia-smi` does.

**To Reproduce**
Steps to reproduce the behavior:

  1. Build the containers.
  2. Open a shell in the `crackq` container:

```
sudo docker exec -it crackq /bin/bash
```

  3. Run `nvidia-smi`:

```
crackq@crackq:/opt/crackq/build$ nvidia-smi
Tue Aug  8 13:12:18 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   66C    P0    30W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   67C    P0    28W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   67C    P0    30W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:00:07.0 Off |                    0 |
| N/A   67C    P0    30W /  70W |      2MiB / 15360MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

  4. Run `hashcat -I` or a benchmark.

```
clGetPlatformIDs(): CL_PLATFORM_NOT_FOUND_KHR

ATTENTION! No OpenCL-compatible or CUDA-compatible platform found.

You are probably missing the OpenCL or CUDA runtime installation.
```



**Expected behavior**
Hashcat recognizes CUDA-compatible GPUs.

**Additional context**
This seems to work if you use the `devel` flavor of the image, e.g.:

```
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04
```
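After rebuilding against the `devel` base image, hashcat's backend query can be used to confirm the GPUs are now visible (a minimal check, reusing the container name from the reproduction steps above):

```
# Rebuild the images on the devel base, then ask hashcat to enumerate its backends
sudo docker exec -it crackq hashcat -I
# With the devel image, this should list the Tesla T4 devices under a CUDA backend
# instead of failing with CL_PLATFORM_NOT_FOUND_KHR.
```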

Per NVIDIA:

> Three flavors of images are provided:
>
> - `base`: Includes the CUDA runtime (cudart)
> - `runtime`: Builds on the `base` and includes the [CUDA math libraries](https://developer.nvidia.com/gpu-accelerated-libraries), and [NCCL](https://developer.nvidia.com/nccl). A `runtime` image that also includes [cuDNN](https://developer.nvidia.com/cudnn) is available.
> - `devel`: Builds on the `runtime` and includes headers and development tools for building CUDA images. These images are particularly useful for multi-stage builds.

hkelley commented 1 year ago

Even once the devel images are used, this (tangential) issue is still present:

https://github.com/NVIDIA/nvidia-docker/issues/1730

This needs to be resolved as well for maximum stability.

Some workarounds are presented in that link. The symlink creation may be the best one for CrackQ.
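A rough sketch of that symlink workaround on the host, adapted from the linked thread (the exact device set and any udev/boot-time hook are assumptions; see the thread for the NVIDIA-recommended variants):

```
#!/bin/bash
# Sketch: recreate /dev/char symlinks for NVIDIA device nodes on the host, so
# recent runc versions can find them when injecting the devices into a container.
for dev in /dev/nvidia*; do
    [ -c "$dev" ] || continue                    # character device nodes only
    maj=$((16#$(stat -c '%t' "$dev")))           # major number (stat prints hex)
    min=$((16#$(stat -c '%T' "$dev")))           # minor number (stat prints hex)
    ln -sf "$dev" "/dev/char/${maj}:${min}"
done
```

Running this once per boot (and again after driver reloads) would cover the sketch; the linked issue discusses more permanent hooks.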

When the container loses access to the GPU, you will see the following error message from the console output:

```
Failed to initialize NVML: Unknown Error
```

The container needs to be deleted once the issue occurs.

When it is restarted (manually or automatically depending on the use of a container orchestration platform), it will regain access to the GPU.
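For CrackQ that recovery would look roughly like the following; the container name comes from the reproduction steps above, and the compose invocation is an assumption about how the stack was started:

```
# Hypothetical recovery once "Failed to initialize NVML: Unknown Error" appears:
# remove the affected container, then bring it back up so it re-attaches to the GPUs.
sudo docker rm -f crackq
sudo docker-compose up -d
```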

The issue originates from the fact that recent versions of runc require that symlinks be present under /dev/char to any device nodes being injected into a container. Unfortunately, these symlinks are not present for NVIDIA devices, and the NVIDIA GPU driver does not (currently) provide a means for them to be created automatically.

A fix will be present in the next patch release of all supported NVIDIA GPU drivers.

f0cker commented 10 months ago

Thanks for reporting this. Can you try the v0.1.2 branch and let me know if you're still seeing this issue?
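For anyone following along, trying that branch would look roughly like this (the clone/rebuild commands are an assumption; follow the project's install documentation for the authoritative steps):

```
# Hypothetical: check out the v0.1.2 branch and rebuild the containers
git clone -b v0.1.2 https://github.com/f0cker/crackq.git
cd crackq
# rebuild/redeploy per the install docs, using the NVIDIA Dockerfile,
# then re-run the `hashcat -I` check from the reproduction steps.
```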