Closed: hkelley closed this issue 10 months ago
Even once the devel images are used, this (tangential) issue is still present:
https://github.com/NVIDIA/nvidia-docker/issues/1730
This needs to be resolved as well to achieve maximum stability. Some workarounds are presented in that link; the symlink creation may be the best option for CrackQ.
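For reference, one of those workarounds is to have the host create the missing /dev/char symlinks itself via the NVIDIA Container Toolkit CLI. A minimal sketch, assuming a recent `nvidia-ctk` is installed on the host (commands are taken from the linked issue; verify them against your toolkit version):

```bash
# One-off: create /dev/char symlinks for all NVIDIA device nodes right now
sudo nvidia-ctk system create-dev-char-symlinks --create-all

# Persist across reboots/driver reloads with a udev rule, as suggested in the
# linked issue (rule path and contents may differ on your distribution)
cat <<'EOF' | sudo tee /lib/udev/rules.d/71-nvidia-dev-char.rules
ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/usr/bin/nvidia-ctk system create-dev-char-symlinks --create-all"
EOF
```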
When the container loses access to the GPU, you will see the following error message from the console output:
Failed to initialize NVML: Unknown Error
The container needs to be deleted once the issue occurs.
When it is restarted (manually or automatically depending on the use of a container orchestration platform), it will regain access to the GPU.
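If no orchestrator is in place, a crude host-side watchdog can automate the delete/recreate cycle. A minimal sketch, assuming the container is named `crackq` and a compose file named `docker-compose.nvidia.yml` (both names are assumptions; adjust for your deployment):

```bash
#!/usr/bin/env bash
# Watchdog sketch: if nvidia-smi inside the container can no longer reach the
# GPU (the NVML error above), remove the container and recreate it.
# Container name and compose file name are assumptions for illustration only.
if ! docker exec crackq nvidia-smi > /dev/null 2>&1; then
  docker rm -f crackq
  docker-compose -f docker-compose.nvidia.yml up -d
fi
```

Run it from cron or a systemd timer every few minutes until the driver-side fix lands.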
The issue originates from the fact that recent versions of runc require that symlinks be present under /dev/char to any device nodes being injected into a container. Unfortunately, these symlinks are not present for NVIDIA devices, and the NVIDIA GPU driver does not (currently) provide a means for them to be created automatically.
A fix will be present in the next patch release of all supported NVIDIA GPU drivers.
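A quick way to check whether the symlinks runc expects are actually present on the host:

```bash
# Entries like "195:0 -> ../nvidia0" should exist for every NVIDIA device node
# injected into the container; no output means the symlinks are missing.
ls -l /dev/char | grep -i nvidia
```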
Thanks for reporting this. Can you try the v0.1.2 branch and let me know if you're still seeing this issue?
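For anyone following along, switching an existing checkout to that branch and rebuilding would look roughly like the sketch below (the checkout path and compose file name are assumptions; use whichever build method you installed with):

```bash
cd /opt/crackq          # path to your CrackQ checkout (assumption)
git fetch origin
git checkout v0.1.2
# Rebuild and restart the NVIDIA images; adjust the compose file / build
# command to match your install method.
docker-compose -f docker-compose.nvidia.yml build --no-cache
docker-compose -f docker-compose.nvidia.yml up -d
```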
**Describe the bug**
When using the `runtime` flavor of the NVIDIA images (https://github.com/f0cker/crackq/blob/675a5b62191cd999b3f3a5304138ef021800e156/docker/nvidia/ubuntu/Dockerfile#L1C2-L1C2), hashcat does not recognize the NVIDIA T4 GPUs, even though `nvidia-smi` does.

**To Reproduce**
Steps to reproduce the behavior:
Run `nvidia-smi`:

```
crackq@crackq:/opt/crackq/build$ nvidia-smi
Tue Aug  8 13:12:18 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   66C    P0    30W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   67C    P0    28W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:00:06.0 Off |                    0 |
| N/A   67C    P0    30W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:00:07.0 Off |                    0 |
| N/A   67C    P0    30W /  70W |      2MiB / 15360MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
hashcat, however, fails to find any compute platform:

```
clGetPlatformIDs(): CL_PLATFORM_NOT_FOUND_KHR

ATTENTION! No OpenCL-compatible or CUDA-compatible platform found.

You are probably missing the OpenCL or CUDA runtime installation.
```
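As the comment above about the devel images suggests, the detection problem goes away once the base image uses the devel flavor instead of runtime. A minimal sketch of that change, assuming the Dockerfile's FROM line uses a tag of the form `<cuda-version>-runtime-<os>`:

```bash
# Switch the NVIDIA Dockerfile from the runtime flavor to the devel flavor of
# the CUDA base image, keeping the pinned CUDA/Ubuntu version unchanged, then
# rebuild the images as shown earlier in the thread.
sed -i 's/-runtime-/-devel-/' docker/nvidia/ubuntu/Dockerfile
```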