ksnell88 closed this issue 2 years ago
T-Rex changed something starting in 0.24.5 that introduced new issues with NVIDIA/CUDA. Can you try running nvidia-smi from within the container?
Docker Compose can't even keep the container up long enough to try. After running docker-compose up -d, I got the output below when trying to get inside the container. Seems to be essentially stuck in a boot loop?
ks@docker:~/docker/trex$ docker exec -it trex bash
Error response from daemon: Container 347af0c238f77c46ff9c64e9f2b14e33e72463d7116484666004074e57b86e83 is restarting, wait until the container is running
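docker exec is refused while a container is restarting, but its state and logs can still be read. A minimal sketch of that check follows. The helper name diagnose_restart is hypothetical, and trex is the container name used in this thread.

```shell
# Hypothetical helper: show why a container keeps restarting.
# Usage: diagnose_restart trex
diagnose_restart() {
  name="$1"
  # Restart state, restart count and last exit code from `docker inspect`.
  docker inspect --format \
    'restarting={{.State.Restarting}} restarts={{.RestartCount}} exit={{.State.ExitCode}}' \
    "$name"
  # Logs stay readable even while `docker exec` is refused.
  docker logs --tail 50 "$name"
}
```

The last lines of the log usually contain the error that makes the entrypoint exit, which is what triggers the next restart.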
Hmm. Do you have the --runtime=nvidia flag set?
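Since the container never stays up long enough for docker exec, one way to test the runtime flag is a throwaway container that runs nvidia-smi instead of the miner. This is a sketch, not the project's documented procedure. The helper name smi_in_image is hypothetical, and it assumes the NVIDIA container runtime is installed on the host and makes nvidia-smi available inside the container.

```shell
# Hypothetical helper: run nvidia-smi in a one-off container from the given image.
# Usage: smi_in_image ghcr.io/ptrfrll/nv-docker-trex:test
smi_in_image() {
  image="$1"
  # --rm removes the container afterwards; --entrypoint replaces the image's
  # normal startup command so only nvidia-smi runs.
  docker run --rm --runtime=nvidia --entrypoint nvidia-smi "$image"
}
```

If this prints the usual GPU table, the runtime and driver are visible inside the container and the problem is more likely in the image itself.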
If you're building from source, you could try changing the base image to the CUDA version that matches yours:
FROM nvidia/cuda:11.2.1-base-ubuntu18.04
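Swapping the base image when building from source could be scripted roughly like this. The helper name set_base_image is hypothetical, and the nvidia/cuda: prefix on the tag is an assumption taken from the other FROM line quoted in this thread. Only the first FROM line of the Dockerfile is rewritten.

```shell
# Hypothetical helper: replace the first FROM line of a Dockerfile.
# Usage: set_base_image Dockerfile nvidia/cuda:11.2.1-base-ubuntu18.04
set_base_image() {
  dockerfile="$1"
  image="$2"
  tmp="$(mktemp)" || return 1
  # Rewrite only the first line that starts with "FROM "; copy the rest unchanged.
  awk -v img="$image" '
    !seen && /^FROM / { print "FROM " img; seen = 1; next }
    { print }
  ' "$dockerfile" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back in place so the Dockerfile keeps its original permissions.
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
}
```

After that, docker build -t trex-local . in the repository directory builds the image against the new base (trex-local is just an example tag).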
It seems like this might be a driver/CUDA version issue, combined with the addition of libnvidia-ml-dev to the Dockerfile from Issue 31 (Can't load NVML library · Issue #31 · PTRFRLL/nv-docker-trex).
When I removed
wget libnvidia-ml-dev \
and replaced it with
wget \
in the Dockerfile, it would build and run successfully, but of course then I get the output error which appears to be from issue 31.
It fails with the latest image, but if I build from source with FROM nvidia/cuda:11.4.1-base-ubuntu20.04 it seems fixed. Can you try this tag and see if it fixes it?
ghcr.io/ptrfrll/nv-docker-trex:test
That one worked. What was the change?
I updated the base image to CUDA 11.4 like you did in yours. Glad it's working now. I pushed the changes to the latest tag.
Is there a configuration change required for v3.5 and beyond? I received the error below with the latest version on Docker Hub. I built 3.3 from source here on GitHub and had no errors, so it seems to be related to the new image.
OS: Ubuntu Server 20.04 LTS
Nvidia Driver: 460.91.03
CUDA Version: 11.2
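The driver and CUDA versions quoted here appear in the banner line that nvidia-smi prints, so they can be extracted for a bug report with a small filter. The helper name parse_smi_header is hypothetical, and the sketch assumes the banner contains "Driver Version:" followed by "CUDA Version:" on one line. The CUDA version in that banner is the highest version the installed driver supports, not the toolkit version inside an image.

```shell
# Hypothetical helper: pull driver and CUDA versions out of nvidia-smi output.
# Usage: nvidia-smi | parse_smi_header
parse_smi_header() {
  # Print only the line carrying both fields, reduced to "driver=X cuda=Y".
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}
```

Running the same filter on the host and inside the container makes it easy to paste both results into an issue.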
Happy to provide any other information as needed.