PTRFRLL / nv-docker-trex

Mine crypto using your Unraid server

CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE #41

Closed. ksnell88 closed this issue 2 years ago

ksnell88 commented 2 years ago

Is a configuration change required for v3.5 and later? I get the error below with the latest image from Docker Hub. I built 3.3 from source here on GitHub and had no errors, so the problem seems to come from the new image.

OS: Ubuntu Server 20.04 LTS
Nvidia Driver: 460.91.03
CUDA Version: 11.2

Happy to provide any other information as needed.

ERROR: Can't start T-Rex, can't initialize CUDA engine, cuda exception: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE. Is NVIDIA driver installed?


PTRFRLL commented 2 years ago

T-Rex changed something starting in 0.24.5 that introduced new issues with NVIDIA/CUDA. Can you try running nvidia-smi from within the container?
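For anyone following along, a minimal shell sketch of that check. The function names are made up for illustration and are not part of nv-docker-trex; the container name (trex in this thread) and the image are whatever you deploy. It assumes the NVIDIA container runtime is installed on the host, which is what normally makes nvidia-smi available inside the container.

```shell
# Hypothetical helpers for illustration only.

# Run nvidia-smi inside an already running container.
smi_in_container() {
  docker exec "$1" nvidia-smi
}

# Run nvidia-smi in a throwaway container from the same image, bypassing the
# image's normal entrypoint. Useful when the real container will not stay up.
smi_in_fresh_container() {
  docker run --rm --runtime=nvidia --entrypoint nvidia-smi "$1"
}

# Show the restart state and the last log lines of a container that keeps dying.
why_restarting() {
  docker inspect -f 'status={{.State.Status}} restarts={{.RestartCount}}' "$1"
  docker logs --tail 50 "$1"
}
```

Typical use would be smi_in_container trex for a running container, or why_restarting trex followed by smi_in_fresh_container with your image tag if docker exec is refused because the container is not running.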

ksnell88 commented 2 years ago

Docker Compose can't keep the container up long enough to try. After running docker-compose up -d, I got the output below when trying to get inside the container. It seems to be stuck in a restart loop.

ks@docker:~/docker/trex$ docker exec -it trex bash
Error response from daemon: Container 347af0c238f77c46ff9c64e9f2b14e33e72463d7116484666004074e57b86e83 is restarting, wait until the container is running

PTRFRLL commented 2 years ago

Hmm. Do you have the --runtime=nvidia flag set?

If you're building from source, you could try changing the base image to the CUDA version that matches yours, e.g. FROM nvidia/cuda:11.2.1-base-ubuntu18.04
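To find the CUDA version to match, the header line of nvidia-smi on the host reports the highest CUDA version the installed driver supports. A small sketch, assuming the usual header format ("... Driver Version: X  CUDA Version: Y ..."); the function name is made up for illustration:

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# "CUDA Version" value from its header line (first match only).
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

On the host this would be used as nvidia-smi | cuda_version_from_smi. A base image tag on that CUDA release, or an older one, is the conservative choice when building from source.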

ksnell88 commented 2 years ago

This looks like a driver/CUDA version issue, combined with the addition of libnvidia-ml-dev to the Dockerfile in Issue #31 (Can't load NVML library).
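One way to sanity-check a driver/CUDA pairing is to compare the installed driver version against the minimum driver that NVIDIA's CUDA release notes list for the toolkit in the image. A sketch, assuming GNU sort (for -V); version_ge and check_driver are made-up helpers, and the minimum has to be looked up in the release notes rather than taken from this thread:

```shell
# Hypothetical helpers for illustration only.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Report whether an installed driver ($1) meets a required minimum ($2).
check_driver() {
  if version_ge "$1" "$2"; then
    echo "driver $1 meets minimum $2"
  else
    echo "driver $1 is older than minimum $2"
  fi
}
```

The installed version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed as the first argument, with the minimum from the release notes as the second.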

PTRFRLL commented 2 years ago

Can you try this tag and see if it fixes it?

ghcr.io/ptrfrll/nv-docker-trex:test

ksnell88 commented 2 years ago

That one worked. What was the change?

PTRFRLL commented 2 years ago

I updated the base image to CUDA 11.4 like you did in yours. Glad it's working now. I pushed the changes to the latest tag.