Closed dewijones92 closed 8 months ago
Are you able to run other Docker images that require CUDA? The error message seems to say that you cannot access the GPU hardware.
I just noticed that you are trying to run the llama-cpu variant; please see #9 and #16 for relevant information. I will leave this open as a reminder for me to update the documentation with expanded instructions for CPU inference.
TLDR: Comment out the deploy: block in the docker-compose.yml
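As a quick way to confirm the change took effect, here is a minimal sketch of a check for an uncommented deploy: key. The function name has_active_deploy is hypothetical, and the default file name docker-compose.yml in the current directory is an assumption; the match is a plain text search, not a YAML parse.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): succeeds if the compose
# file still contains an uncommented "deploy:" key.
has_active_deploy() {
    file="${1:-docker-compose.yml}"   # assumed default location
    [ -f "$file" ] || return 1
    # Lines such as "    # deploy:" do not match, because only whitespace
    # is allowed before the key.
    grep -Eq '^[[:space:]]*deploy:' "$file"
}

# Example use:
#   if has_active_deploy; then echo "deploy: block is still active"; fi
```

If the check still succeeds after editing, some deploy: line in the file remains uncommented.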
Hi, I'm also getting the same error when I try to run the Docker container with the command below:
docker run --gpus all image-id
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
I created the VM from the default AMI, which is verified by Amazon.
These are the AMI details:
GPU (Kernel 4.14)
AMI name: amzn2-ami-ecs-gpu-hvm-2.0.20231103-x86_64-ebs
ECS Agent version: 1.79.0
Docker version: 20.10.25
Containerd version: 1.6.19
NVIDIA driver version: 535.54.03
CUDA version: 12.2.0
Source AMI name: amzn2-ami-minimal-hvm-2.0.20230926.0-x86_64-ebs
I used the commands below, taken from the AWS documentation, to remove the old NVIDIA driver (535.54.03) and install the new one (535.129.03):
sudo yum remove nvidia
sudo yum remove cuda
sudo yum erase nvidia cuda
sudo yum update -y
sudo amazon-linux-extras install kernel-5.15
sudo yum install gcc make && sudo yum update -y
sudo reboot
sudo yum install -y gcc kernel-devel-$(uname -r)
chmod +x NVIDIA-Linux-x86_64.run
sudo CC=/usr/bin/gcc10-cc ./NVIDIA-Linux-x86_64.run
sudo touch /etc/modprobe.d/nvidia.conf
echo "options nvidia NVreg_EnableGpuFirmware=0" | sudo tee --append /etc/modprobe.d/nvidia.conf
sudo reboot
After running the commands above, I was able to upgrade the NVIDIA driver to 535.129.03 and the kernel to 5.15, but I still hit the error mentioned above when running the Docker container.
Any suggestions?
@shaiksuhel1999 You need to install nvidia-ctk, plus nvidia-container-runtime if the first package doesn't come with it. Then put the following in your Docker daemon.json:
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
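To check whether that entry is actually in place, a rough sketch along these lines may help. The function name has_nvidia_runtime is hypothetical, the default path /etc/docker/daemon.json is the usual location on Linux but is an assumption here, and the check is a crude text match rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeeds if the given daemon.json
# appears to register an "nvidia" runtime pointing at nvidia-container-runtime.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"   # assumed default location
    [ -f "$config" ] || return 1
    # Text match only: both the runtime name and the binary must be mentioned.
    grep -q '"nvidia"' "$config" && grep -q 'nvidia-container-runtime' "$config"
}

# Example use:
#   if has_nvidia_runtime; then
#       echo "nvidia runtime is registered"
#   else
#       echo "nvidia runtime is missing from daemon.json"
#   fi
```

Docker only reads daemon.json at startup, so the daemon has to be restarted (for example with sudo systemctl restart docker on systemd hosts) before retrying docker run --gpus all.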
Closing this issue because the docker-compose.yml now has a comment indicating that the deploy: section should be commented out for non-Nvidia inferencing.
No luck for me when trying to use this. Am I missing something? Thanks.