Bortus-AI closed this issue 7 months ago.
The 545.23.08 driver is not hosted on the http://download.nvidia.com/XFree86/Linux-x86_64/ site, so the automatic download will not work if that is your host driver version. You've got 545.23.06, 545.29.02 and 545.29.06 available on their site.
If you want to use 545.23.08, you'll need to download the driver's .run installer yourself, place it in your home folder's Downloads directory (/home/default/Downloads/), and rename it to NVIDIA_545.23.08.run.
The container will then find it on startup and, as long as it still matches the host driver version, install it without attempting to download the installer.
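A minimal shell sketch of that manual workaround, assuming nvidia-smi and curl are available on the host. The /home/default/Downloads path and the NVIDIA_<version>.run name come from this thread; the URL pattern is the one NVIDIA's XFree86 archive uses for standalone driver installers; the helper names (host_driver_version, driver_url, driver_is_hosted, staged_name, stage_installer) are hypothetical. This is not the container's own script. If /home/default is a volume mounted from the host (an assumption about your setup), pass the host-side directory as the third argument.

```shell
#!/bin/sh
# Sketch of the manual workaround: stage your own driver installer where the
# container looks for it. Path and file naming follow this thread; helper
# names are hypothetical and this is not the container's own script.

# Host driver version, e.g. "545.23.08" (first GPU if several are listed).
host_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# URL of the standalone installer on NVIDIA's XFree86 archive.
driver_url() {
    printf 'https://download.nvidia.com/XFree86/Linux-x86_64/%s/NVIDIA-Linux-x86_64-%s.run\n' "$1" "$1"
}

# Succeeds only if the archive answers a HEAD request for that version.
driver_is_hosted() {
    curl -fsIL "$(driver_url "$1")" >/dev/null 2>&1
}

# File name the container expects, e.g. NVIDIA_545.23.08.run.
staged_name() {
    printf 'NVIDIA_%s.run\n' "$1"
}

# $1 = installer you downloaded, $2 = driver version,
# $3 = Downloads directory (defaults to the in-container path from this thread).
stage_installer() {
    dest=${3:-/home/default/Downloads}/$(staged_name "$2")
    mkdir -p "$(dirname "$dest")" && cp "$1" "$dest" && chmod +x "$dest"
}

# Example:
#   ver=$(host_driver_version)
#   driver_is_hosted "$ver" || echo "v$ver is not hosted; stage it manually"
#   stage_installer ./NVIDIA-Linux-x86_64-"$ver".run "$ver" /path/to/mapped/Downloads
```

Copying rather than moving keeps the original download around in case the staged copy needs to be replaced.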
I got this error in the installer log:
/home/default/Downloads/NVIDIA_545.23.08.run: 226: cannot create /dev/tty: No such device or address
/home/default/Downloads/NVIDIA_545.23.08.run: 226: cannot create /dev/tty: No such device or address
Signal caught, cleaning up
Maybe it needs a --silent flag?
Well, never mind, I see --silent in /etc/cont-init.d/60-configure_gpu_driver.sh.
Here is the next error:
2023-11-28T06:46:03.824805489Z - Installing NVIDIA driver v545.23.08 to match what is running on the host
2023-11-28T06:46:17.514605002Z Extraction failed.
2023-11-28T06:46:17.514622996Z Ensure there is enough space in /tmp and that the installation package is not corrupt
2023-11-28T06:46:18.207153526Z Build: [2023-11-25 02:36:37] [master] [6cc9f56155f3c7f9fc6bc9c22ef2cbf555029c00] [debian]
Let me redownload it and check the checksum again.
Okay, I redownloaded it and the md5sum matches now.
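The "Extraction failed" message points at two things to rule out: too little free space in /tmp, or a corrupt download. A small sketch of both checks, assuming GNU coreutils (md5sum, df) wherever the installer runs. Where the expected MD5 comes from (for example a checksum published next to the download) is left to you, and verify_md5 and tmp_free_kb are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: rule out the two causes the "Extraction failed" message suggests.

# Succeeds if file $1 has the MD5 digest $2 (hex, either case).
verify_md5() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Print the free space in /tmp in KiB (POSIX df output, "Available" column).
tmp_free_kb() {
    df -Pk /tmp | awk 'NR == 2 { print $4 }'
}

# Example (the digest is a placeholder):
#   verify_md5 /home/default/Downloads/NVIDIA_545.23.08.run <expected-md5> || echo "corrupt download"
#   echo "free in /tmp: $(tmp_free_kb) KiB"
```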
It's still in a boot loop and stuck on installing. I can attach for just a second and see that there are NVIDIA_545.23.08.run processes running.
Since it keeps wiping the log, I ran tail on it and I see: Unknown option: --accept-license
This could be due to this driver version. I really don't want to change the host driver version; it was a PITA to get installed and working with my Docker containers.
I'll just have to wait for the proper official NVIDIA_545.23.08.run to be available on that site instead of this CUDA version.
I figured out how to extract the official NVIDIA_545.23.08.run from the CUDA .run file, so I tried it again and it's working :) Thanks for the help.
Extracting the CUDA .run file is not hard, but it took some time to figure out. Here is a quick guide.
- Download the CUDA runfile, which bundles the driver, here: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local
- Make the downloaded file executable: chmod +x <your file here>
- Execute this command to extract the .run file: ./cuda_installer_file.run --extract=/path/to/extracted_files
- Go into the newly created directory in /path/to/extracted_files and find the right .run file there.
Hopefully this helps someone!
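The steps in the guide above can be wrapped into a few shell functions. This is a sketch, assuming the CUDA runfile accepts --extract=<path> as in the guide and places the standalone NVIDIA-Linux-x86_64-<version>.run within two directory levels of that path; passing an absolute path to --extract is the safer choice. The function names are hypothetical, and the final copy step reuses the NVIDIA_<version>.run naming from earlier in this thread.

```shell
#!/bin/sh
# Sketch: pull the standalone driver installer out of a CUDA runfile.
# Assumes the extracted tree contains NVIDIA-Linux-x86_64-<version>.run
# within two directory levels of the extraction root.

# Print the first driver installer found under the given directory
# (prints nothing if there is none).
find_driver_runfile() {
    find "$1" -maxdepth 2 -type f -name 'NVIDIA-Linux-x86_64-*.run' | head -n 1
}

# Turn ".../NVIDIA-Linux-x86_64-545.23.08.run" into "545.23.08".
driver_version_from_name() {
    basename "$1" .run | sed 's/^NVIDIA-Linux-x86_64-//'
}

# $1 = CUDA runfile (must include a directory, e.g. ./file.run),
# $2 = absolute extraction directory.
# Prints the path of the extracted driver installer.
extract_cuda_driver() {
    chmod +x "$1" || return 1
    "$1" --extract="$2" || return 1
    find_driver_runfile "$2"
}

# Example (file names are placeholders):
#   drv=$(extract_cuda_driver "$PWD/cuda_installer_file.run" /tmp/cuda_extract)
#   ver=$(driver_version_from_name "$drv")
#   cp "$drv" "/path/to/Downloads/NVIDIA_${ver}.run"
```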
Yep that's what I did. Worked great
Describe the Bug
Started up the container, and the logs show it's downloading the NVIDIA driver:
2023-11-28T04:48:51.871722668Z [ /etc/cont-init.d/60-configure_gpu_driver.sh: executing... ]
2023-11-28T04:48:52.037322341Z No Intel device found
2023-11-28T04:48:52.037337690Z No AMD device found
2023-11-28T04:48:52.037340466Z Found NVIDIA device 'NVIDIA RTX A4000'
2023-11-28T04:48:52.039353067Z - Downloading driver v545.23.08
But it just sits there for hours, and if I try to attach to the container it says:
Container 7d1160fdf7aa8ffa6c61d7aa7e3f83e9847d3b1719f7152e2d82e75280696071 is restarting, wait until the container is running
Steps to Reproduce
No response
Expected Behavior
No response
Screenshots
No response
Relevant Settings
No response
Version
Build: [2023-11-25 02:36:37] [master] [6cc9f56155f3c7f9fc6bc9c22ef2cbf555029c00] [debian]
Platform
Debian GNU/Linux - 12 (bookworm) 6.1.0-13-amd64 unknown unknown GNU/Linux
NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3
Docker version 20.10.24+dfsg1, build 297e128
Docker Compose version v2.21.0
Relevant log output
No response