Closed SurajAralihalli closed 5 months ago
@jayadeep-jayaraman @viadea FYI
Have you tested this on Ubuntu or Debian?
On 2.0-debian10 I receive this error when I pass cuda-version="12.4":
Passing it as 12.4.1 or leaving it unset works, though, so this is not a blocker. If symlinks could be put in place on the NVIDIA filesystem, the installer would be more robust.
cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.0
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
Thu Jun 27 18:30:03 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   65C    P0             34W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
2.1-debian11 also works:
cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.1
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
Thu Jun 27 18:41:26 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   64C    P0             34W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
2.2-debian12 also LGTM:
cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.2
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
Thu Jun 27 18:49:48 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   65C    P0             31W /   72W |       0MiB /  23034MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
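For the cuda-version="12.4" failure reported above, one possible installer-side workaround is to expand a major.minor value to a full major.minor.patch version before any download path is built. The sketch below is only an illustration: the function name normalize_cuda_version is hypothetical and not part of the init script, and the only mapping taken from this thread is 12.4 to 12.4.1. Any further entries would need to be tested before being added.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the actual init script.
# Expands a major.minor CUDA version to major.minor.patch.
normalize_cuda_version() {
  local requested="$1"
  case "${requested}" in
    "")    echo "" ;;               # unset: caller falls back to its own default
    12.4)  echo "12.4.1" ;;         # the only mapping confirmed in this thread
    *.*.*) echo "${requested}" ;;   # already fully qualified, pass through
    *)
      # Unknown short form: fail loudly rather than guess a patch level.
      echo "unrecognized cuda-version '${requested}'; pass major.minor.patch" >&2
      return 1
      ;;
  esac
}
```

A caller could then write CUDA_VERSION="$(normalize_cuda_version "12.4")" and stop the installation if the function returns non-zero, instead of failing later on a missing download.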
Thank you! In our docs, we generally recommend that users not set cuda-version or driver-version in the metadata. Different CUDA versions may require different installation steps, and supporting every version in the init script may not be feasible. However, I've left this option available for advanced users who can test and run it themselves.
This PR updates
Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>