GoogleCloudDataproc / initialization-actions

Run in all nodes of your cluster before the cluster starts - lets you customize your cluster
https://cloud.google.com/dataproc/init-actions
Apache License 2.0

[spark-rapids] Update spark rapids version to 24.06.0 #1187

SurajAralihalli closed this 5 months ago

SurajAralihalli commented 5 months ago

This PR updates

Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>

SurajAralihalli commented 5 months ago

@jayadeep-jayaraman @viadea FYI

cjac commented 5 months ago

Have you tested this on Ubuntu or Debian?

cjac commented 5 months ago

On 2.0-debian10 I receive this error when I pass cuda-version="12.4":

Passing it as 12.4.1 or leaving it unset works, though, so this is not a blocker. If symlinks could be put in place on the NVIDIA filesystem, the installer would be more robust.
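
One way to make the script tolerant of a major.minor value without relying on symlinks is to normalize the requested version before using it. The following is only a sketch under stated assumptions: `normalize_cuda_version` is a hypothetical helper that does not exist in the init action, and the single mapping it contains (12.4 to 12.4.1) is the one observed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the spark-rapids init action:
# expand a major.minor CUDA version to a full major.minor.patch string.
normalize_cuda_version() {
  local requested="$1"
  case "${requested}" in
    # Unset or empty: return nothing so the caller can fall back to its default.
    "") echo "" ;;
    # Already a full version such as 12.4.1: pass it through unchanged.
    [0-9]*.[0-9]*.[0-9]*) echo "${requested}" ;;
    # The only mapping taken from this thread: 12.4 failed, 12.4.1 worked.
    12.4) echo "12.4.1" ;;
    # Anything else is unknown to this sketch: report it and fail.
    *) echo "unsupported cuda-version: ${requested}" >&2; return 1 ;;
  esac
}
```

Calling `normalize_cuda_version 12.4` prints `12.4.1`, while an unrecognized value returns a non-zero status so the caller can stop early instead of failing partway through the install.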

cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.0
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
Thu Jun 27 18:30:03 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   65C    P0             34W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

cjac commented 5 months ago

2.1-debian11 also works:

cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.1
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
Thu Jun 27 18:41:26 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   64C    P0             34W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

cjac commented 5 months ago

2.2-debian12 also LGTM.

cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.2
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm
Thu Jun 27 18:49:48 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   65C    P0             31W /   72W |       0MiB /  23034MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

SurajAralihalli commented 5 months ago

Thank you! In our docs, we generally recommend that users not set cuda-version or driver-version in the metadata. Different CUDA versions may require different installation steps, and supporting every version in the init script may not be feasible. However, I've left this feature available for advanced users who can test and run it themselves.