GoogleCloudDataproc / initialization-actions

Run in all nodes of your cluster before the cluster starts - lets you customize your cluster
https://cloud.google.com/dataproc/init-actions
Apache License 2.0

[spark-rapids] Update spark rapids version to 24.04.0 #1176

Closed SurajAralihalli closed 6 months ago

SurajAralihalli commented 7 months ago

This PR:

  1. Updates the spark-rapids.sh init script to the latest rapids-4-spark version available as of this writing, 24.02.0.
  2. Changes the default NVIDIA driver version to 550.54.15 and the default CUDA version to 12.4.1.
  3. Runs apt-get --allow-releaseinfo-change update to unblock downstream applications until the Dataproc platform fixes the apt-get update issues on Debian 10 and Ubuntu.
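For reference, a minimal bash sketch of item 3. The --allow-releaseinfo-change option makes apt-get accept a repository whose Release file changed fields such as Suite or Codename (for example, a suite moving from "stable" to "oldstable"), a situation in which a plain apt-get update fails. The function name, the retry loop, its defaults, and the APT_GET override are assumptions added for illustration and are not part of the PR; the override only exists so the function can be exercised without root.

```shell
#!/usr/bin/env bash
# Sketch only: refresh package lists while accepting Release-file changes.
# Assumptions (not from the PR): the hypothetical function name, the retry
# loop and its defaults, and the APT_GET override used for testing.
APT_GET="${APT_GET:-apt-get}"

apt_update_allow_releaseinfo_change() {
  local attempts="${1:-3}"   # how many times to try the update
  local delay="${2:-5}"      # seconds to wait between attempts
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "${APT_GET}" --allow-releaseinfo-change update; then
      return 0
    fi
    echo "apt-get update failed (attempt ${i}/${attempts})" >&2
    # Only sleep if another attempt follows.
    if ((i < attempts)); then
      sleep "${delay}"
    fi
  done
  return 1
}
```

On a Debian or Ubuntu node, an init script would call apt_update_allow_releaseinfo_change once before installing packages and abort if it returns non-zero.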

Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>

SurajAralihalli commented 7 months ago

FYI @viadea @jayadeep-jayaraman

jayadeep-jayaraman commented 6 months ago

/gcbrun

jayadeep-jayaraman commented 6 months ago

/gcbrun

jayadeep-jayaraman commented 6 months ago
+ echo 'Error: Secure Boot is enabled. Please disable Secure Boot while creating the cluster.'
Error: Secure Boot is enabled. Please disable Secure Boot while creating the cluster.
+ exit 1

We need to disable Secure Boot for the 2.2 Ubuntu image.
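The log above comes from a pre-check that aborts the init action when Secure Boot is on; Secure Boot typically rejects kernel modules not signed with a trusted key, which is why the drivers cannot be installed. A minimal bash sketch of such a check follows, assuming the state is read with mokutil --sb-state (which prints a line such as "SecureBoot enabled"). The function names are hypothetical, and the real check in spark-rapids.sh may be implemented differently.

```shell
#!/usr/bin/env bash
# Sketch only: Secure Boot pre-check for a GPU init action.
# Assumption: state comes from `mokutil --sb-state`; function names are
# hypothetical and not taken from spark-rapids.sh.

# Map mokutil output to "enabled", "disabled", or "unknown".
parse_secure_boot_state() {
  case "$1" in
    *"SecureBoot enabled"*)  echo "enabled" ;;
    *"SecureBoot disabled"*) echo "disabled" ;;
    *)                       echo "unknown" ;;
  esac
}

# Read the live state; "unknown" if mokutil is missing or EFI is unsupported.
get_secure_boot_state() {
  if ! command -v mokutil >/dev/null 2>&1; then
    echo "unknown"
    return 0
  fi
  parse_secure_boot_state "$(mokutil --sb-state 2>&1)"
}

# Return 1 (with the same error text as the log) when Secure Boot is enabled.
# An explicit state can be passed as $1; otherwise the live state is read.
require_secure_boot_disabled() {
  local state="${1:-$(get_secure_boot_state)}"
  if [[ "${state}" == "enabled" ]]; then
    echo 'Error: Secure Boot is enabled. Please disable Secure Boot while creating the cluster.' >&2
    return 1
  fi
  return 0
}
```

An init script would run require_secure_boot_disabled || exit 1 before installing drivers. Returning a status instead of calling exit inside the function keeps the check reusable and testable.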

viadea commented 6 months ago

@jayadeep-jayaraman Is there any action item needed in this PR? I think this error just tells us that Secure Boot needs to be disabled for Dataproc 2.1 and 2.2 in your CI/CD pipeline.

SurajAralihalli commented 6 months ago

Yes, we need to disable Secure Boot for Dataproc 2.1 and 2.2 to install the drivers. A note about disabling Secure Boot has been added to the "Create a Dataproc cluster accelerated by GPUs" docs for users.

jayadeep-jayaraman commented 6 months ago

The test is failing for 2.2 Ubuntu, and we should bypass the test for this image version in the PR.

viadea commented 6 months ago

@jayadeep-jayaraman I do not think we can run the CI/CD test to confirm. Do you want to fix this in this PR, or do you want us to fix it?

SurajAralihalli commented 6 months ago

Secure Boot needs to be disabled for Ubuntu 22. Would you recommend bypassing this check? I think the check is useful because it helps users identify the issue when Secure Boot is enabled.

SurajAralihalli commented 6 months ago

@jayadeep-jayaraman @viadea @sameerz I've updated the PR to skip the tests on 2.1 and 2.2 (due to the Secure Boot issue). I'm confident that the tests would succeed if we find a way to disable Secure Boot in CI. This means the tests would run only on Debian 10 (2.0), as we are in the process of dropping support for Ubuntu 18. To unblock merging this PR as soon as possible, I've included --allow-releaseinfo-change.
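Based on the thread (Secure Boot blocks driver installation on Dataproc 2.1 and 2.2 in CI, and Ubuntu 18 support is being dropped), the resulting test matrix can be summarized as a small bash decision helper. This is a hypothetical sketch: the function name and the image-version strings are illustrative, and the repository's actual tests implement the skip logic in their own way.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): decide whether the spark-rapids GPU
# tests should run for a Dataproc image version such as "2.0-debian10".
gpu_test_decision() {
  local image_version="$1"
  case "${image_version}" in
    2.0-debian10)
      echo "run" ;;
    2.1-*|2.2-*)
      # Per the thread, Secure Boot must be disabled to install drivers.
      echo "skip: Secure Boot must be disabled to install GPU drivers" ;;
    2.0-ubuntu18)
      echo "skip: Ubuntu 18 support is being dropped" ;;
    *)
      echo "skip: not covered by this test matrix" ;;
  esac
}
```

For example, looping over 2.0-debian10, 2.1-ubuntu20, and 2.2-ubuntu22 and printing gpu_test_decision for each would report "run" only for the first. Returning a reason string rather than a bare status makes skipped runs self-explanatory in CI logs.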

viadea commented 6 months ago

LGTM. @jayadeep-jayaraman shall we merge it after the tests pass?

jayadeep-jayaraman commented 6 months ago

/gcbrun