Ansible request for NVidia CUDA toolkit

adoptium / infrastructure

This repo contains all information about machine maintenance.

Apache License 2.0

Ansible request for NVidia CUDA toolkit #3581

Open AswathySK opened 5 months ago

AswathySK commented 5 months ago

The silent installation of the NVIDIA CUDA toolkit does not succeed. The NVIDIA GPU Computing Toolkit folder is never created, so builds fail with an error saying CUDA_HOME was not found.

The issue can be resolved by changing compiler_9.1 in the playbook to nvcc_9.1; see https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-microsoft-windows/index.html for the subpackage names.

C:\temp\cuda_9.1.85_win10_network.exe -s nvcc_9.1 nvml_dev_9.1
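As a rough illustration of how the subpackage names in that command are formed, here is a minimal shell sketch that builds the silent-install arguments from a version and a list of components. The helper name `cuda_silent_args` is hypothetical and not part of the playbook; it only assumes the `<component>_<version>` naming seen in the command above.

```shell
#!/usr/bin/env bash
# Build the argument string for a silent CUDA installer run.
# Usage: cuda_silent_args VERSION COMPONENT...
# Each component becomes "<component>_<version>", e.g. "nvcc_9.1".
# Hypothetical helper for illustration only.
cuda_silent_args() {
  local version="$1"
  shift
  local args="-s"
  local component
  for component in "$@"; do
    args="$args ${component}_${version}"
  done
  printf '%s\n' "$args"
}

# Example: cuda_silent_args 9.1 nvcc nvml_dev
# prints: -s nvcc_9.1 nvml_dev_9.1
```

Keeping the version in one place like this would mean a later version bump only changes a single value, though whether the same component names exist in newer installers would need checking against the matching NVIDIA installation guide.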

Version 9.1 was released back in 2017. Is there any reason why we can't change it to a newer version? 12.0.0 supports most Windows versions: 10, 11, and Server 2016, 2019 and 2022.

Bug in ansible playbook
Request for new playbook addition

sxa commented 5 months ago

Any feelings on this @AdamBrousseau @pshipton ?

pshipton commented 5 months ago

@keithc-ca pls take a look.

keithc-ca commented 5 months ago

CUDA is only for OpenJ9, so I support updating to a newer version (e.g. 12.0).

There are other inconsistencies that should be addressed to successfully install and use whatever version we choose.

- The Windows ansible role looks for 9.0 before trying to install 9.1.
- build-farm/platform-specific-configurations/windows.sh in temurin-build looks for version 9.0.

It may make sense to choose the same version for Unix (which currently uses 9.0).
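One way to avoid this kind of mismatch is for the build script to detect the installed version rather than hard-code it. The following is only a sketch, not what windows.sh currently does: `find_cuda_home` is a hypothetical helper, and it assumes GNU `sort` (for `-V`) is available, as it normally is under Cygwin.

```shell
#!/usr/bin/env bash
# Print the newest "vX.Y" directory under a CUDA base directory.
# Returns non-zero if no version directory exists.
# Hypothetical helper; assumes GNU sort for -V (version ordering).
find_cuda_home() {
  local base="$1"
  local newest
  newest="$(
    for dir in "$base"/v*/; do
      # Skip the unexpanded glob when nothing matches.
      if [ -d "$dir" ]; then
        basename "$dir"
      fi
    done | sort -V | tail -n 1
  )"
  if [ -z "$newest" ]; then
    echo "No CUDA toolkit found under $base" >&2
    return 1
  fi
  printf '%s\n' "$base/$newest"
}

# Example (the Cygwin-style path is an assumption):
# CUDA_HOME="$(find_cuda_home "/cygdrive/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA")"
```

Picking the newest directory is a design choice that may not suit every build; pinning an exact version in both repositories is the stricter alternative.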

AswathySK commented 5 months ago

@keithc-ca, I was planning to make a change to build-farm/platform-specific-configurations/windows.sh as well. Even when following the current playbook, the CUDA_PATH environment variable is set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1.

Should I make the path change in the task that checks the installation status of the NVIDIA CUDA toolkit after the change is made in build-farm/platform-specific-configurations/windows.sh?

keithc-ca commented 5 months ago

Those changes would be in separate repositories, so (at least) two pull requests. Committers will coordinate the timing of merging them (assuming they approve).

AswathySK commented 5 months ago

@steelhead31 @karianna, what are your thoughts on bumping to a newer version of the CUDA toolkit?

steelhead31 commented 5 months ago

Assuming the openj9 folks approve, as above, and the resultant JDK is run through the relevant test suites, prior to the changes being merged, I don't have any objections.

sxa commented 4 months ago

Yeah +1 from me - since Adoptium does not build Temurin I'm happy to follow the suggestions from the upstream project that requires it in terms of versioning in this situation.