adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Ansible request for NVidia CUDA toolkit #3581

Open AswathySK opened 4 weeks ago

AswathySK commented 4 weeks ago

The silent installation of the NVIDIA toolkit does not succeed. The NVIDIA GPU Computing Toolkit folder is not created, so builds fail with an error saying CUDA_HOME was not found.

The issue can be resolved by changing compiler_9.1 in the playbook to nvcc_9.1 (see https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-microsoft-windows/index.html):

C:\temp\cuda_9.1.85_win10_network.exe -s nvcc_9.1 nvml_dev_9.1

Version 9.1 was released back in 2017. Is there any reason why we can't change to a newer version? 12.0.0 supports most Windows versions: 10, 11, Server 2016, 2019 and 2022.

sxa commented 4 weeks ago

Any feelings on this @AdamBrousseau @pshipton ?

pshipton commented 4 weeks ago

@keithc-ca pls take a look.

keithc-ca commented 4 weeks ago

CUDA is only for OpenJ9, so I support updating to a newer version (e.g. 12.0).

There are other inconsistencies that should be addressed to successfully install and use whatever version we choose.

  1. the Windows Ansible role looks for 9.0 before trying to install 9.1
  2. build-farm/platform-specific-configurations/windows.sh in temurin-build looks for version 9.0

It may make sense to choose the same version for Unix (which currently uses 9.0).
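
One way to keep the two version checks listed above in step is to derive the toolkit path from a single version variable. The following is only a minimal sketch under stated assumptions, not the actual contents of windows.sh: the names `CUDA_VERSION`, `CUDA_ROOT` and `find_cuda_home` are hypothetical, the default of 12.0 is just the candidate version mentioned in this thread, and the Cygwin-style root path is assumed from the CUDA_PATH value reported in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: illustrates driving the CUDA lookup from one version
# variable. It is NOT the real windows.sh from temurin-build.

# Single place to record the CUDA version expected on build machines
# (assumed default; 12.0 is only the candidate suggested in this thread).
CUDA_VERSION="${CUDA_VERSION:-12.0}"

# Toolkit root as seen from Cygwin (assumption based on the CUDA_PATH
# value quoted in this issue).
CUDA_ROOT="${CUDA_ROOT:-/cygdrive/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA}"

# Print the toolkit directory for the requested version.
# Returns non-zero (and prints nothing) if that directory does not exist.
find_cuda_home() {
  local root="$1" version="$2"
  local candidate="${root}/v${version}"
  if [ -d "${candidate}" ]; then
    printf '%s\n' "${candidate}"
    return 0
  fi
  return 1
}

# Export CUDA_HOME only when the expected version is actually installed.
if cuda_home="$(find_cuda_home "${CUDA_ROOT}" "${CUDA_VERSION}")"; then
  export CUDA_HOME="${cuda_home}"
  echo "Using CUDA_HOME=${CUDA_HOME}"
else
  echo "CUDA ${CUDA_VERSION} not found under ${CUDA_ROOT}; CUDA_HOME left unset" >&2
fi
```

With a lookup like this, moving to a different toolkit version would mean changing one value in the script, and the Ansible role would need to check for and install that same version.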

AswathySK commented 4 weeks ago

@keithc-ca, I was planning to change build-farm/platform-specific-configurations/windows.sh as well. Even when following the current playbook, the CUDA_PATH environment variable is set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1.

Should I change the path used when checking the installation status of the NVidia CUDA toolkit after the change is made in build-farm/platform-specific-configurations/windows.sh?

keithc-ca commented 4 weeks ago

Those changes would be in separate repositories, so (at least) two pull requests. Committers will coordinate the timing of merging them (assuming they approve).

AswathySK commented 3 weeks ago

@steelhead31 @karianna, what are your thoughts on bumping to a newer version of the CUDA toolkit?

steelhead31 commented 3 weeks ago

> @steelhead31 @karianna, what are your thoughts on bumping to a newer version of the CUDA toolkit?

Assuming the OpenJ9 folks approve, as above, and the resulting JDK is run through the relevant test suites before the changes are merged, I don't have any objections.