Description

Many of the most recent versions of ML packages now support CUDA 12, and some (like TensorFlow) require CUDA 12 exclusively. We should allow users to build the backends against CUDA 12 as well, so that the GPU stack is consistent between the installed Python packages and the backends themselves. This is complicated, however, by the fact that not all packages are retaining CUDA 11 support. The dependency set may therefore bifurcate, and users will need a way to express whether they want CUDA 11 or CUDA 12.

Justification

This allows users to upgrade to CUDA 12 (especially for new hardware), while users who need legacy support can stay on CUDA 11.

Implementation Strategy

Consider whether we should move the ml extras to smart build exclusively, or add separate ml-cuda11 and ml-cuda12 extras.
Understand whether we should (or can) expand ml_build_library to build backends against specific CUDA versions. For example, if libtensorflow is only available as a pre-compiled binary for CUDA 12, is it still possible to build it from scratch against CUDA 11?
Expand the nightly testing matrix to include both CUDA 11 and CUDA 12.
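
To make the first option more concrete, here is a minimal sketch of how a build step might map the CUDA toolkit found on the system to one of the proposed extras. It is not existing SmartSim code: the function names `parse_cuda_major` and `select_ml_extra` are hypothetical, the extras ml-cuda11 and ml-cuda12 are only the names proposed above, and it assumes the version can be read from the "release X.Y" string printed by `nvcc --version`.

```python
import re
from typing import Optional

# Hypothetical mapping; the extra names come from the proposal above
# and do not exist in the package yet.
EXTRA_BY_CUDA_MAJOR = {11: "ml-cuda11", 12: "ml-cuda12"}


def parse_cuda_major(nvcc_output: str) -> Optional[int]:
    """Return the CUDA major version found in `nvcc --version` text, if any."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return int(match.group(1)) if match else None


def select_ml_extra(nvcc_output: str) -> str:
    """Pick the (hypothetical) extra that matches the detected CUDA version."""
    major = parse_cuda_major(nvcc_output)
    if major is None:
        raise ValueError("could not determine the CUDA version")
    if major not in EXTRA_BY_CUDA_MAJOR:
        raise ValueError(f"unsupported CUDA major version: {major}")
    return EXTRA_BY_CUDA_MAJOR[major]
```

A caller would pass in the captured output of `nvcc --version` and use the returned name to decide which dependency set to install. Whether this selection belongs in the extras or in smart build is exactly the open question in the first item.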

CrayLabs / SmartSim

Allow users to choose between CUDA-11 and CUDA-12 ML Packages #616
