Description
The backend build currently supports only CPU or CUDA. As more machines ship with AMD accelerators, we should allow users to build with ROCm support. This has been done successfully once before, as a one-off Spack installation for a previous workshop on Frontier.
Justification
Users with AMD GPUs will be able to install SmartSim themselves without going through a Spack installation.
Implementation Strategy
Understand how the TensorFlow and ONNX backends and their corresponding ML packages add support for ROCm
Expand ml_lib_builder if ROCm backends are unavailable
Modify smart build to change the device option to gpu-stack
Modify builder.py and builderenv.py to retrieve backend libraries for ROCm
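One possible shape for the option change and the library selection is sketched below. This is only an illustration under stated assumptions: the `GpuStack` enum, the `--gpu-stack` flag spelling and its choices, and the `backend_variant` helper are all hypothetical names, not the existing `smart build` CLI or the current contents of builder.py and builderenv.py.

```python
import argparse
from enum import Enum


class GpuStack(Enum):
    """Hypothetical set of accelerator stacks a build could target."""

    CPU = "cpu"
    CUDA = "cuda"
    ROCM = "rocm"


def make_parser() -> argparse.ArgumentParser:
    # Assumed option name and choices; not the actual `smart build` parser.
    parser = argparse.ArgumentParser(prog="smart-build-sketch")
    parser.add_argument(
        "--gpu-stack",
        choices=[stack.value for stack in GpuStack],
        default=GpuStack.CPU.value,
        help="accelerator stack to build the backends against",
    )
    return parser


def backend_variant(backend: str, stack: GpuStack) -> str:
    """Return an illustrative label for the backend library variant to fetch."""
    return f"{backend}-{stack.value}"


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = make_parser().parse_args(["--gpu-stack", "rocm"])
stack = GpuStack(args.gpu_stack)
print(backend_variant("onnxruntime", stack))
```

Keeping the stack as an enum rather than a free-form string means the builder code can branch on a closed set of values, and adding another accelerator stack later would only require one new member.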