Description

The backend build currently supports only CPU or CUDA. As more machines ship with AMD accelerators, we should allow users to build with ROCm support. This was done successfully as a one-off, via a Spack installation, for a previous workshop on Frontier.

Justification

Users with AMD GPUs will be able to install SmartSim themselves without going through a Spack install.

Implementation Strategy

Understand how the TensorFlow and ONNX backends and their corresponding ML packages add support for ROCm
Expand ml_lib_builder if ROCm backends are unavailable
Modify smart build to change the device option to gpu-stack
Modify builder.py and builderenv.py to retrieve backend libraries for ROCm
Test on Frontier
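
As a rough illustration of the device-option step above, a minimal sketch might look like the following. Everything here is hypothetical: the `Device` enum, `parse_device`, `backend_variant`, and the `cpu`/`cuda`/`rocm` choice names are assumptions made for illustration and are not taken from SmartSim's actual CLI or builder code.

```python
import argparse
from enum import Enum


class Device(Enum):
    """Hypothetical device choices; not SmartSim's actual option set."""

    CPU = "cpu"
    CUDA = "cuda"
    ROCM = "rocm"


def parse_device(argv):
    """Parse a hypothetical --device flag into a Device member."""
    parser = argparse.ArgumentParser(prog="build-sketch")
    parser.add_argument(
        "--device",
        choices=[d.value for d in Device],
        default=Device.CPU.value,
    )
    return Device(parser.parse_args(argv).device)


def backend_variant(backend, device):
    """Return an illustrative label for the backend library variant to fetch."""
    return f"{backend}-{device.value}"
```

With this shape, the code that retrieves backend libraries could branch on a single `Device` value rather than a CPU/GPU boolean, which leaves room for further GPU stacks later. How the real build should name or locate ROCm libraries is still an open question for this issue.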

CrayLabs / SmartSim

Include support for building with ROCM #617
