aws / aws-ofi-nccl

This is a plugin that lets EC2 developers use libfabric as a network provider while running NCCL applications.
Apache License 2.0

Add ROCm as alternative to CUDA for plugin use #461

Open ryanhankins opened 1 week ago

ryanhankins commented 1 week ago

Description of changes:

See the commit messages for more detail. This PR adds a --with-rocm flag to configure.ac that switches between CUDA and ROCm GPU calls in order to support AMD GPUs. It also adds code to files to abstract the CUDA calls so that, when the --with-rocm option is used, the ROCm alternatives are called instead.
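
As a rough illustration of the kind of compile-time abstraction described above (a minimal sketch, not the code in this PR): the macros `HAVE_ROCM` and `HAVE_CUDA`, the `gpu_*` names, and the stub fallback are all hypothetical and assumed here only for the example. The sketch assumes configure would define `HAVE_ROCM` when --with-rocm is passed. Only `hipGetDevice`/`hipSuccess` and `cudaGetDevice`/`cudaSuccess` are real runtime API names.

```c
#include <assert.h>
#include <stddef.h>

#if defined(HAVE_ROCM)            /* hypothetical: assumed set by --with-rocm */
#include <hip/hip_runtime.h>
typedef hipError_t gpu_error_t;
#define GPU_SUCCESS hipSuccess
static inline gpu_error_t gpu_get_device(int *dev)
{
    return hipGetDevice(dev);     /* ROCm/HIP runtime call */
}
#elif defined(HAVE_CUDA)          /* hypothetical: assumed set for CUDA builds */
#include <cuda_runtime.h>
typedef cudaError_t gpu_error_t;
#define GPU_SUCCESS cudaSuccess
static inline gpu_error_t gpu_get_device(int *dev)
{
    return cudaGetDevice(dev);    /* CUDA runtime call */
}
#else
/* Stub backend so the sketch builds without either SDK installed.
 * It always reports device 0; a real build would not include this. */
typedef int gpu_error_t;
#define GPU_SUCCESS 0
static inline gpu_error_t gpu_get_device(int *dev)
{
    assert(dev != NULL);
    *dev = 0;
    return GPU_SUCCESS;
}
#endif

/* Backend-agnostic caller: the rest of the code only sees gpu_* names,
 * so it does not change when the backend is switched at configure time. */
static int current_device_or_negative(void)
{
    int dev = -1;

    if (gpu_get_device(&dev) != GPU_SUCCESS) {
        return -1;
    }
    return dev;
}
```

With a wrapper of this shape, call sites use `gpu_get_device()` and `GPU_SUCCESS` regardless of vendor, and the choice between the CUDA and ROCm runtime is made once, at build time.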

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

liralon commented 6 days ago

@ryanhankins Can you please add to the commit message some information about which platforms you have tested this functionality on and confirmed it works properly?