aws / aws-ofi-nccl

This is a plugin that lets EC2 developers use libfabric as a network provider while running NCCL applications.
Apache License 2.0

Add platform hook to sort rails and EFA implementation #458

Closed: rauteric closed this 1 week ago

rauteric commented 1 week ago

Some providers, including EFA, perform best when rail indices are ordered consistently. Add a platform hook that lets a platform sort its rails.

On EFA, ensure the plugin's multi-rail protocol consistently sorts rails in order of VF index.
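
For readers unfamiliar with the idea, here is a minimal sketch of what a platform sort hook could look like. It is not the code from this pull request: `struct example_rail`, `example_sort_rails_fn`, `example_efa_sort_rails`, and `example_apply_platform_sort` are hypothetical names, and the sketch assumes each rail already carries its VF index as a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rail descriptor; the plugin's real structures differ. */
struct example_rail {
    int vf_index;      /* assumed: VF index already known for this rail */
    const char *name;  /* assumed: device name, for illustration only */
};

/* Hypothetical hook type: a platform may reorder the rails in place. */
typedef void (*example_sort_rails_fn)(struct example_rail *rails,
                                      size_t num_rails);

/* qsort comparator: ascending VF index, written to avoid int overflow. */
static int compare_by_vf_index(const void *a, const void *b)
{
    const struct example_rail *ra = a;
    const struct example_rail *rb = b;
    return (ra->vf_index > rb->vf_index) - (ra->vf_index < rb->vf_index);
}

/* Hypothetical EFA-style hook: order rails by VF index. */
static void example_efa_sort_rails(struct example_rail *rails,
                                   size_t num_rails)
{
    assert(rails != NULL || num_rails == 0);
    if (num_rails > 1)
        qsort(rails, num_rails, sizeof(*rails), compare_by_vf_index);
}

/* Generic code calls the hook only if the platform provides one. */
static void example_apply_platform_sort(example_sort_rails_fn hook,
                                        struct example_rail *rails,
                                        size_t num_rails)
{
    if (hook != NULL)
        hook(rails, num_rails);
}
```

A platform without ordering requirements would pass `NULL` as the hook and keep the discovery order. Note that `qsort` is not guaranteed to be stable, so this sketch also assumes VF indices are unique per rail.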

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.