ROCm / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Align FP16 NCCL/RCCL to 4-byte boundary #38

Closed amathews-amd closed 3 years ago

amathews-amd commented 3 years ago

Fixes https://ontrack.amd.com/browse/MSRCHA-137

Adds 4-byte alignment for FP16 NCCL/RCCL communication to speed up these workloads. The start location of every data partition (across the world size) is aligned to a 4-byte boundary.
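
To illustrate the idea, here is a minimal sketch of how partition start offsets can be kept on a 4-byte boundary. It is not the code from this PR: the function name `aligned_partition_starts` and its parameters are hypothetical, and it assumes a flat FP16 buffer whose base address is itself 4-byte aligned and which is split evenly across ranks. Because an FP16 element is 2 bytes, each partition must start at an even element index, so the buffer is padded to a multiple of 2 * world_size elements.

```python
FP16_BYTES = 2   # size of one FP16 element
ALIGN_BYTES = 4  # boundary requested in this PR


def aligned_partition_starts(num_elements, world_size,
                             elem_bytes=FP16_BYTES, align_bytes=ALIGN_BYTES):
    """Hypothetical helper: pad a flat buffer so that every rank's partition
    starts on an `align_bytes` boundary.

    Returns (padded_num_elements, starts), where starts[rank] is the element
    offset of that rank's partition.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if align_bytes % elem_bytes != 0:
        raise ValueError("align_bytes must be a multiple of elem_bytes")

    # Number of elements that make up one alignment unit (2 for FP16 / 4 bytes).
    factor = align_bytes // elem_bytes

    # Round the element count up to a multiple of factor * world_size so that
    # each equal-sized partition holds a whole number of alignment units.
    chunk = factor * world_size
    padded = -(-num_elements // chunk) * chunk

    partition_size = padded // world_size
    starts = [rank * partition_size for rank in range(world_size)]
    return padded, starts


if __name__ == "__main__":
    padded, starts = aligned_partition_starts(num_elements=1001, world_size=3)
    print(padded, starts, [s * FP16_BYTES for s in starts])
```

With 1001 elements and 3 ranks, the buffer is padded to 1002 elements, and the partitions start at element offsets 0, 334, and 668 (byte offsets 0, 668, and 1336, all multiples of 4). Padding the whole buffer, rather than shifting individual partitions, keeps all partitions the same size, which is what an all-gather style collective expects.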

With upstream changes: https://github.com/microsoft/DeepSpeed/pull/1328

jithunnair-amd commented 3 years ago

Closing this PR because the changes were cherry-picked from upstream: https://github.com/ROCmSoftwarePlatform/DeepSpeed/pull/42