Open RaymondLi0 opened 1 year ago
This PR is based on https://github.com/NVIDIA/Megatron-LM/pull/268. In addition:
TODO: throughput is currently around 30% lower with UL2.