bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

WIP: UL2 merge #23

Open RaymondLi0 opened 1 year ago

RaymondLi0 commented 1 year ago

This PR is based on https://github.com/NVIDIA/Megatron-LM/pull/268.

In addition:

TODO: throughput is currently around 30% lower with UL2.