bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Issue to gather Fixes + New features to send upstream #10

Open stas00 opened 3 years ago

stas00 commented 3 years ago

Please edit the OP to add any fixes we have applied to the core that need to be propagated upstream to:

  1. https://github.com/microsoft/Megatron-DeepSpeed
  2. https://github.com/NVIDIA/Megatron-LM

We want to do this to make it easier to sync upstream changes back into this repo.

Changes to send upstream:

Bug fixes:

New functionality:

shoeybi commented 3 years ago

9189c4e and 9e75429 have been fixed. Will take a look at the rest later. Thank you!