bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Merge MLM too fast 2 #294

Closed · thomasw21 closed this 2 years ago

thomasw21 commented 2 years ago

Some things I noticed when coding shared-t5.