bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Merge MLM too fast 2 #294

Closed · thomasw21 closed this 2 years ago

thomasw21 commented 2 years ago

Some things I noticed when coding shared-t5.