bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Merge MLM too fast 2 #294
Closed by thomasw21 2 years ago
thomasw21 commented 2 years ago:
Some things I noticed when coding shared-t5.