bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

How to preprocess data for t5 model? #371

Open xiu-ze opened 1 year ago

xiu-ze commented 1 year ago

I want to run pretrain_t5.sh, but I don't know how to preprocess the data or what format the data should be in. Is the preprocessing for T5 the same as for GPT-2?