bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Is there any script for pretraining/finetuning BLOOM? #363

Open · drxmy opened this issue 1 year ago

drxmy commented 1 year ago

Specifically, I am looking for a script that uses DeepSpeed PP and ZeRO-DP, like this one: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/bitfit#deepspeed-pp-and-zero-dp
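For context, the launch in that README section looks roughly like the sketch below. This is my paraphrase with placeholder model sizes, hyperparameters, and data paths, not the exact script, so treat the concrete values as assumptions:

```bash
# Sketch of a DeepSpeed PP + ZeRO-DP launch, modeled on the linked
# README example. Sizes, paths, and hyperparameters are placeholders.
TP_SIZE=1
PP_SIZE=2
ZERO_STAGE=1  # with pipeline parallelism, only ZeRO stage 1 applies

deepspeed pretrain_gpt.py \
    --tensor-model-parallel-size $TP_SIZE \
    --pipeline-model-parallel-size $PP_SIZE \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --seq-length 2048 \
    --max-position-embeddings 2048 \
    --micro-batch-size 1 \
    --global-batch-size 16 \
    --train-iters 1000 \
    --lr 1e-4 \
    --lr-decay-style cosine \
    --split 949,50,1 \
    --data-path my-corpus_text_document \
    --vocab-file gpt2-vocab.json \
    --merge-file gpt2-merges.txt \
    --fp16 \
    --deepspeed \
    --deepspeed_config ds_config.json \
    --zero-stage $ZERO_STAGE \
    --deepspeed-activation-checkpointing
```

As far as I understand, ZeRO stages 2 and 3 do not combine with pipeline parallelism in DeepSpeed, which is why the example keeps the stage at 1.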

In my understanding, this script should be able to load BLOOM with some changes, for example adding `--position-embedding-type alibi`. I have done some experiments, but they keep failing.
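Concretely, my attempt looks something like the sketch below. Apart from `--position-embedding-type alibi`, the extra flags (`--embed-layernorm`, `--pad-vocab-size-to`, the HF tokenizer arguments) and the `bigscience/bloom-560m` tokenizer path are my guesses at what BLOOM's architecture needs, so they may be wrong or incomplete:

```bash
# BLOOM-oriented changes I tried on top of the example launch above.
# Only --position-embedding-type alibi is documented for this; the
# remaining flags are my assumptions about BLOOM's architecture.
BLOOM_ARGS=" \
    --position-embedding-type alibi \
    --embed-layernorm \
    --pad-vocab-size-to 250880 \
    --tokenizer-type PretrainedFromHF \
    --tokenizer-name-or-path bigscience/bloom-560m \
    "
# These get appended to the deepspeed launch command; with
# PretrainedFromHF, the --vocab-file/--merge-file flags are dropped.
```

I picked 250880 because that is BLOOM's padded vocabulary size, but I am not sure this flag is the right way to set it.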

I would really appreciate it if someone could give me some advice!