bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Finetuning BLOOM #337

Open AnaRhisT94 opened 2 years ago

AnaRhisT94 commented 2 years ago

Hi,

What's the process for finetuning BLOOM? Has anyone succeeded, and would you be willing to share the code?

Thanks!

mayank31398 commented 2 years ago

Hi, I am not sure, but the original Megatron code had an argument (I don't remember the name) that resets the optimizer, dataloader, etc., which you could use to do finetuning. I am not sure if that argument is present or works in this repo.
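
For reference, here is a minimal sketch of what such a finetuning launch might look like with this repo's pretrain_gpt.py. The flag names are assumptions (in particular `--finetune`, which in upstream Megatron-LM loads the model weights from `--load` but resets the optimizer and iteration state) and should be verified against this repo's argument parser; all paths and hyperparameters are placeholders.

```bash
# Minimal sketch, not a tested recipe: verify every flag against megatron/arguments.py.
# --finetune is assumed to behave as in upstream Megatron-LM (load weights from --load,
# but reset the optimizer, learning-rate schedule, and iteration counters).
deepspeed pretrain_gpt.py \
    --load /path/to/pretrained/bloom/checkpoint \
    --save /path/to/finetuned/checkpoint \
    --finetune \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --micro-batch-size 1 \
    --global-batch-size 16 \
    --lr 1e-5 \
    --train-iters 1000 \
    --data-path /path/to/finetuning/dataset_text_document \
    --deepspeed \
    --deepspeed_config ds_config.json
```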

KOVVURISATYANARAYANAREDDY commented 1 year ago

Hey @mayank31398, just wondering: is pretrain_gpt.py used for pretraining the BLOOM models? If yes, are the architectures for GPT and BLOOM the same? I see different implementations for GPT and BLOOM in Hugging Face Transformers.

Also, I am trying to finetune the StarCoder model using Megatron-DeepSpeed 3D parallelism; can you give me some idea of how that can be done?

mayank31398 commented 1 year ago

This is the script used for launching the 176B run: https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/tr11-176B-ml.slurm
The architecture is not the same, since BLOOM uses ALiBi positional biases while GPT uses absolute position embeddings.
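
To make the difference concrete, the sketch below shows roughly how the positional-embedding choice surfaces in the launch arguments. It is an illustrative excerpt, not copied verbatim from tr11-176B-ml.slurm; the `--position-embedding-type` flag name and the exact values should be checked against that script and this repo's arguments.

```bash
# Illustrative excerpt only (check tr11-176B-ml.slurm for the real values):
# a BLOOM-style launch passes an ALiBi position-embedding flag to pretrain_gpt.py,
# whereas a vanilla GPT-2 style run relies on learned absolute position embeddings.
GPT_ARGS=" \
    --num-layers 70 \
    --hidden-size 14336 \
    --num-attention-heads 112 \
    --seq-length 2048 \
    --max-position-embeddings 2048 \
    --position-embedding-type alibi \
    "
```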

mayank31398 commented 1 year ago

For StarCoder, 4D parallelism is used: tensor parallel, pipeline parallel, sequence parallel, and data parallel. This is the repo used for StarCoder and SantaCoder training: https://github.com/bigcode-project/Megatron-LM
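
As a rough sketch of how those four dimensions fit together in a Megatron-style launch: tensor and pipeline parallelism are set explicitly, sequence parallelism is a toggle on top of tensor parallelism, and data parallelism falls out of whatever GPUs remain. The argument names below follow the bigcode-project/Megatron-LM conventions and should be verified against that repo's arguments.py; the sizes are placeholders.

```bash
# Rough sketch, assuming bigcode-project/Megatron-LM argument names
# (verify against arguments.py in that repo); sizes are placeholders.
WORLD_SIZE=64   # total number of GPUs
TP=4            # tensor-parallel degree
PP=4            # pipeline-parallel degree
# Data parallelism is implicit: DP = WORLD_SIZE / (TP * PP) = 4 model replicas here.

PARALLEL_ARGS=" \
    --tensor-model-parallel-size $TP \
    --pipeline-model-parallel-size $PP \
    --sequence-parallel \
    "
# These args are then appended to the pretrain_gpt.py launch command,
# alongside the usual model, data, and optimizer arguments.
```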

KOVVURISATYANARAYANAREDDY commented 1 year ago

Thank You.