databricks / megablocks


SFT Script and Hyperparameters used for DBRX-Instruct #99

Open alpayariyak opened 3 months ago

alpayariyak commented 3 months ago

Hi, I saw you mentioned that you used your fork of Megatron-LM for training - could you please provide the scripts and hyperparameters used for the SFT of DBRX? It would mean the world to the OSS community!

At openchat, we'd like to fine-tune your model on our data and open-source it.

alpayariyak commented 3 months ago

The training would be on H100s.

Another question - how many H100s are needed at minimum?

mvpatel2000 commented 3 months ago

@tgale96 might have scripts for the Megatron-LM integration

We will have integrations with other stacks soon.

For DBRX specifically, you do not necessarily need to use megablocks (though it is more efficient) -- ZeRO-3 + the HF model code is sufficient. For example, foundry would work with this: https://github.com/mosaicml/llm-foundry
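To make that concrete, here is a minimal sketch of the "ZeRO-3 + HF model code" route using a plain `transformers` Trainer with DeepSpeed. The model ID, dataset name, sequence length, learning rate, and `ds_zero3.json` path below are placeholders, not the recipe actually used for DBRX-Instruct:

```python
# Minimal sketch: full-parameter SFT of DBRX with HF model code + DeepSpeed ZeRO-3.
# Placeholder dataset/hyperparameters -- not the configuration used for DBRX-Instruct.
# Launch e.g.: deepspeed --num_gpus=8 sft_dbrx.py (wrap with a multi-node launcher for more GPUs).
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL = "databricks/dbrx-base"   # fine-tune the base model
DATA = "my-org/my-sft-dataset"   # placeholder: your own SFT dataset with a "text" column

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True
)

def tokenize(batch):
    # Assumes prompt + response are already rendered into a single "text" field.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

dataset = (
    load_dataset(DATA, split="train")
    .map(tokenize, batched=True, remove_columns=["text"])
)

args = TrainingArguments(
    output_dir="dbrx-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,            # illustrative, not DBRX-Instruct's actual LR
    num_train_epochs=1,
    bf16=True,
    gradient_checkpointing=True,
    deepspeed="ds_zero3.json",     # a standard ZeRO stage-3 config (shards params/grads/optimizer)
    logging_steps=10,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```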

CC: @dakinggg

alpayariyak commented 3 months ago

Thank you very much! Do you have insight into the hyperparameters used for DBRX Instruct?

Hyperparameter exploration at this scale is very expensive and out of reach for most of the open-source community, so this would be incredibly helpful to have.

alpayariyak commented 3 months ago

If there's any chance you could confirm - might these be the hyperparameters used for DBRX Instruct? https://github.com/mosaicml/llm-foundry/blob/7a8a1564827cbcbc281a6bdc4a11bc8f584142bd/scripts/train/yamls/finetune/dbrx-full-ft.yaml

alpayariyak commented 3 months ago

One more question (if the above config is what was actually used) - it notes that 8x8x80GB GPUs are required for the fine-tune. Would you mind sharing the approximate number of tokens or SFT examples, the GPUs you used, and how long this took?
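For what it's worth, my own back-of-envelope (not an official number) for why a full fine-tune would need something like 64x80GB - assuming bf16 weights/grads plus fp32 Adam state, fully sharded with ZeRO-3/FSDP:

```python
# Rough estimate of training-state memory for full-parameter SFT of DBRX (132B total params).
# My assumptions: bf16 weights + bf16 grads + fp32 master weights + fp32 Adam m/v = 16 bytes/param.
params = 132e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
state_tb = params * bytes_per_param / 1e12   # ~2.1 TB of sharded state
hbm_tb = 8 * 8 * 80e9 / 1e12                 # 64 x 80GB = 5.12 TB of HBM
print(f"state ~{state_tb:.1f} TB vs HBM ~{hbm_tb:.2f} TB")
# The remaining headroom goes to activations, buffers, and communication overhead,
# which is presumably why 8x8x80GB is the stated requirement.
```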