bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

Script to train starcoder #49

Closed · edward-io closed this issue 1 year ago

edward-io commented 1 year ago

It would be very useful to have a script to reproduce and/or fine-tune StarCoder, similar to the example script for SantaCoder.

Hugging Face's Trainer doesn't support tensor/model parallelism, which makes it difficult to fit the model on a single 8-GPU node, so this code would help users fine-tune the model.
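As a rough illustration of why a single node is tight without model parallelism or sharding: under plain data parallelism every GPU holds a full replica of the model states. A back-of-envelope sketch, assuming StarCoder's ~15.5B parameters, mixed-precision Adam at ~16 bytes/param (the ZeRO paper's accounting: fp16 weights and grads plus fp32 master weights, momentum, and variance), and 80 GB A100s (the GPU size is an assumption):

```python
# Back-of-envelope estimate of full fine-tuning memory under plain data
# parallelism, where each GPU holds a complete replica of the model states.
# 16 bytes/param = 2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 master
# weights + Adam momentum + variance); activations add more on top.

PARAMS = 15.5e9            # StarCoder parameter count
BYTES_PER_PARAM = 16       # mixed-precision Adam model states (ZeRO accounting)
GPU_MEM_GIB = 80           # assumption: A100-80GB; adjust for your hardware

model_states_gib = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"Model states per replica: {model_states_gib:.0f} GiB "
      f"vs {GPU_MEM_GIB} GiB per GPU")
# One replica is roughly 3x an 80 GB card even before activations, hence
# the need for tensor/model parallelism or ZeRO-style sharding.
```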

loubnabnl commented 1 year ago

The training script is available at: https://github.com/bigcode-project/Megatron-LM/blob/multi-query-attention/examples/pretrain_bigcode_model.slurm. You can also use the Hugging Face Trainer with DeepSpeed (documentation available here) or PEFT; we provide some examples in the StarCoder repo.
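For the Trainer-with-DeepSpeed route, a minimal ZeRO-3 config sketch is below. The specific settings (CPU offload of optimizer states, bf16, `"auto"` batch sizes filled in by the Trainer) are illustrative assumptions, not the configuration used to train StarCoder; the config file is passed via `TrainingArguments(deepspeed="ds_config.json")`.

```python
# Illustrative DeepSpeed ZeRO-3 config for fine-tuning a large model with
# the Hugging Face Trainer. Values here are assumptions to be tuned, not
# the official StarCoder training configuration.
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, and optimizer states
        "offload_optimizer": {"device": "cpu"},  # optional: trade speed for memory headroom
        "overlap_comm": True,                    # overlap communication with computation
    },
    "bf16": {"enabled": True},                   # assumes Ampere or newer GPUs
    "train_micro_batch_size_per_gpu": "auto",    # let the HF Trainer fill these in
    "gradient_accumulation_steps": "auto",
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```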

edward-io commented 1 year ago

Thank you! For those looking for weights, they're located here: https://huggingface.co/bigcode/starcoder-megatron