Closed by edward-io 1 year ago
The training script is available at: https://github.com/bigcode-project/Megatron-LM/blob/multi-query-attention/examples/pretrain_bigcode_model.slurm. You can also use the HuggingFace Trainer with DeepSpeed (documentation available here) or PEFT; we provide some examples in the StarCoder repo.
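As a rough illustration of the DeepSpeed route mentioned above, a minimal ZeRO-3 configuration passed to the HuggingFace Trainer might look like the following. This is a sketch, not a config from the thread; the `"auto"` values are filled in by the Trainer integration, and CPU optimizer offload is one assumed way to fit a large model on limited GPU memory:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

ZeRO stage 3 shards parameters, gradients, and optimizer states across the data-parallel GPUs, which is the usual alternative when tensor/model parallelism is unavailable.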
Thank you! For those looking for weights, they're located here: https://huggingface.co/bigcode/starcoder-megatron
It would be very useful to have a script to reproduce and/or fine-tune StarCoder, similar to the example script for SantaCoder.
HuggingFace's Trainer doesn't support tensor/model parallelism, which makes it difficult to fit the model on a single 8-GPU node, so this code would help users fine-tune the model.
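To make the fitting problem concrete, here is a back-of-envelope sketch. The 15.5B parameter count is StarCoder's published size; the 16 bytes/param figure is an assumption based on standard mixed-precision Adam (fp16 weights and gradients plus fp32 master weights, momentum, and variance) and excludes activations:

```python
# Rough per-GPU memory estimate for fine-tuning StarCoder (~15.5B params)
# with plain data parallelism, where each GPU holds a full model replica.
# Assumed breakdown per parameter (mixed-precision Adam, no activations):
#   fp16 weights (2) + fp16 grads (2) + fp32 master (4) + momentum (4) + variance (4)
PARAMS = 15.5e9
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16 bytes/param

def training_memory_gb(n_params: float, bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Memory (GB) needed on each GPU for one full training replica."""
    return n_params * bytes_per_param / 1e9

per_gpu_gb = training_memory_gb(PARAMS)
print(f"~{per_gpu_gb:.0f} GB per GPU")  # ~248 GB, far above an 80 GB A100
```

Since each data-parallel replica needs roughly 248 GB, no single GPU in an 8-GPU node can hold it; the states must be sharded (e.g. DeepSpeed ZeRO) or the model split with tensor/pipeline parallelism.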