bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!

Which model is the bigcode/starcoder model trained on? #121

Closed · HIT-cwh closed this issue 1 year ago

HIT-cwh commented 1 year ago

Thank you for your valuable open-source contribution!

The README provides instructions on how to fine-tune the pre-trained bigcode/starcoder model for downstream tasks. If I want to train a StarCoder-style model starting from a different base language model, such as Llama-2-13B or Llama-2-7B, can I simply replace --model_path="bigcode/starcoder" in the command with --model_path="meta-llama/Llama-2-13b-hf" (see the sketch below)? Would this result in a model with performance similar to bigcode/starcoder?
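To be concrete, the change I have in mind looks roughly like this (the finetune/finetune.py entry point and the dataset placeholder are my assumptions, not a tested command):

```bash
# Hypothetical invocation: swap only the model path in the repo's
# fine-tuning command; all other flags left at whatever the README uses.
python finetune/finetune.py \
  --model_path="meta-llama/Llama-2-13b-hf" \
  --dataset_name="<your-instruction-dataset>"
```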

ArmelRandy commented 1 year ago

Hi. This script was not used for the pre-training of StarCoder. StarCoder was pre-trained on a vast amount of code (The Stack); the training data is available here. The script in this repo is designed for instruction fine-tuning, and it is written specifically for StarCoder, so using another model may require some modifications (here, for example). Passing --model_path meta-llama/Llama-2-13b-hf would instruction-fine-tune Llama 2 on the dataset you pass with --dataset_name. If that dataset is mostly coding instructions, you are likely to get a better coding assistant by starting from StarCoder than from Llama-2-13B.
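For illustration only, here is a minimal sketch (not the repo's actual code) of why swapping the model path largely works through the generic Hugging Face APIs, and of the kind of model-specific adjustment that may be needed; the pad-token handling below is an assumption about Llama 2's tokenizer, not something this script is guaranteed to do:

```python
# Minimal sketch: loading an alternative base model with the same generic
# Hugging Face classes an instruction fine-tuning script typically relies on.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Llama-2-13b-hf"  # instead of "bigcode/starcoder"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Llama 2's tokenizer ships without a pad token, so batching/padding logic
# written with StarCoder in mind may need an adjustment along these lines.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```

Beyond loading, any StarCoder-specific assumptions in the script (special tokens, prompt formatting) would need to be checked against the new model's tokenizer.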