Closed WilliamsToTo closed 1 month ago
@WilliamsToTo generally there are a lot of small differences: the data, the batch size, the addition of chat templates for SFT, the learning rate schedulers. But you can do both in most repos. For example, we've reproduced some open-instruct results with the olmo repository.
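To make the chat-template point concrete, here is a minimal sketch (not the actual open-instruct or olmo code; token ids and the template prefix are made up for illustration) of the main label-construction difference: both objectives use next-token cross-entropy, but SFT typically masks the prompt tokens out of the loss, while pre-training treats every token as a target.

```python
# Label value that cross-entropy implementations (e.g. PyTorch's
# ignore_index=-100 convention) skip when computing the loss.
IGNORE_INDEX = -100

def build_sft_labels(prompt_ids, response_ids):
    """SFT: concatenate prompt + response; only response tokens get a loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

def build_pretraining_labels(token_ids):
    """Pre-training: plain language modeling, every token is a target."""
    return list(token_ids), list(token_ids)

# Hypothetical tokenized example: the prompt would come from applying a
# chat template (e.g. "<|user|> ... <|assistant|>") before tokenization.
prompt = [101, 7592]
response = [2023, 2003, 102]

sft_inputs, sft_labels = build_sft_labels(prompt, response)
pt_inputs, pt_labels = build_pretraining_labels(prompt + response)
# sft_labels masks the two prompt positions; pt_labels keeps all five.
```

Everything else in the loop (optimizer, forward pass, cross-entropy) can stay the same, which is why the same codebase can usually do both.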
I'm going to close the issue, as it's not really related to the codebase, but feel free to reopen if you run into an issue with the code.
As I understand it, the primary difference between pre-training and supervised fine-tuning (SFT) lies in the dataset used. Pre-training is conducted on a plain text corpus, whereas SFT utilizes a dataset in a specific format, such as TULU. Are there any differences in the training scripts, loss functions, or hyperparameters between pre-training and SFT?