OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0

decide epoch num #94

Closed pckennethma closed 1 year ago

pckennethma commented 1 year ago

Hello,

Probably a trivial question: the fine-tuning script does not take a batch_size, and it looks like the input datasets are somehow grouped. Is there any best practice for deciding a proper number of epochs for finetuning in LMFlow (e.g., how to compute the number of epochs needed to pass over the entire dataset)?

research4pan commented 1 year ago

Thanks for your interest in LMFlow! That's a very important question, and not trivial at all. The appropriate number of epochs varies from dataset to dataset. One way is simply to try: increase the number of epochs from small to large, e.g. 0.01, 0.1, 1, 10, and narrow the range according to the output performance.
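To make that concrete, here is a back-of-envelope sketch of how each candidate epoch count translates into optimizer steps once the dataset has been grouped into fixed-length blocks. All the numbers (token count, block size, batch size, GPU count) are illustrative placeholders, not LMFlow defaults:

```python
# Back-of-envelope: how many optimizer steps each epoch setting corresponds
# to once the dataset is grouped into fixed-length blocks. Every number
# below is an illustrative placeholder, not an LMFlow default.

total_tokens = 10_000_000       # estimated tokens in the training set
block_size = 512                # length of each grouped block (assumption)
per_device_batch_size = 4       # micro-batch per GPU (assumption)
grad_accum_steps = 8            # gradient accumulation steps (assumption)
num_gpus = 4                    # data-parallel workers (assumption)

blocks_per_epoch = total_tokens // block_size
effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
steps_per_epoch = blocks_per_epoch // effective_batch

for epochs in (0.01, 0.1, 1, 10):
    print(f"{epochs:>5} epoch(s) ~ {int(steps_per_epoch * epochs)} optimizer steps")
```

A fractional epoch value just means the trainer stops after seeing that fraction of the grouped blocks, so the smaller settings in the sweep correspond to only a handful of updates.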

If you don't want this grouping behavior, you may pass the option --disable_group_texts True. Note that long samples will still be cut into smaller pieces so that the transformer model can accept the input.
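For intuition, the grouping behaves roughly like the standard Hugging Face group_texts recipe: all tokenized samples are concatenated and then cut into fixed-length blocks. The sketch below is illustrative only; the function names and the block size are placeholders, not LMFlow's actual preprocessing code:

```python
def group_texts(tokenized_samples, block_size=512):
    """Concatenate all token ids, then cut them into fixed-length blocks.
    Sample boundaries are ignored, so the number of training blocks depends
    on the total token count rather than the number of original samples."""
    concatenated = [tok for sample in tokenized_samples for tok in sample]
    total_length = (len(concatenated) // block_size) * block_size
    return [concatenated[i:i + block_size]
            for i in range(0, total_length, block_size)]


def split_long_texts(tokenized_samples, block_size=512):
    """With grouping disabled, samples stay separate, but any sample longer
    than block_size is still cut into block_size pieces so the model can
    accept the input."""
    pieces = []
    for sample in tokenized_samples:
        for start in range(0, len(sample), block_size):
            pieces.append(sample[start:start + block_size])
    return pieces


# Toy "tokenized" samples of lengths 700, 100, and 300.
samples = [[1] * 700, [2] * 100, [3] * 300]
print(len(group_texts(samples)))                    # 2 blocks of 512 tokens
print([len(p) for p in split_long_texts(samples)])  # [512, 188, 100, 300]
```

The practical consequence is that with grouping enabled, one epoch is determined by the total token count divided by the block size, not by the number of samples in your dataset file.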

pckennethma commented 1 year ago

Thanks for your prompt reply. May I ask how many epochs were used to finetune LLaMA 7B on Alpaca for the released checkpoint?

research4pan commented 1 year ago

We used 3 epochs for both the instruction tuning and the medical dataset finetuning.

pckennethma commented 1 year ago

Thanks for your response!