OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
GNU General Public License v3.0

Why use 512 as the max sequence length for fine tuning alpaca? #44

Open tetratorus opened 1 year ago

tetratorus commented 1 year ago

The original LLaMA max sequence length is 2048, so why does the finetuning.sh script use 512 as the max sequence length?

Is it for efficiency reasons, since the Alpaca dataset doesn't exceed 512 tokens?

Qubitium commented 1 year ago

From various sources:

  1. 512 covers 95% of the Alpaca data
  2. reduces VRAM cost during training
  3. allows a larger batch size, thanks to 2 (the reduced VRAM cost)

From what I understand, 512 was chosen as an optimal value that balances training quality, cost, and speed. You can obviously change it to 1024 or 2048 if you have a larger GPU and want better training output.
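
If you want to verify point 1 yourself, a rough way is to tokenize the Alpaca JSON and count how many examples fit under a given cutoff. This is just a minimal sketch, not code from this repo; it assumes the Hugging Face `LlamaTokenizer`, a local tokenizer path (placeholder below), and the standard `alpaca_data.json` instruction file:

```python
# Minimal sketch (assumptions, not repo code): measure what fraction of the
# Alpaca examples fit within a given max sequence length after tokenization.
import json

from transformers import LlamaTokenizer  # assumes the HF LLaMA tokenizer is available

# Placeholder path: point this at your own LLaMA tokenizer files.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama/tokenizer")


def fraction_within(examples, max_seq_len=512):
    """Return the fraction of examples whose instruction+input+output fits in max_seq_len tokens."""
    fits = 0
    for ex in examples:
        text = ex["instruction"] + "\n" + ex.get("input", "") + "\n" + ex["output"]
        n_tokens = len(tokenizer(text)["input_ids"])
        if n_tokens <= max_seq_len:
            fits += 1
    return fits / len(examples)


with open("alpaca_data.json") as f:  # the Alpaca 52K instruction file
    data = json.load(f)

for cutoff in (256, 512, 1024, 2048):
    print(f"max_seq_len={cutoff}: {fraction_within(data, cutoff):.1%} of examples fit")
```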

gaopengpjlab commented 1 year ago

@diegomontoya Thanks for your perfect explanation.

We choose the max sequence length per setting:

  1. Alpaca instruction tuning: 512
  2. Dialog instruction tuning: 2048
  3. Image-text alignment in LLaMA-Adapter V2: 96
  4. Multimodal instruction tuning: 512
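
For reference, a max sequence length in a pipeline like this usually just means truncating longer tokenized examples and right-padding shorter ones so every batch is a fixed-size tensor, which is where the VRAM and batch-size trade-off comes from. A minimal sketch of that idea (an assumption for illustration, not this repo's exact data-loading code):

```python
# Sketch only: clamp every tokenized example to exactly max_seq_len entries,
# truncating long sequences and padding short ones, so batches have a fixed shape.
import torch


def pad_or_truncate(token_ids, max_seq_len=512, pad_id=0):
    """Return a length-max_seq_len LongTensor built from a list of token ids."""
    token_ids = token_ids[:max_seq_len]                  # drop anything past the cutoff
    padding = [pad_id] * (max_seq_len - len(token_ids))  # right-pad the remainder
    return torch.tensor(token_ids + padding, dtype=torch.long)


# Example: one short and one over-long sequence both become length-512 tensors.
short = pad_or_truncate(list(range(100)))
long_ = pad_or_truncate(list(range(3000)))
print(short.shape, long_.shape)  # torch.Size([512]) torch.Size([512])
```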