tetratorus opened this issue 1 year ago
From various sources:
From what I understand, 512 was chosen as an optimal value that balances training output, cost, and speed. Obviously you can change it to 1024 or 2048 if you have a larger GPU and want better training output.
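For anyone wondering where the limit actually bites, here is a minimal sketch (not the repo's actual code; it assumes a Hugging Face-style tokenizer and a placeholder model path): examples longer than max_seq_len get truncated, shorter ones get padded, so memory and compute per step scale with the value you pick.

```python
# Minimal sketch, not the repo's actual code: how max_seq_len typically enters
# the data pipeline with a Hugging Face-style tokenizer (path is a placeholder).
from transformers import AutoTokenizer

MAX_SEQ_LEN = 512  # bump to 1024/2048 if your GPU has the memory

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-tokenizer")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers often lack a pad token

def encode(text: str):
    # Anything past MAX_SEQ_LEN tokens is cut off; shorter examples are padded
    # so every batch has a fixed shape. A bigger limit keeps more of each
    # example but costs more activation memory and compute per step.
    return tokenizer(
        text,
        truncation=True,
        max_length=MAX_SEQ_LEN,
        padding="max_length",
        return_tensors="pt",
    )
```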
@diegomontoya Thanks for your perfect explanation.
We choose a different max sequence length for each setting:
- Alpaca instruction tuning: 512
- Dialog instruction tuning: 2048
- Image-text alignment in LLaMA-Adapter V2: 96
- Multimodal instruction tuning: 512
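Purely for reference, those per-task limits could be collected in a small config dict like the one below; the key names are made up for illustration, not options the repo actually exposes.

```python
# Per-task max sequence lengths quoted above; key names are illustrative only.
MAX_SEQ_LEN_BY_TASK = {
    "alpaca_instruction_tuning": 512,
    "dialog_instruction_tuning": 2048,
    "image_text_alignment": 96,   # LLaMA-Adapter V2
    "multimodal_instruction_tuning": 512,
}
```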
The original LLaMA max sequence length is 2048, so why does the finetuning.sh script use 512 as the max sequence length?
Is it for efficiency reasons, since the alpaca dataset doesn't exceed 512 tokens?
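If anyone wants to verify that premise, here is a rough sketch that counts how many examples would actually be truncated at 512 tokens. It assumes the standard alpaca_data.json layout (instruction / input / output fields) and a Hugging Face-style tokenizer; the tokenizer path is a placeholder.

```python
# Count how many alpaca examples would be truncated at a 512-token limit.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-tokenizer")  # placeholder path

with open("alpaca_data.json") as f:
    data = json.load(f)

lengths = []
for ex in data:
    # Concatenate the fields roughly the way an instruction-tuning prompt would.
    text = "\n".join([ex["instruction"], ex.get("input", ""), ex["output"]])
    lengths.append(len(tokenizer(text)["input_ids"]))

over = sum(1 for n in lengths if n > 512)
print(f"max length: {max(lengths)} tokens, over 512: {over}/{len(lengths)}")
```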