Open akjindal53244 opened 1 month ago
The paper has some evaluations around that: https://arxiv.org/pdf/2406.06623. TL;DR: conservatively ~30% faster, but you can amplify that as much as you want by increasing the batch size (thanks to the improved VRAM efficiency) and by lowering the percentage of targeted layers.
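To make "lowering the % of targeted layers" concrete: Spectrum ranks layers by signal-to-noise ratio and trains only the top-p% (e.g. Spectrum-50 trains the top 50%), freezing the rest. Here is a minimal sketch of that selection step — the layer names and SNR values are made up for illustration, not taken from the library:

```python
# Sketch: pick the top-p% of layers by SNR and treat the rest as frozen,
# mimicking Spectrum-style targeted fine-tuning. SNR scores here are
# hypothetical placeholders, not real measurements.

def select_trainable(snr_by_layer, top_percent):
    """Return the set of layer names in the top `top_percent` by SNR."""
    k = max(1, round(len(snr_by_layer) * top_percent / 100))
    ranked = sorted(snr_by_layer, key=snr_by_layer.get, reverse=True)
    return set(ranked[:k])

snr = {"layers.0.mlp": 0.9, "layers.1.mlp": 0.4,
       "layers.2.mlp": 0.7, "layers.3.mlp": 0.1}

# "Spectrum-50": train the top 50% of layers, freeze the other half.
trainable = select_trainable(snr, 50)
print(sorted(trainable))  # ['layers.0.mlp', 'layers.2.mlp']
```

In a real training loop you would then set `requires_grad = False` on every parameter outside the selected set, which is where the VRAM savings (and hence the larger batch sizes mentioned below) come from.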
To clarify, the speed improvements are twofold. First, in our paper we evaluated performance using the exact same hyperparameters. In practice, however, the extra VRAM headroom allows for larger batch sizes, which contributes further to the speedup.
For instance, on an 8xH100 node, you can fully fine-tune Llama-3-8b with a batch size of 2 and a sequence length of 8192. With Spectrum-50, however, you can double the batch size to 4. This further enhances the speed gains you already achieve by simply using Spectrum.
Ty for your kind words, btw.
Hi Spectrum Team,
Congratulations on the release of this wonderful library! :) What are the speedup gains from using Spectrum compared to full fine-tuning? If there are any benchmarks or details on observed gains, that would be helpful.