Open akjindal53244 opened 1 month ago
The paper has some evaluations around that: https://arxiv.org/pdf/2406.06623. TL;DR: conservatively ~30% faster, but you can amplify that as much as you want by increasing the batch size (thanks to the improved VRAM efficiency) and by lowering the percentage of targeted layers.
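To make "lowering the % of targeted layers" concrete: Spectrum ranks layers by signal-to-noise ratio and trains only the top-p% (e.g. Spectrum-50 trains the top 50%), freezing the rest. Here is a minimal sketch of that selection step — the layer names and SNR values are made up for illustration, not taken from the library:

```python
# Sketch: pick the top-p% of layers by SNR and treat the rest as frozen,
# mimicking Spectrum-style targeted fine-tuning. SNR scores here are
# hypothetical placeholders, not real measurements.

def select_trainable(snr_by_layer, top_percent):
    """Return the set of layer names in the top `top_percent` by SNR."""
    k = max(1, round(len(snr_by_layer) * top_percent / 100))
    ranked = sorted(snr_by_layer, key=snr_by_layer.get, reverse=True)
    return set(ranked[:k])

snr = {"layers.0.mlp": 0.9, "layers.1.mlp": 0.4,
       "layers.2.mlp": 0.7, "layers.3.mlp": 0.1}

# "Spectrum-50": train the top 50% of layers, freeze the other half.
trainable = select_trainable(snr, 50)
print(sorted(trainable))  # ['layers.0.mlp', 'layers.2.mlp']
```

In a real training loop you would then set `requires_grad = False` on every parameter outside the selected set, which is where the VRAM savings (and hence the larger batch sizes mentioned below) come from.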
To clarify, the speed improvements are twofold. First, in our paper we evaluated performance using the exact same hyperparameters. In practice, however, the extra VRAM headroom allows for larger batch sizes, which contributes further to the speedup.
For instance, on an 8xH100 node, you can fully fine-tune Llama-3-8b with a batch size of 2 and a sequence length of 8192. With Spectrum-50, however, you can double the batch size to 4. This further enhances the speed gains you already achieve by simply using Spectrum.
Ty for your kind words, btw.
Hi Spectrum Team,
Congratulations on the release of this wonderful library! :) What are the speedup gains from using Spectrum compared to full fine-tuning? If there are any benchmarks or details on observed gains, that would be helpful.