axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/

Apply unsloth optimizations #908

Open bratao opened 11 months ago

bratao commented 11 months ago

⚠️ Please check that this feature request hasn't been suggested before.

πŸ”– Feature description

The project https://github.com/unslothai/unsloth looks very interesting. Its author claims great speedups for finetuning and details the improvements here:

So on GPUs the goal is to saturate the GPU with matrix multiplies instead of data movement. I'll write a more detailed blog, but approximately:

1. Flash Attention v2 reduces the time taken by 17% or so

2. RoPE Triton kernels: -7.1%

3. RMS Layernorm in Triton: -3.1%

4. Cross Entropy in Triton: -1%

5. Manual autograd for MLP: -4%

6. Manual QKV autograd: -2%

7. Manual O autograd: -2%

8. Smart cache evictions and reduced data duplications etc: -30%

9. And other tricks in the Max and Pro versions make it 30x faster
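For context on item 3 above, the idea is to fuse the whole RMS LayerNorm forward pass (mean of squares, rsqrt, scaling) into a single Triton kernel so each row is read once. Below is a minimal sketch of such a kernel; it is an illustration of the technique, not Unsloth's actual kernel, and it assumes a contiguous 2D `(rows, hidden)` input:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_fwd_kernel(X, W, Y, stride, N, eps, BLOCK_SIZE: tl.constexpr):
    # one program per row; the whole hidden dimension fits in one block
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < N
    x = tl.load(X + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # mean of squares over the hidden dimension, then reciprocal sqrt
    var = tl.sum(x * x, axis=0) / N
    rstd = 1.0 / tl.sqrt(var + eps)
    w = tl.load(W + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(Y + row * stride + cols, x * rstd * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # illustrative wrapper: x must be a contiguous 2D tensor (rows, hidden)
    M, N = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(N)
    rmsnorm_fwd_kernel[(M,)](x, weight, out, x.stride(0), N, eps,
                             BLOCK_SIZE=BLOCK_SIZE)
    return out
```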

βœ”οΈ Solution

It would be nice to use their kernels to speed up axolotl.

❓ Alternatives

No response

πŸ“ Additional Context

No response

Acknowledgements

danielhanchen commented 11 months ago

If this is a major request by the OSS community - I'm more than happy to include some of the changes from Unsloth!

Peter-Devine commented 11 months ago

I would like to second this request. As I understand it, this is simply a free efficiency gain for training with no degradation in accuracy, right? I think this would be a major boost to the Axolotl project.

danielhanchen commented 11 months ago

Yes, 0% loss in accuracy - we do actual FLOP reductions via our manual autograd engine. I'm actually working with @casper-hansen and some other Axolotl people to put some methods inside Axolotl!
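For readers wondering what "manual autograd" means in practice: instead of letting autograd record and save every elementwise intermediate, you write the backward by hand, saving only the raw inputs and recomputing the cheap parts. Here is a minimal sketch of the idea for the SwiGLU activation used in Llama-style MLPs; this is an illustration only, not Unsloth's implementation:

```python
import torch

class SwiGLUFn(torch.autograd.Function):
    """silu(gate) * up with a hand-written backward.

    Saves only the two inputs; autograd would otherwise keep the
    sigmoid and silu intermediates alive until backward."""

    @staticmethod
    def forward(ctx, gate, up):
        ctx.save_for_backward(gate, up)
        return torch.nn.functional.silu(gate) * up

    @staticmethod
    def backward(ctx, grad_out):
        gate, up = ctx.saved_tensors
        sig = torch.sigmoid(gate)
        silu = gate * sig
        # d/dgate silu(gate) = sig * (1 + gate * (1 - sig)) = sig + silu*(1-sig)
        grad_gate = grad_out * up * (sig + silu * (1 - sig))
        grad_up = grad_out * silu
        return grad_gate, grad_up
```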

Peter-Devine commented 11 months ago

Legend. Superman has posters of you on his wall. Thanks so much for all of your work!

danielhanchen commented 11 months ago

:)

casper-hansen commented 11 months ago

I tried a few of the optimizations for full fine-tuning (FFT) on Mistral, but I cannot seem to reproduce the improvements described in the posts. @danielhanchen it would be great if you could pitch in with a PR if you have time.

https://github.com/OpenAccess-AI-Collective/axolotl/tree/unsloth_modules

danielhanchen commented 11 months ago

@casper-hansen Oh cool - I'll have a look! Ye I'll try to make a PR to axolotl!!

fakerybakery commented 11 months ago

Hi, is there any status on these updates? If I use Axolotl right now, will I benefit from the Unsloth improvements? Thank you!

danielhanchen commented 11 months ago

@fakerybakery Sorry not yet - I'll take a look at the PR Casper made, but it might take some time

fakerybakery commented 11 months ago

Ok, thank you!

kno10 commented 6 months ago

Unsloth is particularly interesting if your GPU is not supported by flash attention (e.g., V100). Unfortunately, as of now, Unsloth does not seem to have multi-GPU support in the OSS version: https://github.com/unslothai/unsloth/issues/107
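For anyone unsure whether their card is affected: FlashAttention-2 requires an Ampere-or-newer GPU (compute capability 8.0+), while the V100 is 7.0. A small illustrative check (`flash_attention_supported` is a hypothetical helper, not an axolotl API):

```python
import torch

def flash_attention_supported() -> bool:
    """Rough capability check: FlashAttention-2 needs sm_80 (Ampere) or newer."""
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 8  # A100 / RTX 30xx and newer; V100 (7.0) fails this

if not flash_attention_supported():
    print("flash-attn unsupported on this GPU; consider a Triton-based fallback")
```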

gardner commented 6 months ago

FYI: gradient checkpointing has been merged: https://github.com/OpenAccess-AI-Collective/axolotl/pull/1528 πŸŽ‰
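For context on the technique behind that PR: activation checkpointing drops a block's intermediates during the forward pass and recomputes them during backward, trading compute for memory (Unsloth's variant reportedly also offloads the saved inputs to CPU). A minimal sketch with stock PyTorch; `Block` is a hypothetical module, not axolotl code:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        # only x is kept alive; the MLP's intermediates are recomputed in backward
        return x + checkpoint(self.net, x, use_reentrant=False)

x = torch.randn(8, 256, requires_grad=True)
Block(256)(x).sum().backward()
```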