axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/

Apply unsloth optimizations #908

Open bratao opened 11 months ago

bratao commented 11 months ago

⚠️ Please check that this feature request hasn't been suggested before.

πŸ”– Feature description

The project https://github.com/unslothai/unsloth looks very interesting. Its author claims great speedups for finetuning and details the improvements here:

So on GPUs the goal is to saturate the GPU with matrix multiplies instead of data movement. I'll write a more detailed blog, but approximately:

1. Flash Attention v2 reduces the time taken by 17% or so

2. RoPE Triton kernels: -7.1%

3. RMS Layernorm in Triton: -3.1%

4. Cross Entropy in Triton: -1%

5. Manual autograd for MLP: -4%

6. Manual QKV autograd: -2%

7. Manual O autograd: -2%

8. Smart cache evictions and reduced data duplications etc: -30%

9. And other tricks in the Max and Pro versions make it 30x faster
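For context on item 3 above, the idea is to fuse the whole RMS LayerNorm forward pass (mean of squares, rsqrt, scaling) into a single Triton kernel so each row is read once. Below is a minimal sketch of such a kernel; it is an illustration of the technique, not Unsloth's actual kernel, and it assumes a contiguous 2D `(rows, hidden)` input:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_fwd_kernel(X, W, Y, stride, N, eps, BLOCK_SIZE: tl.constexpr):
    # one program per row; the whole hidden dimension fits in one block
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < N
    x = tl.load(X + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # mean of squares over the hidden dimension, then reciprocal sqrt
    var = tl.sum(x * x, axis=0) / N
    rstd = 1.0 / tl.sqrt(var + eps)
    w = tl.load(W + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(Y + row * stride + cols, x * rstd * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # illustrative wrapper: x must be a contiguous 2D tensor (rows, hidden)
    M, N = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(N)
    rmsnorm_fwd_kernel[(M,)](x, weight, out, x.stride(0), N, eps,
                             BLOCK_SIZE=BLOCK_SIZE)
    return out
```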

βœ”οΈ Solution

It would be nice to use their kernels to speed up axolotl.

❓ Alternatives

No response

πŸ“ Additional Context

No response

Acknowledgements

danielhanchen commented 11 months ago

If this is a major request by the OSS community - I'm more than happy to include some of the changes from Unsloth!

Peter-Devine commented 11 months ago

I would like to second this request. As I understand it, this is simply a free efficiency gain for training with no degradation in accuracy, right? I think this would be a major boost to the Axolotl project.

danielhanchen commented 11 months ago

Yes, 0% loss in accuracy - we do actual FLOP reductions via our manual autograd engine. I'm actually working with @casper-hansen and some other Axolotl people to put some methods inside Axolotl!
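For readers wondering what "manual autograd" means in practice: instead of letting autograd record and save every elementwise intermediate, you write the backward by hand, saving only the raw inputs and recomputing the cheap parts. Here is a minimal sketch of the idea for the SwiGLU activation used in Llama-style MLPs; this is an illustration only, not Unsloth's implementation:

```python
import torch

class SwiGLUFn(torch.autograd.Function):
    """silu(gate) * up with a hand-written backward.

    Saves only the two inputs; autograd would otherwise keep the
    sigmoid and silu intermediates alive until backward."""

    @staticmethod
    def forward(ctx, gate, up):
        ctx.save_for_backward(gate, up)
        return torch.nn.functional.silu(gate) * up

    @staticmethod
    def backward(ctx, grad_out):
        gate, up = ctx.saved_tensors
        sig = torch.sigmoid(gate)
        silu = gate * sig
        # d/dgate silu(gate) = sig * (1 + gate * (1 - sig)) = sig + silu*(1-sig)
        grad_gate = grad_out * up * (sig + silu * (1 - sig))
        grad_up = grad_out * silu
        return grad_gate, grad_up
```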

Peter-Devine commented 11 months ago

Legend. Superman has posters of you on his wall. Thanks so much for all of your work!

danielhanchen commented 11 months ago

:)

casper-hansen commented 11 months ago

I tried a few of the optimizations for full fine-tuning (FFT) on Mistral, but I cannot seem to reproduce the improvements described in the posts. @danielhanchen it would be great if you could pitch in with a PR if you have time.

https://github.com/OpenAccess-AI-Collective/axolotl/tree/unsloth_modules

danielhanchen commented 11 months ago

@casper-hansen Oh cool - I'll have a look! Ye I'll try to make a PR to axolotl!!

fakerybakery commented 11 months ago

Hi, is there any status on these updates? If I use Axolotl right now, will I benefit from the Unsloth improvements? Thank you!

danielhanchen commented 11 months ago

@fakerybakery Sorry not yet - I'll take a look at the PR Casper made, but it might take some time

fakerybakery commented 11 months ago

Ok, thank you!

kno10 commented 6 months ago

Unsloth is particularly interesting if your GPU is not supported by flash attention (e.g., V100). Unfortunately, as of now, Unsloth does not seem to have multi-GPU support in the OSS version: https://github.com/unslothai/unsloth/issues/107
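For anyone unsure whether their card is affected: FlashAttention-2 requires an Ampere-or-newer GPU (compute capability 8.0+), while the V100 is 7.0. A small illustrative check (`flash_attention_supported` is a hypothetical helper, not an axolotl API):

```python
import torch

def flash_attention_supported() -> bool:
    """Rough capability check: FlashAttention-2 needs sm_80 (Ampere) or newer."""
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 8  # A100 / RTX 30xx and newer; V100 (7.0) fails this

if not flash_attention_supported():
    print("flash-attn unsupported on this GPU; consider a Triton-based fallback")
```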

gardner commented 6 months ago

FYI: gradient checkpointing has been merged: https://github.com/OpenAccess-AI-Collective/axolotl/pull/1528 πŸŽ‰
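For context on the technique behind that PR: activation checkpointing drops a block's intermediates during the forward pass and recomputes them during backward, trading compute for memory (Unsloth's variant reportedly also offloads the saved inputs to CPU). A minimal sketch with stock PyTorch; `Block` is a hypothetical module, not axolotl code:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        # only x is kept alive; the MLP's intermediates are recomputed in backward
        return x + checkpoint(self.net, x, use_reentrant=False)

x = torch.randn(8, 256, requires_grad=True)
Block(256)(x).sum().backward()
```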