Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Will Flash-attn support Gemma-2 soft-capping anytime soon? #1111

Open thusinh1969 opened 2 months ago

thusinh1969 commented 2 months ago

Great product, TriDao (you're so talented, my friend).

Will Flash-attn support Gemma-2 soft-capping anytime soon? We are very impressed by Gemma-2's quality, and we would stick with it if its context length could be expanded. Unsloth has some options, but it supports only a single GPU.
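(For readers unfamiliar with the term: Gemma-2's soft-capping squashes the attention logits with a tanh before the softmax, keeping them in $(-c, c)$:

$$\text{scores} = c \cdot \tanh\!\left(\frac{QK^\top / \sqrt{d}}{c}\right), \qquad c = 50.0 \text{ for Gemma-2's attention logits.}$$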

Thanks much, my friend. Cheers, Nguyên

tridao commented 2 months ago

it's supported
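For anyone landing here later, a minimal sketch of passing the soft cap through the public API (assuming flash-attn >= 2.6, where `flash_attn_func` exposes a `softcap` argument; 50.0 matches Gemma-2's attention-logit cap):

```python
import torch
from flash_attn import flash_attn_func

# Shapes: (batch, seqlen, nheads, headdim); flash-attn requires fp16/bf16 on CUDA.
q = torch.randn(1, 1024, 8, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn(1, 1024, 8, 64, dtype=torch.bfloat16, device="cuda")
v = torch.randn(1, 1024, 8, 64, dtype=torch.bfloat16, device="cuda")

# softcap applies scores = softcap * tanh(scores / softcap) to the
# attention logits before the softmax, as in Gemma-2.
out = flash_attn_func(q, k, v, causal=True, softcap=50.0)
```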

v-lmn commented 2 months ago

> it's supported

> Algorithm 1: FlashAttention-3 forward pass without intra-consumer overlapping
> Algorithm 2: FlashAttention-3 consumer warpgroup forward pass

Is the current implementation Algorithm 2? Is there an implementation of Algorithm 1 available? I would like to make a comparison. Could you please provide the complete code of Algorithm 1? Thanks!

HuangBugWei commented 2 months ago

Hi @tridao, I recently installed the latest version of FlashAttention using the following command:

`pip install -U https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl`

I am using `AutoModelForCausalLM` from the Hugging Face Transformers library, which I've also upgraded to the latest version. However, I'm still noticing some unexpected results during inference. Before I dive into debugging other parts of my code, I want to confirm whether the version of FlashAttention I've installed is fully compatible with the latest version of Hugging Face Transformers. Could this version mismatch be causing the issues I'm seeing? Thanks for your help!

tridao commented 2 months ago

Idk anything about HF transformers

HuangBugWei commented 2 months ago

Thank you for your response. May I then assume that the version I installed supports the soft capping operation?
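A quick way to confirm (a sketch; the `softcap` argument appeared in flash-attn's public API in v2.6.0, so a v2.6.3 wheel should expose it):

```python
import inspect

import flash_attn
from flash_attn import flash_attn_func

print(flash_attn.__version__)  # expect 2.6.3 for the wheel above

# If the installed wheel supports soft-capping, flash_attn_func
# accepts a `softcap` keyword argument.
print("softcap" in inspect.signature(flash_attn_func).parameters)
```

If that prints `True`, the installed wheel itself supports soft-capping; whether Transformers actually routes Gemma-2 through it is a separate question on the Transformers side.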

v-lmn commented 2 months ago

> Idk anything about HF transformers

@tridao Is the current implementation Algorithm 2? Is there an implementation of Algorithm 1 available? I would like to make a comparison. Could you please provide the complete code of Algorithm 1? Thanks!