This problem likely comes from the following: in transformers==4.41, the default attention module for Llama models was changed from LlamaAttention / LlamaFlashAttention2 to LlamaSdpaAttention, so the forward-function modification fails. You may modify these lines: line1, line2 to fix this problem. However, we are not sure whether other parts work well with transformers==4.41, so it is safer to use transformers==4.38.2 or transformers==4.40.
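For illustration, the change amounts to something like the sketch below: patching the SDPA attention class in addition to the eager one. This is only an assumed outline; self_extend_forward is a placeholder for the patched forward function, not the repo's actual symbol.

from transformers.models.llama import modeling_llama

def patch_llama_attention(self_extend_forward):
    # Pre-4.41, eager attention (LlamaAttention) was the default target.
    modeling_llama.LlamaAttention.forward = self_extend_forward
    # transformers>=4.41 instantiates LlamaSdpaAttention by default, so the
    # SDPA class must be patched too, or the new forward is never called.
    if hasattr(modeling_llama, "LlamaSdpaAttention"):
        modeling_llama.LlamaSdpaAttention.forward = self_extend_forward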
Seems good. Thanks. I will try it again later.
Hello. I just simply ran example.py and encountered an error in the "=====SelfExtend using Torch======" part:
transformers==4.41, flash_attn==2.5.8
Meanwhile, I noticed the similar problem https://github.com/datamllab/LongLM/issues/31, so I also tried disabling flash attention at the same time:
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, use_flash_attention_2=False)
SelfExtend.apply(model, group_size, window_size, enable_flash_attention=False)
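(Side note: use_flash_attention_2 is deprecated in newer transformers. If I understand correctly, the newer equivalent would be something like the following, forcing the eager implementation so that neither flash attention nor SDPA is used; model_name, group_size, and window_size are the placeholders from example.py. Untested on my side.)

import torch
from transformers import AutoModelForCausalLM
import SelfExtend

# Force the eager attention implementation; on transformers==4.41 the
# default is SDPA, which the Torch-path patch may not target.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)
SelfExtend.apply(model, group_size, window_size, enable_flash_attention=False)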
I printed the model, and it shows the attention is not flash attention:
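(Concretely, the check I mean is along these lines, assuming a Llama-family model:)

# Show which attention implementation was actually instantiated;
# _attn_implementation is a private config attribute in recent transformers.
print(model.config._attn_implementation)
print(type(model.model.layers[0].self_attn).__name__)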
So where is the problem?