datamllab / LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
https://arxiv.org/pdf/2401.01325.pdf
MIT License

Flash Attention implementation is coming #20

Closed. Mooler0410 closed this issue 3 months ago.

Mooler0410 commented 4 months ago

We already have an implementation. In fact, the new results we released on X (formerly Twitter) with Google's Gemma are based on it (otherwise we could not run sequences longer than 30k). However, with the current implementation we cannot reach the same results (on LongBench) as those reported in the paper, which used the version without flash attention. There is a minor performance gap between the two versions.

We are still trying to figure out the reason.
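
For context, the core operation any implementation (with or without flash attention) has to reproduce is SelfExtend's position remapping: nearby tokens attend with ordinary relative positions, while distant tokens attend with grouped (floor-divided) positions so that no position exceeds the pretrained range. The sketch below is only a minimal illustration of that mapping under assumed parameter names (`group_size`, `neighbor_window`); it is not the repository's actual code, and the exact boundary shift is taken from the paper's formulation.

```python
import torch

def self_extend_rel_positions(seq_len: int, group_size: int, neighbor_window: int) -> torch.Tensor:
    """Relative-position matrix (query x key) combining neighbor and grouped attention."""
    q_idx = torch.arange(seq_len).unsqueeze(1)   # query positions as a column
    k_idx = torch.arange(seq_len).unsqueeze(0)   # key positions as a row

    # Ordinary relative positions, used inside the local neighbor window.
    neighbor_rel = q_idx - k_idx

    # Grouped positions for distant tokens: floor-divide both indices by the group
    # size, then shift so the grouped region lines up with the edge of the window.
    grouped_rel = q_idx // group_size - k_idx // group_size \
        + (neighbor_window - neighbor_window // group_size)

    # Pick neighbor positions within the window, grouped positions beyond it.
    rel = torch.where(neighbor_rel < neighbor_window, neighbor_rel, grouped_rel)

    # Causal mask: future keys are marked with -1 and never attended to.
    return rel.masked_fill(k_idx > q_idx, -1)

if __name__ == "__main__":
    # Toy example: 10 tokens, groups of 2, a neighbor window of 4.
    print(self_extend_rel_positions(seq_len=10, group_size=2, neighbor_window=4))
```

A fused flash-attention kernel never materializes the full score matrix, so the neighbor and grouped regions are presumably computed as separate passes and merged afterwards; that merge step is one plausible place where small numerical differences of the kind described above could creep in, though this is only a guess.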