InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0
3.15k stars, 281 forks

fix falcon attention #1761

Closed: grimoire closed this 2 weeks ago

grimoire commented 3 weeks ago

falcon-7b has 71 attention heads, which leads to an attention kernel error.
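The description does not say why 71 heads trips the kernel. As a rough illustration only, the sketch below assumes the kernel tiles over the head dimension with a power-of-two block size, which 71 (a prime) cannot match without padding. The helper names and the padding approach are hypothetical and are not taken from this PR or from the lmdeploy code; the hidden size of 4544 is the published falcon-7b value.

```python
# Hypothetical sketch: NOT the fix in this PR, only an illustration of why
# an odd head count such as 71 can be awkward for a tiled attention kernel.

def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


NUM_HEADS = 71      # falcon-7b query heads (from the PR description)
HIDDEN_SIZE = 4544  # falcon-7b hidden size

head_dim = HIDDEN_SIZE // NUM_HEADS          # per-head dimension
padded_heads = next_power_of_two(NUM_HEADS)  # assumed block size over heads

print(f"head_dim={head_dim}")
print(f"num_heads is power of two: {is_power_of_two(NUM_HEADS)}")
print(f"padded block over heads: {padded_heads} "
      f"({padded_heads - NUM_HEADS} lanes would need masking)")
```

Under this assumption, a kernel would launch with a block covering the padded head count and mask out the lanes past the real head count, rather than requiring the head count itself to be a power of two.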