Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Support Flash Attention in server mode #478

Open d-z-m opened 3 days ago

d-z-m commented 3 days ago

llama.cpp upstream supports the `-fa` (Flash Attention) flag in server mode. I noticed llamafile only supports this option in CLI mode.

I propose adding it to server mode as well, so that server-mode users can reap the benefits.

jart commented 1 day ago

This issue has been fixed in commit https://github.com/Mozilla-Ocho/llamafile/commit/4aea6060b202c2f17f393640ce8b7689ef6412b9

The fix has been incorporated into the recent llamafile v0.8.8 release.
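With that release, passing the flag when launching in server mode should work the same way it already did in CLI mode. A minimal sketch, assuming a local `model.llamafile` (the filename and port are placeholders, not from the thread):

```shell
# Launch llamafile in server mode with Flash Attention enabled.
# "model.llamafile" is a hypothetical local model file; adjust to taste.
./model.llamafile --server -fa --port 8080
```

Flash Attention trades the naive attention implementation for a tiled, memory-efficient one, which mainly reduces memory use at longer context lengths.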