Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Question #363

Closed: fakerybakery closed this issue 5 months ago

fakerybakery commented 5 months ago

Hi, thank you for releasing llamafile! The speedups are very impressive. Are there any plans to merge the improvements made here upstream into the llama.cpp repo? Thanks!

twitchyliquid64 commented 5 months ago

This was asked on Twitter; Justine's answer was yes, it's in flight: https://twitter.com/JustineTunney/status/1783332508505194674

fakerybakery commented 5 months ago

Thanks! (That was me on Twitter :) But llamafile is still much faster than llama.cpp. Does that mean not all of the improvements have been upstreamed?

jart commented 5 months ago

Yes, we're happy to share our optimizations with llama.cpp. Here are two PRs I sent them, which got merged:

The reason llamafile continues to be faster than llama.cpp is that I've discovered even more performance opportunities in the last few weeks, since I published my blog post https://justine.lol/matmul/. My latest tricks will be upstreamed too; however, they're still awaiting approval.