TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

Reuse llama-server for models supporting both chat / FIM completion (e.g. Codestral) #2654

Closed wsxiaoys closed 1 day ago

wsxiaoys commented 1 month ago

Please describe the feature you want

Related: https://github.com/TabbyML/tabby/issues/2652

This would allow a local deployment to use less VRAM / compute by serving chat and FIM completion from a single llama-server instance (a rough sketch of the idea follows the code locations below).

Additional context

Code locations:

  1. https://github.com/TabbyML/tabby/blob/abdb0ef124a453278dcb8366b0e2fcab08201cb6/crates/llama-cpp-server/src/lib.rs#L160-L177
  2. https://github.com/TabbyML/tabby/blob/abdb0ef124a453278dcb8366b0e2fcab08201cb6/crates/llama-cpp-server/src/lib.rs#L179-L195
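A minimal sketch of the intended behavior, not Tabby's actual implementation: when the chat model and the completion model resolve to the same GGUF file, the two endpoints should share one llama-server process instead of spawning two. The names below (`ServerHandle`, `ServerPool`, `spawn_llama_server`, the model path, and the port numbering) are all illustrative assumptions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a running llama-server process (illustrative only).
struct ServerHandle {
    model_path: String,
    port: u16,
}

/// Hypothetical spawn function; a real implementation would launch the
/// llama-server binary and wait for it to become healthy.
fn spawn_llama_server(model_path: &str, port: u16) -> Arc<ServerHandle> {
    Arc::new(ServerHandle {
        model_path: model_path.to_string(),
        port,
    })
}

/// Cache of running servers keyed by model path, so a model that backs both
/// chat and FIM completion (e.g. Codestral) is only started once.
struct ServerPool {
    servers: HashMap<String, Arc<ServerHandle>>,
    next_port: u16,
}

impl ServerPool {
    fn new() -> Self {
        Self {
            servers: HashMap::new(),
            next_port: 30888, // assumed starting port, purely illustrative
        }
    }

    /// Return an existing handle for `model_path`, or spawn a new server.
    fn get_or_spawn(&mut self, model_path: &str) -> Arc<ServerHandle> {
        if let Some(handle) = self.servers.get(model_path) {
            return Arc::clone(handle);
        }
        let handle = spawn_llama_server(model_path, self.next_port);
        self.next_port += 1;
        self.servers.insert(model_path.to_string(), Arc::clone(&handle));
        handle
    }
}

fn main() {
    let mut pool = ServerPool::new();
    // Chat and completion configured with the same model: one process, less VRAM.
    let chat = pool.get_or_spawn("models/codestral-22b.gguf");
    let completion = pool.get_or_spawn("models/codestral-22b.gguf");
    assert!(Arc::ptr_eq(&chat, &completion));
    println!(
        "serving {} on port {} for both chat and completion",
        chat.model_path, chat.port
    );
}
```

The key design choice is keying the pool on the resolved model path, so distinct chat and completion models still get their own processes while identical ones are deduplicated.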

Please reply with a 👍 if you want this feature.

sirebellum commented 1 month ago

Taking a look at this; just FYI for anyone else following this issue. Will update with a PR.

sirebellum commented 1 month ago

Hi @wsxiaoys, this is my first time coding in Rust. I'm well versed in C++, so I don't anticipate any major issues, but I wanted to check how you envisioned this being solved. Did you have a specific implementation in mind? Thanks!

wsxiaoys commented 1 week ago

Releasing in 0.16. If you're interested, you can test it early at https://github.com/TabbyML/tabby/releases/tag/v0.16.0-rc.1.

wsxiaoys commented 1 day ago

Released in 0.16