TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

Reuse llama-server for models supporting both chat / FIM completion (e.g. Codestral) #2654

Closed wsxiaoys closed 1 day ago

wsxiaoys commented 1 month ago

Please describe the feature you want

Related: https://github.com/TabbyML/tabby/issues/2652

This would allow a local deployment to use less VRAM / compute by serving chat and FIM completion from a single llama-server instance (a rough sketch of the idea follows the code locations below).

Additional context

Code locations:

  1. https://github.com/TabbyML/tabby/blob/abdb0ef124a453278dcb8366b0e2fcab08201cb6/crates/llama-cpp-server/src/lib.rs#L160-L177
  2. https://github.com/TabbyML/tabby/blob/abdb0ef124a453278dcb8366b0e2fcab08201cb6/crates/llama-cpp-server/src/lib.rs#L179-L195
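A minimal sketch of the intended behavior, not Tabby's actual implementation: when the chat model and the completion model resolve to the same GGUF file, the two endpoints should share one llama-server process instead of spawning two. The names below (`ServerHandle`, `ServerPool`, `spawn_llama_server`, the model path, and the port numbering) are all illustrative assumptions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a running llama-server process (illustrative only).
struct ServerHandle {
    model_path: String,
    port: u16,
}

/// Hypothetical spawn function; a real implementation would launch the
/// llama-server binary and wait for it to become healthy.
fn spawn_llama_server(model_path: &str, port: u16) -> Arc<ServerHandle> {
    Arc::new(ServerHandle {
        model_path: model_path.to_string(),
        port,
    })
}

/// Cache of running servers keyed by model path, so a model that backs both
/// chat and FIM completion (e.g. Codestral) is only started once.
struct ServerPool {
    servers: HashMap<String, Arc<ServerHandle>>,
    next_port: u16,
}

impl ServerPool {
    fn new() -> Self {
        Self {
            servers: HashMap::new(),
            next_port: 30888, // assumed starting port, purely illustrative
        }
    }

    /// Return an existing handle for `model_path`, or spawn a new server.
    fn get_or_spawn(&mut self, model_path: &str) -> Arc<ServerHandle> {
        if let Some(handle) = self.servers.get(model_path) {
            return Arc::clone(handle);
        }
        let handle = spawn_llama_server(model_path, self.next_port);
        self.next_port += 1;
        self.servers.insert(model_path.to_string(), Arc::clone(&handle));
        handle
    }
}

fn main() {
    let mut pool = ServerPool::new();
    // Chat and completion configured with the same model: one process, less VRAM.
    let chat = pool.get_or_spawn("models/codestral-22b.gguf");
    let completion = pool.get_or_spawn("models/codestral-22b.gguf");
    assert!(Arc::ptr_eq(&chat, &completion));
    println!(
        "serving {} on port {} for both chat and completion",
        chat.model_path, chat.port
    );
}
```

The key design choice is keying the pool on the resolved model path, so distinct chat and completion models still get their own processes while identical ones are deduplicated.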

Please reply with a 👍 if you want this feature.

sirebellum commented 1 month ago

Taking a look at this; just FYI for anyone else following this issue. Will update with a PR.

sirebellum commented 1 month ago

Hi @wsxiaoys, this is my first time coding in Rust. I'm well versed in C++, so I don't anticipate any major issues, but I wanted to check how you envisioned this being solved. Did you have a specific implementation in mind? Thanks!

wsxiaoys commented 1 week ago

Releasing in 0.16. If you're interested, you can test it early at https://github.com/TabbyML/tabby/releases/tag/v0.16.0-rc.1.

wsxiaoys commented 1 day ago

Released in 0.16