TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

Need to update the version of llama.cpp #3305

Open wapleeeeee opened 2 days ago

wapleeeeee commented 2 days ago

Please describe the feature you want

I want to use the local model Minicpm3-4B for testing, but the following error appeared:

llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:110: <chat>: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'minicpm3'

The latest version of Tabby only supports llama.cpp @ 5ef07e2, which was last updated two months ago.

I've noticed there was a PR in llama.cpp last month: https://github.com/ggerganov/llama.cpp/pull/9322.

I wonder if you could update the llama.cpp version in the next release of Tabby.

zwpaper commented 2 days ago

Hi @wapleeeeee, thanks for trying Tabby.

May I ask about your interest in using Minicpm3-4B with Tabby?

As for updating the llama.cpp server, we generally bump it to a newer version along with Tabby releases. We are currently working on the v0.19.0 release, and I believe we can handle this update in the next release if necessary.

Please also note that Tabby supports the Model HTTP API: you can manually set up a llama.cpp or Ollama server and connect to it via the Model HTTP API. For more information, please refer to the doc: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
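For example, a completion entry in ~/.tabby/config.toml pointing at a local llama.cpp server could look roughly like the sketch below (the port and the FIM prompt template here are placeholders; please check the doc above for the exact fields for your model and Tabby version):

[model.completion.http]
# rough sketch: llama.cpp completion kind pointing at a hypothetical local port
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8012"
# the FIM tokens depend on the model you serve
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

A chat model can be wired up in the same way under [model.chat.http].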

wapleeeeee commented 1 day ago

Thanks so much for your reply!

Actually, we are going to use Minicpm3-4B in our product. Before that, we need to test its coding ability, and we found Tabby to be a great tool that can both help us with coding and test for potential risks.

I used the Model HTTP API successfully with vLLM, thanks for your advice. But now there's a problem: vLLM can't accept the {"input": {"prefix": xx, "suffix": xx}} format on '/v1/completions' ('/v1/chat/completions' does work).

I tried to modify ~/.tabby/config.toml, but it didn't seem to work. Is there any way to solve this?

Here's my request:

curl -X 'POST' -H "Authorization: Bearer token-abc123" \
  'http://localhost:8015/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
  "segments": {
    "prefix": "def fib(n):\n    ",
    "suffix": "\n        return fib(n - 1) + fib(n - 2)"
  }
}'

Here is my error:

{"object":"error","message":"[{'type': 'missing', 'loc': ('body', 'model'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}}, {'type': 'missing', 'loc': ('body', 'prompt'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}}, {'type': 'extra_forbidden', 'loc': ('body', 'language'), 'msg': 'Extra inputs are not permitted', 'input': 'python'}, {'type': 'extra_forbidden', 'loc': ('body', 'segments'), 'msg': 'Extra inputs are not permitted', 'input': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}]","type":"BadRequestError","param":null,"code":400}

wapleeeeee commented 23 hours ago

I set ~/.tabby/config.toml with:

[model.completion.http]
kind = "openai/completion"
model_name = "minicpm3-4b"
api_endpoint = "http://localhost:8015/v1"
api_key = "xxx"
max_tokens = 256
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

[model.chat.http]

but the prompt_template does not seem to take effect.

I checked the request received by the vLLM server and found it is:

b'{"model":"minicpm3-4b","prompt":"def fibonacci(n):\\n    if n >=","suffix":" 1:","max_tokens":64,"temperature":0.1,"stream":true,"presence_penalty":0.0}'

The "suffix" param causes the 400 Bad Request.

I rechecked the completion part of the documentation, but there are no examples or instructions for this.

How can I solve this?

zwpaper commented 15 hours ago

Hi @wapleeeeee, it's great that Tabby can help!

We have looked into the inference backend support and found that vLLM claims to be OpenAI compatible, but it does not actually implement support for the suffix field.

The OpenAI completion kind is marked as legacy by OpenAI, and different services have their own implementations of it. We may have to look deeper into how the OpenAI completion kind is implemented and figure out a solution for it.
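To illustrate, based on the error you posted, vLLM's legacy /v1/completions endpoint only accepts the standard fields (model, prompt, max_tokens, ...), so any FIM context would have to be folded into the prompt string itself. A rough, untested sketch against your server (the FIM tokens depend on the model being served):

curl -X 'POST' 'http://localhost:8015/v1/completions' \
  -H "Authorization: Bearer token-abc123" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "minicpm3-4b",
  "prompt": "<|fim_prefix|>def fib(n):\n    <|fim_suffix|>\n        return fib(n - 1) + fib(n - 2)<|fim_middle|>",
  "max_tokens": 64
}'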

I also noticed that you created a discussion about this. Let's keep this issue for the llama.cpp update and discuss the API support here: https://github.com/TabbyML/tabby/discussions/3323