wapleeeeee opened this issue 2 days ago
Hi @wapleeeeee, thanks for trying Tabby.
May I ask what your interest is in using Minicpm3-4B with Tabby?
As for the llama.cpp server update: we generally bump it to a newer version along with Tabby releases. We are currently working on the v0.19.0 release, and I believe we can handle this update in the next release if necessary.
Please also note that Tabby supports a Model HTTP API: you can manually set up a llama.cpp or Ollama server and connect to it via the Model HTTP API. For more information, please refer to the doc: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
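For example, a minimal ~/.tabby/config.toml entry pointing completion at a locally running llama.cpp server could look like the sketch below. The endpoint, port, and prompt template are illustrative only; please check the linked doc for the exact fields your Tabby version supports.

```toml
# Illustrative only -- adapt the endpoint/port and FIM template to your setup.
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8012"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
```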
Thanks so much for your reply!
Actually, we are going to use Minicpm3-4B in our product. Before that, we need to test its coding ability. We found Tabby to be a great tool that can both help us code and probe for potential risks.
I used the Model HTTP API successfully with vLLM, thanks for your advice. But there is now a problem: vLLM cannot accept the {"input": {"prefix": xx, "suffix": xx}} format on /v1/completions (/v1/chat/completions does work).
I tried modifying ~/.tabby/config.toml, but it did not seem to take effect. Is there any way to solve this?
Here's my request:
curl -X 'POST' -H "Authorization: Bearer token-abc123" \
'http://localhost:8015/v1/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"language": "python",
"segments": {
"prefix": "def fib(n):\n ",
"suffix": "\n return fib(n - 1) + fib(n - 2)"
}
}'
Here is my error:
{"object":"error","message":"[{'type': 'missing', 'loc': ('body', 'model'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n ', 'suffix': '\\n return fib(n - 1) + fib(n - 2)'}}}, {'type': 'missing', 'loc': ('body', 'prompt'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n ', 'suffix': '\\n return fib(n - 1) + fib(n - 2)'}}}, {'type': 'extra_forbidden', 'loc': ('body', 'language'), 'msg': 'Extra inputs are not permitted', 'input': 'python'}, {'type': 'extra_forbidden', 'loc': ('body', 'segments'), 'msg': 'Extra inputs are not permitted', 'input': {'prefix': 'def fib(n):\\n ', 'suffix': '\\n return fib(n - 1) + fib(n - 2)'}}]","type":"BadRequestError","param":null,"code":400}
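For reference, vLLM's OpenAI-compatible /v1/completions endpoint expects "model" and "prompt" fields and rejects Tabby's native "language"/"segments" schema, which is exactly what the error lists. A sketch of the mapping I expected Tabby's prompt_template to perform (assuming Minicpm3's FIM tokens; the template string is the one from my config, not something vLLM defines):

```python
# Sketch: fold Tabby's native completion body into an OpenAI-style body
# that vLLM's /v1/completions will accept. FIM tokens are assumed here.
FIM_TEMPLATE = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

def to_openai_body(tabby_body: dict, model: str, max_tokens: int = 256) -> dict:
    """Interleave prefix/suffix with FIM tokens into a single prompt."""
    seg = tabby_body["segments"]
    prompt = FIM_TEMPLATE.format(prefix=seg["prefix"], suffix=seg.get("suffix", ""))
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

tabby_body = {
    "language": "python",
    "segments": {
        "prefix": "def fib(n):\n    ",
        "suffix": "\n    return fib(n - 1) + fib(n - 2)",
    },
}
body = to_openai_body(tabby_body, model="minicpm3-4b")
# The extra "language"/"segments" fields that triggered the 400 are gone,
# and the prompt now carries both halves wrapped in FIM tokens.
```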
I set the ~/.tabby/config.toml with
[model.completion.http]
kind = "openai/completion"
model_name = "minicpm3-4b"
api_endpoint = "http://localhost:8015/v1"
api_key = "xxx"
max_tokens = 256
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
[model.chat.http]
but the prompt_template does not seem to take effect.
I inspected the request received by the vLLM server and found it is:
b'{"model":"minicpm3-4b","prompt":"def fibonacci(n):\\n if n >=","suffix":" 1:","max_tokens":64,"temperature":0.1,"stream":true,"presence_penalty":0.0}'
The "suffix" param causes the 400 Bad Request.
I rechecked the completion part of the documentation, but there are no examples or instructions for this case.
How can I solve this?
Hi @wapleeeeee, it's great that Tabby can help!
We have looked into the inference backend support and found that, although vLLM claims to be OpenAI compatible, it does not actually implement the suffix field.
The OpenAI completion endpoint is marked as legacy by OpenAI, and different services have their own implementations of it. We may have to look deeper into how each backend implements the openai/completion kind and figure out a solution.
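In the meantime, one possible workaround (a sketch, not a tested solution) is a thin proxy in front of vLLM that folds the rejected "suffix" field into a FIM-formatted prompt before forwarding, using the request body you captured. The FIM tokens are assumed from the Minicpm3 template in your config:

```python
# Sketch of a proxy-side rewrite: drop the "suffix" field vLLM rejects
# and merge it into the prompt using assumed Minicpm3 FIM tokens.
import json

def strip_suffix(raw: bytes) -> bytes:
    body = json.loads(raw)
    suffix = body.pop("suffix", "")
    if suffix:
        body["prompt"] = (
            "<|fim_prefix|>" + body["prompt"]
            + "<|fim_suffix|>" + suffix + "<|fim_middle|>"
        )
    return json.dumps(body).encode()

# The body Tabby actually sent, from the capture above:
raw = b'{"model":"minicpm3-4b","prompt":"def fibonacci(n):\\n    if n >=","suffix":" 1:","max_tokens":64,"temperature":0.1,"stream":true,"presence_penalty":0.0}'
fixed = json.loads(strip_suffix(raw))
# "suffix" is gone; the prompt now carries both halves as FIM markers.
```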
I also noticed that you created a discussion about this. Let's leave this issue for the llama.cpp update and discuss the API support here: https://github.com/TabbyML/tabby/discussions/3323
Please describe the feature you want
I want to use the local model Minicpm3-4B for testing, but this error appeared:
llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:110: <chat>: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'minicpm3'
The latest version of Tabby only supports llama.cpp @ 5ef07e2, which was last updated 2 months ago.
I've noticed there was a PR at llama.cpp last month: https://github.com/ggerganov/llama.cpp/pull/9322.
I wonder if you could update the llama.cpp version in the next release of Tabby.