Arcmoon-Hu opened 2 months ago
I assumed that leaving all the CLI args empty would do the work, i.e. running:
./tabby serve
I would also benefit from a working example of the OpenAI interface. My `~/.tabby/config.toml` is as follows:
[model.chat.http]
kind = "openai/chat"
api_endpoint = "https://api.openai.com"
model_name = "gpt-4o"
[model.embedding.http]
kind = "openai/embedding"
api_endpoint = "https://api.openai.com"
model_name = "text-embedding-3-small"
I then run Tabby via the Homebrew-installed executable, with the command `tabby serve`.
When testing via curl against the `/v1/chat/completions` endpoint, the response headers look correct, but no events are sent from the server. No logs or errors are printed on the Tabby side:
curl http://localhost:8080/v1/chat/completions \
--header 'Authorization: Bearer auth_********************************' \
--header 'Content-Type: application/json' \
--data '{
"max_tokens": 2048,
"messages": [
{
"content": "Hi",
"role": "user"
}
],
"model": "gpt-4o",
"stream": true,
"temperature": 0
}' -v
* Trying [::1]:8080...
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Authorization: Bearer auth_********************************
> Content-Type: application/json
> Content-Length: 160
>
< HTTP/1.1 200 OK
< content-type: text/event-stream
< cache-control: no-cache
< vary: origin, access-control-request-method, access-control-request-headers
< access-control-allow-origin: *
< access-control-expose-headers: *
< transfer-encoding: chunked
< date: Tue, 10 Sep 2024 09:56:02 GMT
<
* Connection #0 to host localhost left intact
When trying to set `"stream": false`, I get a 500 Internal Server Error, and I can only see a warning on the Tabby server side.
Curl:
curl http://localhost:8080/v1/chat/completions \
--header 'Authorization: Bearer auth_********************************' \
--header 'Content-Type: application/json' \
--data '{
"max_tokens": 2048,
"messages": [
{
"content": "Hi",
"role": "user"
}
],
"model": "gpt-4o",
"stream": false,
"temperature": 0
}' -v
* Trying [::1]:8080...
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Authorization: Bearer auth_********************************
> Content-Type: application/json
> Content-Length: 161
>
< HTTP/1.1 500 Internal Server Error
< vary: origin, access-control-request-method, access-control-request-headers
< access-control-allow-origin: *
< access-control-expose-headers: *
< content-length: 0
< date: Tue, 10 Sep 2024 10:09:29 GMT
<
* Connection #0 to host localhost left intact
Error on Tabby server:
2024-09-10T10:09:29.947835Z WARN chat_completions{user=Some("E16n1q")}: tabby::routes::chat: crates/tabby/src/routes/chat.rs:48: Error happens during chat completion: invalid args: When stream is false, use Chat::create
I also tried setting `RUST_LOG` to get more details on what's going on, but it looks like errors in the HTTP code path are not being logged; please correct me if I'm wrong here.
As a sanity check, I also tried calling OpenAI directly via curl, and it appears to work properly:
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
],
"stream": true
}'
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"Hi"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" there"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
data: [DONE]
I have the feeling my config file is not correct, but I cannot find any alternative setup in the documentation.
I'd highly appreciate some support in getting the setup right.
Thank you in advance.
My system's info:
tabby: 0.16.1
OS: macOS 14.2.1
Memory: 16 GiB
Architecture: aarch64
Hi @fdionisi - please check the updated configuration example at https://github.com/TabbyML/tabby/blob/main/website/docs/references/models-http-api/openai.md
You need to append `/v1` to the chat and embedding model endpoints.
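For completeness, here is a corrected config sketch based on that advice (same models as in the report above, with `/v1` appended to both endpoints):

```toml
# ~/.tabby/config.toml — the OpenAI-compatible endpoints must include
# the /v1 path prefix.
[model.chat.http]
kind = "openai/chat"
api_endpoint = "https://api.openai.com/v1"
model_name = "gpt-4o"

[model.embedding.http]
kind = "openai/embedding"
api_endpoint = "https://api.openai.com/v1"
model_name = "text-embedding-3-small"
```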
Actually, I read the docs and saw the command is
`docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda`
As I understand it, this command downloads the related models and starts the server. I also found that the OpenAI-style interface can be used by changing the config in `~/.tabby/config.toml`. So I am confused: if I change the config, how should the server start command change?