Arcmoon-Hu opened 2 months ago
I assumed that leaving all the CLI args empty would do the work, i.e. running:
./tabby serve
I would also benefit from a working example of the OpenAI interface. My `~/.tabby/config.toml` is as follows:
[model.chat.http]
kind = "openai/chat"
api_endpoint = "https://api.openai.com"
model_name = "gpt-4o"
[model.embedding.http]
kind = "openai/embedding"
api_endpoint = "https://api.openai.com"
model_name = "text-embedding-3-small"
I then run Tabby via the Homebrew-installed executable, with the command `tabby serve`.
When testing via curl against the `/v1/chat/completions` endpoint, the response headers look correct, but no events are sent from the server. No logs or errors are printed on the Tabby side:
curl http://localhost:8080/v1/chat/completions \
--header 'Authorization: Bearer auth_********************************' \
--header 'Content-Type: application/json' \
--data '{
"max_tokens": 2048,
"messages": [
{
"content": "Hi",
"role": "user"
}
],
"model": "gpt-4o",
"stream": true,
"temperature": 0
}' -v
* Trying [::1]:8080...
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Authorization: Bearer auth_********************************
> Content-Type: application/json
> Content-Length: 160
>
< HTTP/1.1 200 OK
< content-type: text/event-stream
< cache-control: no-cache
< vary: origin, access-control-request-method, access-control-request-headers
< access-control-allow-origin: *
< access-control-expose-headers: *
< transfer-encoding: chunked
< date: Tue, 10 Sep 2024 09:56:02 GMT
<
* Connection #0 to host localhost left intact
When trying to set `"stream": false`, I get a 500 Internal Server Error, and I can only see a warning on the Tabby server side.
Curl:
curl http://localhost:8080/v1/chat/completions \
--header 'Authorization: Bearer auth_********************************' \
--header 'Content-Type: application/json' \
--data '{
"max_tokens": 2048,
"messages": [
{
"content": "Hi",
"role": "user"
}
],
"model": "gpt-4o",
"stream": false,
"temperature": 0
}' -v
* Trying [::1]:8080...
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Authorization: Bearer auth_********************************
> Content-Type: application/json
> Content-Length: 161
>
< HTTP/1.1 500 Internal Server Error
< vary: origin, access-control-request-method, access-control-request-headers
< access-control-allow-origin: *
< access-control-expose-headers: *
< content-length: 0
< date: Tue, 10 Sep 2024 10:09:29 GMT
<
* Connection #0 to host localhost left intact
Error on Tabby server:
2024-09-10T10:09:29.947835Z WARN chat_completions{user=Some("E16n1q")}: tabby::routes::chat: crates/tabby/src/routes/chat.rs:48: Error happens during chat completion: invalid args: When stream is false, use Chat::create
I also tried setting `RUST_LOG` to get more details on what's going on, but it looks like errors in the HTTP code path are not being logged; please correct me if I'm wrong here.
As a sanity check, I also tried calling OpenAI directly via curl, and it appears to work properly:
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
],
"stream": true
}'
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"Hi"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" there"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-A5sIZ33gNAaFbEjcEfd2HlucGZR4t","object":"chat.completion.chunk","created":1725963727,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
data: [DONE]
I have the feeling my config file is not correct, but I cannot find any alternative setup in the documentation.
I'd highly appreciate some support in getting the setup right.
Thank you in advance.
My system's info:
tabby: 0.16.1
OS: macOS 14.2.1
Memory: 16 GiB
Architecture: aarch64
Hi @fdionisi - please check the updated configuration example at https://github.com/TabbyML/tabby/blob/main/website/docs/references/models-http-api/openai.md
You need to append `/v1` to the chat and embedding model endpoints.
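For completeness, here is a corrected config sketch based on that advice (same models as in the report above, with `/v1` appended to both endpoints):

```toml
# ~/.tabby/config.toml — the OpenAI-compatible endpoints must include
# the /v1 path prefix.
[model.chat.http]
kind = "openai/chat"
api_endpoint = "https://api.openai.com/v1"
model_name = "gpt-4o"

[model.embedding.http]
kind = "openai/embedding"
api_endpoint = "https://api.openai.com/v1"
model_name = "text-embedding-3-small"
```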
Actually, I read the docs and saw the command is
`docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda`
As I understand it, this command downloads the related models and starts the server. I also found that the OpenAI-style interface can be used by changing the config in `~/.tabby/config.toml`. So I am confused: if I change the config, how should the server start command change?