NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.
https://docs.nexaai.com/
Apache License 2.0
1.71k stars 240 forks

[BUG] `/v1/chat/completions` streaming is broken #141

Open av opened 2 days ago

av commented 2 days ago

Issue Description

From the source, it looks like streaming support is intended, but it doesn't work in practice. Also, the official install script appears to install 0.0.8.5, whereas 0.0.8.6 is the actual latest version.

Steps to Reproduce

Run the Nexa server and send a streaming chat completions request.

`{{host}}` is a valid, working URL of the Nexa server (`http://localhost:34181` in the request below).

curl --request POST \
  --url http://localhost:34181/v1/chat/completions \
  --header 'Accept: */*' \
  --header 'Authorization: sk-fake' \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: httpyac' \
  --data '{
  "model": "anything",
  "messages": [
    {"role": "user", "content": "How many heads Girrafes have?"}
  ],
  "options": {
    "temperature": 0.2
  },
  "stream": true
}'
curl: (18) transfer closed with outstanding read data remaining

Sending an equivalent request to Ollama's OpenAI-compatible API works and streams chunks as expected:

```bash
everlier@pop-os:~/code/harbor$ curl --request POST \
  --url http://localhost:33821/v1/chat/completions \
  --header 'Accept: */*' \
  --header 'Authorization: sk-fake' \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: httpyac' \
  --data '{
  "model": "llama3.2:3b-instruct-q4_0",
  "format": "json",
  "response_format": { "type": "json_object" },
  "messages": [
    {"role": "user", "content": "How many heads Girrafes have?"}
  ],
  "options": {
    "temperature": 0.2
  },
  "stream": true
}'
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"{\""},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"It"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"'s"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" actually"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"..."},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" gir"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"aff"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"es"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" do"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" not"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" have"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" multiple"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" heads"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" They"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" have"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" single"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" head"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":","},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" just"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" like"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" most"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" animals"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":","},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" including"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" humans"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" and"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" other"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" mammals"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" It"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"'s"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" possible"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" you"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" may"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" be"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" confusing"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" it"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" with"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \""},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \n\n"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" :\n\n"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" -"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"14"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"4"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\n"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":",\""},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" from"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" classic"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" Disney"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" character"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \""},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\n\n\n"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" :\n\n"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \"-"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"D"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"aff"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"y"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" Duck"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\""},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" "},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\t"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"}"},"finish_reason":null}]}
data: {"id":"chatcmpl-200","object":"chat.completion.chunk","created":1727984141,"model":"llama3.2:3b-instruct-q4_0","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}
data: [DONE]
```
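For anyone checking a server against the behaviour shown in the Ollama output, here is a minimal Python sketch that reads `data:` lines of that shape, joins the `delta.content` pieces, and reports whether the `[DONE]` sentinel was seen. It is not part of Nexa SDK or Ollama; the function names `iter_sse_payloads` and `collect_stream` are hypothetical, and the sample lines are trimmed copies of chunks from the output (only the `choices` field is kept). It assumes one complete JSON payload per `data:` line. What the Nexa server actually sent before the connection closed is not shown in this report, so this only illustrates how a cut-off stream could be told apart from a complete one.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line, skipping blanks and other fields."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def collect_stream(lines: Iterable[str]) -> tuple[str, bool]:
    """Return (assembled_text, completed).

    `completed` is True only if the `[DONE]` sentinel was reached.
    A stream that stops early yields completed=False.
    """
    parts: list[str] = []
    for payload in iter_sse_payloads(lines):
        if payload == "[DONE]":
            return "".join(parts), True
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # `content` may be missing or empty on the final chunk.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts), False


# Trimmed sample chunks (hypothetical reduction of the output shown earlier).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":" gir"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"aff"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"es"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

text, completed = collect_stream(sample)
print(repr(text), completed)

# Simulate a connection that closes after two chunks.
partial_text, partial_completed = collect_stream(sample[:2])
print(repr(partial_text), partial_completed)
```

In a real client the `lines` argument would come from the HTTP response body read line by line; that part is left out here so the sketch runs without a server.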

OS

Pop!_OS 22.04

Python Version

3.12

Nexa SDK Version

0.0.8.5

GPU (if using one)

4090 Laptop

zhiyuan8 commented 2 days ago

We will fix this in our next release; there is a key-matching issue.