NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
https://docs.nexa.ai/
Apache License 2.0

server #79

Closed ayttop closed 2 months ago

ayttop commented 2 months ago

How do I use the server?

ayttop commented 2 months ago

How do I use the server with curl?

JoyboyBrian commented 2 months ago
curl -X 'POST' \
  'http://localhost:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": [
    "string"
  ]
}'
ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>curl -X POST "http://localhost:8000/v1/completions" ^
More? -H "accept: application/json" ^
More? -H "Content-Type: application/json" ^
More? -d "{\"prompt\": \"Tell me a story\", \"temperature\": 1, \"max_new_tokens\": 128, \"top_k\": 50, \"top_p\": 1, \"stop_words\": [\"string\"]}"
{"detail":"'text'"}

ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>curl -X POST "http://localhost:8000/v1/completions" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\": \"Tell me a story\", \"temperature\": 1, \"max_new_tokens\": 128, \"top_k\": 50, \"top_p\": 1, \"stop_words\": [\"string\"]}"
{"detail":"'text'"}

ayttop commented 2 months ago

It does not run.

ayttop commented 2 months ago

Windows 11, Anaconda, Python 3.11.

ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>curl -X POST "http://localhost:8000/v1/completions" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"text\": \"Tell me a story\", \"temperature\": 1, \"max_new_tokens\": 128, \"top_k\": 50, \"top_p\": 1, \"stop_words\": [\"string\"]}"
{"detail":"'text'"}

(2) C:\Users\ArabTech\Desktop\2>curl -X POST "http://localhost:8000/v1/completions" -H "Content-Type: application/json" -d "{\"text\": \"Tell me a story\"}"
{"detail":"'text'"}

ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2\nexa-sdk>nexa server --host 127.0.0.1 --port 8000 Phi-2:q4_0
INFO: Started server process [5432]
INFO: Waiting for application startup.
2024-09-09 17:43:43,550 - INFO - Model Path: Phi-2:q4_0
Model Phi-2:q4_0 already exists at C:\Users\ArabTech\.cache\nexa\hub\official\Phi-2\q4_0.gguf
2024-09-09 17:43:44,213 - INFO - model loaded as <nexa.gguf.llama.llama.Llama object at 0x0000015396476210>
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:57924 - "GET /docs HTTP/1.1" 200 OK
INFO: 127.0.0.1:57924 - "GET /openapi.json HTTP/1.1" 200 OK
2024-09-09 17:44:28,514 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:57925 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 17:45:18,081 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:57948 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 17:45:35,981 - ERROR - Error in chat completions: 'text'
INFO: 127.0.0.1:57954 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

{"detail":"'text'"}

ayttop commented 2 months ago

response_1725929336990.json

ayttop commented 2 months ago

{ "detail": "'text'" }

ayttop commented 2 months ago

Screenshot 2024-09-09 182036

ayttop commented 2 months ago

Postman.

ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>nexa server --host 127.0.0.1 --port 8000 Phi-2:q4_0
INFO: Started server process [25424]
INFO: Waiting for application startup.
2024-09-09 18:13:58,307 - INFO - Model Path: Phi-2:q4_0
Model Phi-2:q4_0 already exists at C:\Users\ArabTech\.cache\nexa\hub\official\Phi-2\q4_0.gguf
2024-09-09 18:13:58,945 - INFO - model loaded as <nexa.gguf.llama.llama.Llama object at 0x0000022FB3901A90>
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:59196 - "GET /docs HTTP/1.1" 200 OK
INFO: 127.0.0.1:59196 - "GET /openapi.json HTTP/1.1" 200 OK
2024-09-09 18:14:24,686 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:59197 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 18:20:09,358 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:59270 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error

ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>nexa run gemma-2b:q2_K
Downloading gemma-2b/q2_K.gguf...
q2_K.gguf: 0%|▏ | 3.53M/1.32G [00:01<10:29, 2.24MiB/s]
An error occurred while downloading or processing the model: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2580)
Failed to pull model gemma-2b:q2_K
Unknown task: UNKNOWN. Skipping inference.

ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>nexa run gemma-2b:q4_K_M
Model gemma-2b:q4_K_M already exists at C:\Users\ArabTech\.cache\nexa\hub\official\gemma-2b\q4_K_M.gguf

k
2024-09-09 18:48:02,962 - ERROR - Error during generation: 'text'
Traceback (most recent call last):
  File "C:\Users\ArabTech\Desktop\2\nexa-sdk\nexa\gguf\nexa_inference_text.py", line 173, in run
    delta = chunk["choices"][0]["text"]
KeyError: 'text'

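For what it's worth, the traceback shows the streaming loop assuming every chunk contains choices[0]["text"]. Below is a hypothetical defensive sketch of such a loop (the helper name and chunk handling are assumptions, not the SDK's actual fix); it simply skips chunks that lack the key instead of raising KeyError:

# Hypothetical sketch only - mirrors the failing line
# `delta = chunk["choices"][0]["text"]` in nexa_inference_text.py,
# but tolerates streaming chunks that do not carry a "text" field.
def stream_text(chunks):
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("text")
        if delta is None:
            continue  # skip chunks without generated text instead of crashing
        print(delta, end="", flush=True)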
ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>nexa server --host 127.0.0.1 --port 8000 Meta-Llama-3-8B-Instruct:q2_K
INFO: Started server process [4512]
INFO: Waiting for application startup.
2024-09-09 19:11:50,509 - INFO - Model Path: Meta-Llama-3-8B-Instruct:q2_K
Model Meta-Llama-3-8B-Instruct:q2_K already exists at C:\Users\ArabTech\.cache\nexa\hub\official\Meta-Llama-3-8B-Instruct\q2_K.gguf
2024-09-09 19:11:50,931 - INFO - model loaded as <nexa.gguf.llama.llama.Llama object at 0x000002158396A450>
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:51991 - "GET /docs HTTP/1.1" 200 OK
INFO: 127.0.0.1:51991 - "GET /openapi.json HTTP/1.1" 200 OK
2024-09-09 19:12:25,168 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:51992 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:12:39,967 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:52001 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:13:04,016 - ERROR - Error in chat completions: 'text'
INFO: 127.0.0.1:52008 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

ayttop commented 2 months ago

{ "detail": "'text'" }

ayttop commented 2 months ago

(2) C:\Users\ArabTech\Desktop\2>nexa server Meta-Llama-3-8B-Instruct:q2_K
INFO: Started server process [12260]
INFO: Waiting for application startup.
2024-09-09 19:18:20,635 - INFO - Model Path: Meta-Llama-3-8B-Instruct:q2_K
Model Meta-Llama-3-8B-Instruct:q2_K already exists at C:\Users\ArabTech\.cache\nexa\hub\official\Meta-Llama-3-8B-Instruct\q2_K.gguf
2024-09-09 19:18:21,057 - INFO - model loaded as <nexa.gguf.llama.llama.Llama object at 0x000001C6EBD23950>
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:52197 - "GET /docs HTTP/1.1" 200 OK
INFO: 127.0.0.1:52197 - "GET /openapi.json HTTP/1.1" 200 OK
2024-09-09 19:19:04,015 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:52198 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:19:50,832 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:52208 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:19:59,782 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:52210 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:20:35,015 - ERROR - Error in chat completions: 'text'
INFO: 127.0.0.1:52211 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:21:15,760 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:52297 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:21:33,361 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:52300 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2024-09-09 19:22:24,176 - ERROR - Error in text generation: 'text'
INFO: 127.0.0.1:52304 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error

ayttop commented 2 months ago

I tried other models, the Swagger UI at http://127.0.0.1:8000/docs#/default/chat_completions_v1_chat_completions_post, and Postman, and all attempts failed.

ayttop commented 2 months ago

From the Swagger UI at http://127.0.0.1:8000/docs (POST /v1/completions):

curl -X 'POST' \
  'http://127.0.0.1:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": [
    "string"
  ]
}'

Request URL: http://127.0.0.1:8000/v1/completions

Server response: 500 Internal Server Error (undocumented)

Response body:
{ "detail": "'text'" }

Response headers:
access-control-allow-credentials: true
access-control-allow-origin: http://127.0.0.1
content-length: 19
content-type: application/json
date: Tue, 10 Sep 2024 02:25:40 GMT
server: uvicorn
vary: Origin

ayttop commented 2 months ago

It runs, it works!

(2) C:\Users\ArabTech\Desktop\2\nexa-sdk>nexa server gemma
INFO: Started server process [16584]
INFO: Waiting for application startup.
2024-09-10 10:42:17,166 - INFO - Model Path: gemma
Downloading gemma-1.1-2b-instruct/q4_0.gguf...
q4_0.gguf: 100%|█████████████████████████████████████████████████████████████████| 1.44G/1.44G [07:24<00:00, 3.49MiB/s]
Successfully downloaded gemma-1.1-2b-instruct/q4_0.gguf to C:\Users\ArabTech\.cache\nexa\hub\official\gemma-1.1-2b-instruct\q4_0.gguf
Successfully pulled model gemma-1.1-2b-instruct:q4_0 to C:\Users\ArabTech\.cache\nexa\hub\official\gemma-1.1-2b-instruct\q4_0.gguf, run_type: NLP
2024-09-10 10:49:46,544 - INFO - model loaded as <nexa.gguf.llama.llama.Llama object at 0x000001B772B23210>
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:53721 - "POST /v1/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:53806 - "POST /v1/completions HTTP/1.1" 200 OK

In Postman:

POST http://0.0.0.0:8000/v1/completions
Body (raw, JSON):

{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": ["string"]
}

Press Send. Response (Pretty):

{ "result": "In a hidden valley nestled between towering mountains, there lived a brave princess named Anya. Her kingdom was under a cruel spell, and the once-vibrant land was now shrouded in an eerie gloom.\n\nOne fateful night, Anya heard a haunting melody carried on the wind. Curious, she followed the sound to a secluded grove. There, she found an ancient oak tree that whispered secrets in a language unknown to human ears.\n\nAs Anya listened intently, she discovered a tale of a forgotten curse and a magical artifact that could break the spell. Guided by the tree's wisdom, Anya embarked on a perilous quest through treacherous forests and sparkling" }

ayttop commented 2 months ago

thank you

ayttop commented 2 months ago

Thank you, Brian.

ayttop commented 2 months ago

{ "prompt": "Tell me a story", "temperature": 1, "max_new_tokens": 128, "top_k": 50, "top_p": 1, "stop_words": [ "string" ] }

http://0.0.0.0:8000/v1/completions

{ "prompt": "Tell me a story", "temperature": 1, "max_new_tokens": 128, "top_k": 50, "top_p": 1, "stop_words": [ "string" ] }

{ "result": "The wind whispered secrets through the ancient oak tree, rustling its leaves and sending shivers down its wrinkled bark. Beneath its watchful gaze, a young girl sat nestled in a woven mat, her eyes closed, her mind lost in a world of her own creation.\n\nHer story began with a tear, a silent spill on the fabric of her world. The world she built was a canvas of vibrant colors, filled with talking animals, sparkling rivers, and towering mountains. With each breath, her story grew, expanding beyond the boundaries of the mat.\n\nThe wind became her pen, guiding her words across the page. She spun tales of courage" }

Thank you, but how do I run the following in cmd on Windows 11?

curl -X 'POST' \
  'http://localhost:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": [
    "string"
  ]
}'

ayttop commented 2 months ago

In Postman, POST http://0.0.0.0:8000/v1/completions with the same JSON body shown above works and returns the story; it is only the curl command in cmd that I cannot run.

JoyboyBrian commented 2 months ago

Hi @ayttop, here is how to execute the curl command on Windows:

Step 1. Open Command Prompt:

  • Press Win + R, type cmd, and press Enter.
  • Alternatively, you can press Win + X and select Command Prompt or Windows Terminal.

Step 2. Run the curl command: copy and paste the following into the Command Prompt. On Windows, make sure the JSON data in the -d flag is properly escaped:

curl -X POST "http://localhost:8000/v1/completions" ^
  -H "accept: application/json" ^
  -H "Content-Type: application/json" ^
  -d "{ \"prompt\": \"Tell me a story\", \"temperature\": 1, \"max_new_tokens\": 128, \"top_k\": 50, \"top_p\": 1, \"stop_words\": [ \"string\" ] }"
  • The caret ^ is used for multi-line commands in Windows
  • Alternatively, you can put everything on one line:
    curl -X POST "http://localhost:8000/v1/completions" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"prompt\": \"Tell me a story\", \"temperature\": 1, \"max_new_tokens\": 128, \"top_k\": 50, \"top_p\": 1, \"stop_words\": [ \"string\" ] }"
