[QUESTION] 422 Unprocessable Entity whisper

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.

Apache License 2.0


Question or Issue

I installed Nexa using the command from the README:

CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir

Then I started the server:

nexa server faster-whisper-large-turbo

and it started successfully on 0.0.0.0:8000.

I sent a POST request to http://127.0.0.1:8000/v1/audio/transcriptions with a body containing the absolute path to the file:

{
  "file": "sounds/audio_temp.mp3"
}

The response was:

{
  "detail": [
    {
      "type": "missing",
      "loc": [
        "body",
        "file"
      ],
      "msg": "Field required",
      "input": null
    }
  ]
}

And the server logged: 127.0.0.1:53207 - "POST /v1/audio/transcriptions HTTP/1.1" 422 Unprocessable Entity

The result is the same if I send it to 0.0.0.0 or localhost.

OS

macOS Sonoma

Python Version

3.10.15

Nexa SDK Version

0.0.9.0

GPU (if using one)

Apple M3

import requests

with open("sounds/audio_temp.mp3", "rb") as audio_file:
    files = {"file": audio_file}
    # Send the POST request
    response = requests.post("http://127.0.0.1:8000/v1/audio/transcriptions", files=files)
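If `requests` is not available, the same upload can be sketched with only the standard library. This assumes, based on the 422 response above (`"loc": ["body", "file"]`, "Field required") and the `files=` call in the snippet above, that the endpoint expects `file` as a multipart/form-data upload rather than a JSON string holding a path. The helper names `build_multipart` and `transcribe` are hypothetical and not part of nexa-sdk.

```python
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field, filename, data):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://127.0.0.1:8000/v1/audio/transcriptions"):
    """POST the audio file at `path` as the `file` form field (assumed field name)."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("file", os.path.basename(path), f.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

With the server from the question running, `print(transcribe("sounds/audio_temp.mp3"))` would send the file contents themselves. The JSON body in the question sent only a path string, which is most likely why the server reported the `file` field as missing.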
