Nexa SDK

[![MacOS][MacOS-image]][release-url] [![Linux][Linux-image]][release-url] [![Windows][Windows-image]][release-url]
[![GitHub Release](https://img.shields.io/github/v/release/NexaAI/nexa-sdk)](https://github.com/NexaAI/nexa-sdk/releases/latest)
[![Build workflow](https://img.shields.io/github/actions/workflow/status/NexaAI/nexa-sdk/ci.yaml?label=CI&logo=github)](https://github.com/NexaAI/nexa-sdk/actions/workflows/ci.yaml?query=branch%3Amain)
![GitHub License](https://img.shields.io/github/license/NexaAI/nexa-sdk)
[![Discord](https://dcbadge.limes.pink/api/server/thRu2HaK4D?style=flat&compact=true)](https://discord.gg/thRu2HaK4D)

[On-device Model Hub](https://model-hub.nexa4ai.com/) / [Nexa SDK Documentation](https://docs.nexaai.com/)

[release-url]: https://github.com/NexaAI/nexa-sdk/releases
[Windows-image]: https://img.shields.io/badge/windows-0078D4?logo=windows
[MacOS-image]: https://img.shields.io/badge/-MacOS-black?logo=apple
[Linux-image]: https://img.shields.io/badge/-Linux-333?logo=ubuntu

Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS) capabilities. Additionally, it offers an OpenAI-compatible API server with JSON schema mode for function calling and streaming support, and a user-friendly Streamlit UI. Users can run Nexa SDK on any device with a Python environment, and GPU acceleration is supported.

Features

Detailed API documentation is available at https://docs.nexaai.com/.

Below is how Nexa SDK differs from other similar tools:

| Feature | Nexa SDK | ollama | Optimum | LM Studio |
|---|:---:|:---:|:---:|:---:|
| GGML Support | ✅ | ✅ | ❌ | ✅ |
| ONNX Support | ✅ | ❌ | ✅ | ❌ |
| Text Generation | ✅ | ✅ | ✅ | ✅ |
| Image Generation | ✅ | ❌ | ✅ | ❌ |
| Vision-Language Models | ✅ | ✅ | ✅ | ✅ |
| Text-to-Speech | ✅ | ❌ | ✅ | ❌ |
| Server Capability | ✅ | ✅ | ❌ | ✅ |
| User Interface | ✅ | ❌ | ❌ | ✅ |

Installation

We have released pre-built wheels for various Python versions, platforms, and backends on our index page for convenient installation.

> [!NOTE]
>
> 1. To use ONNX models, replace `pip install nexaai` with `pip install "nexaai[onnx]"` in the provided commands.
> 2. For developers in China, we recommend using the Tsinghua Open Source Mirror as the extra index URL: replace `--extra-index-url https://pypi.org/simple` with `--extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple` in the provided commands. (Both substitutions are illustrated after the CPU command below.)

CPU

```bash
pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cpu --extra-index-url https://pypi.org/simple --no-cache-dir
```
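
For example, applying both substitutions from the note above (the ONNX extra and the Tsinghua mirror) to the CPU command gives:

```bash
pip install "nexaai[onnx]" --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cpu --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple --no-cache-dir
```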

GPU (Metal)

For the GPU version supporting Metal (macOS):

```bash
CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir
```

**FAQ: cannot use Metal/GPU on M1**

Try the following commands:

```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
conda create -n nexasdk python=3.10
conda activate nexasdk
CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir
```

GPU (CUDA)

For Linux:

```bash
CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
```

For Windows PowerShell:

```powershell
$env:CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON"; pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
```

For Windows Command Prompt:

```cmd
set CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" & pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
```

For Windows Git Bash:

```bash
CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
```

**FAQ: building issues for llava**

If you encounter the following issue while building:

![](docs/.media/error.jpeg)

try the following command:

```bash
CMAKE_ARGS="-DCMAKE_CXX_FLAGS=-fopenmp" pip install nexaai
```
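
If the CUDA install fails or the build falls back to CPU, a useful first check is whether an NVIDIA driver and CUDA toolkit are visible at all. These are standard NVIDIA utilities, not part of Nexa SDK, and the `cu124` index name suggests wheels built for CUDA 12.4:

```bash
# Confirm the driver can see the GPU
nvidia-smi

# Confirm the CUDA compiler version (the cu124 index implies CUDA 12.4)
nvcc --version
```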

GPU (ROCm)

For Linux:

```bash
CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/rocm602 --extra-index-url https://pypi.org/simple --no-cache-dir
```

Local Build

How to clone this repo:

```bash
git clone --recursive https://github.com/NexaAI/nexa-sdk
```

If you forgot to use `--recursive`, you can add the submodules afterwards with:

```bash
git submodule update --init --recursive
```

Then build and install the package:

```bash
pip install -e .
```
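
To build locally against a GPU backend, it is reasonable to assume that the same `CMAKE_ARGS` used for the wheel installs above also apply to the editable install (an assumption; adjust the flags to your backend):

```bash
# Assumed: editable install with the Metal flags from the GPU (Metal) section
CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install -e .
```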

Docker Usage

Note: Docker doesn't support GPU acceleration.

```bash
docker pull nexa4ai/sdk:latest
```

Replace the placeholders below with your model directory and command:

```bash
docker run -v <your_model_dir>:/model -it nexa4ai/sdk:latest [nexa_command] [your_model_relative_path]
```

Example:

```bash
docker run -v /home/ubuntu/.cache/nexa/hub/official:/model -it nexa4ai/sdk:latest nexa gen-text /model/Phi-3-mini-128k-instruct/q4_0.gguf
```

This will create an interactive text-generation session.
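
The same pattern should extend to other `nexa` commands inside the container. For example, a sketch for serving a model from Docker; the published port and the `nexa server` argument shape are assumptions here, so check the Server Reference below for the actual defaults:

```bash
# Hypothetical: publish an assumed default port 8000 so the
# OpenAI-compatible server inside the container is reachable from the host
docker run -v /home/ubuntu/.cache/nexa/hub/official:/model -p 8000:8000 \
  -it nexa4ai/sdk:latest nexa server /model/Phi-3-mini-128k-instruct/q4_0.gguf
```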

Supported Models

| Model | Type | Format | Command |
|---|---|---|---|
| octopus-v2 | NLP | GGUF | `nexa run octopus-v2` |
| octopus-v4 | NLP | GGUF | `nexa run octopus-v4` |
| tinyllama | NLP | GGUF | `nexa run tinyllama` |
| llama2 | NLP | GGUF/ONNX | `nexa run llama2` |
| llama3 | NLP | GGUF/ONNX | `nexa run llama3` |
| llama3.1 | NLP | GGUF/ONNX | `nexa run llama3.1` |
| gemma | NLP | GGUF/ONNX | `nexa run gemma` |
| gemma2 | NLP | GGUF | `nexa run gemma2` |
| qwen1.5 | NLP | GGUF | `nexa run qwen1.5` |
| qwen2 | NLP | GGUF/ONNX | `nexa run qwen2` |
| qwen2.5 | NLP | GGUF | `nexa run qwen2.5` |
| mathqwen | NLP | GGUF | `nexa run mathqwen` |
| mistral | NLP | GGUF/ONNX | `nexa run mistral` |
| codegemma | NLP | GGUF | `nexa run codegemma` |
| codellama | NLP | GGUF | `nexa run codellama` |
| codeqwen | NLP | GGUF | `nexa run codeqwen` |
| deepseek-coder | NLP | GGUF | `nexa run deepseek-coder` |
| dolphin-mistral | NLP | GGUF | `nexa run dolphin-mistral` |
| phi2 | NLP | GGUF | `nexa run phi2` |
| phi3 | NLP | GGUF/ONNX | `nexa run phi3` |
| llama2-uncensored | NLP | GGUF | `nexa run llama2-uncensored` |
| llama3-uncensored | NLP | GGUF | `nexa run llama3-uncensored` |
| llama2-function-calling | NLP | GGUF | `nexa run llama2-function-calling` |
| nanollava | Multimodal | GGUF | `nexa run nanollava` |
| llava-phi3 | Multimodal | GGUF | `nexa run llava-phi3` |
| llava-llama3 | Multimodal | GGUF | `nexa run llava-llama3` |
| llava1.6-mistral | Multimodal | GGUF | `nexa run llava1.6-mistral` |
| llava1.6-vicuna | Multimodal | GGUF | `nexa run llava1.6-vicuna` |
| stable-diffusion-v1-4 | Computer Vision | GGUF | `nexa run sd1-4` |
| stable-diffusion-v1-5 | Computer Vision | GGUF/ONNX | `nexa run sd1-5` |
| lcm-dreamshaper | Computer Vision | GGUF/ONNX | `nexa run lcm-dreamshaper` |
| hassaku-lcm | Computer Vision | GGUF | `nexa run hassaku-lcm` |
| anything-lcm | Computer Vision | GGUF | `nexa run anything-lcm` |
| faster-whisper-tiny | Audio | BIN | `nexa run faster-whisper-tiny` |
| faster-whisper-small | Audio | BIN | `nexa run faster-whisper-small` |
| faster-whisper-medium | Audio | BIN | `nexa run faster-whisper-medium` |
| faster-whisper-base | Audio | BIN | `nexa run faster-whisper-base` |
| faster-whisper-large | Audio | BIN | `nexa run faster-whisper-large` |

CLI Reference

Here's a brief overview of the main CLI commands:
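
A minimal sketch using the commands that appear elsewhere in this README (the exact argument shape of `nexa server` is an assumption here; see the references below):

```bash
# Start an interactive session with a model from the Supported Models table
nexa run llama3

# Run a vision-language model
nexa run llava-llama3

# Serve a model through the OpenAI-compatible API (see Start Local Server)
nexa server llama3
```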

For detailed information on CLI commands and usage, please refer to the CLI Reference document.

Start Local Server

To start a local server using models on your local computer, you can use the `nexa server` command. For detailed information on server setup, API endpoints, and usage examples, please refer to the Server Reference document.
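
Since the server is OpenAI-compatible (see the overview above), a request against it would typically look like the sketch below. The host, port, and route here are assumptions rather than documented values, so check the Server Reference for the actual endpoints:

```bash
# Hypothetical request to an OpenAI-style chat completions route;
# adjust host, port, and payload to match the Server Reference
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```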

Acknowledgements

We would like to thank the following projects:

- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp)
- [optimum](https://github.com/huggingface/optimum)