Lightning-AI / LitServe

High-throughput serving engine for AI models, with a friendly interface and enterprise scale.
https://lightning.ai/docs/litserve
Apache License 2.0
✅ Batching       ✅ Streaming          ✅ Auto-GPU, multi-GPU
✅ Multi-modal    ✅ PyTorch/JAX/TF     ✅ Full control
✅ Auth           ✅ Built on FastAPI   ✅ Custom specs (OpenAI)

---

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/litserve) ![cpu-tests](https://github.com/Lightning-AI/litserve/actions/workflows/ci-testing.yml/badge.svg) [![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/litserve/blob/main/LICENSE) [![Discord](https://img.shields.io/discord/1077906959069626439?style=plastic)](https://discord.gg/VptPCZkGNa)


Deploy AI models Lightning fast ⚡

LitServe is a high-throughput serving engine designed to deploy AI models at scale. It creates an API endpoint for your models and handles batching, streaming, and autoscaling across CPUs and GPUs.

Supports all models: LLMs, vision, time series, and more.
Developer friendly: Focus on AI deployment, not infrastructure.
Minimal interface: Zero-abstraction, hackable code base.
Enterprise scale: Designed to handle large models with low latency.
Auto GPU scaling: Scale to multi-GPU with zero code changes.

Think of LitServe as PyTorch Lightning for model serving (if you're familiar with Lightning), except it supports every framework: PyTorch, JAX, TensorFlow, and more.
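Because `setup` is plain Python, the model can come from any framework. Here's a minimal sketch that wraps a Hugging Face pipeline; the `transformers` dependency and the `sentiment-analysis` task are illustrative assumptions, not part of LitServe itself:

```python
import litserve as ls
from transformers import pipeline  # assumption: `transformers` is installed

class HFLitAPI(ls.LitAPI):
    def setup(self, device):
        # Any framework works here; `device` is the accelerator LitServe assigned.
        self.pipe = pipeline("sentiment-analysis", device=device)

    def decode_request(self, request):
        return request["text"]

    def predict(self, text):
        # The pipeline returns a list with one result per input.
        return self.pipe(text)[0]

    def encode_response(self, output):
        return {"label": output["label"], "score": output["score"]}
```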

 

 

Quick start

 


 

Install LitServe via pip (or advanced installs):

```bash
pip install litserve
```

Define a server

Here's a hello world example (explore real examples):

```python
# server.py
import litserve as ls

# STEP 1: DEFINE YOUR MODEL API
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Set up the model so it can be called in `predict`.
        self.model = lambda x: x**2

    def decode_request(self, request):
        # Convert the request payload to your model input.
        return request["input"]

    def predict(self, x):
        # Run the model on the input and return the output.
        return self.model(x)

    def encode_response(self, output):
        # Convert the model output to a response payload.
        return {"output": output}

# STEP 2: START THE SERVER
if __name__ == "__main__":
    api = SimpleLitAPI()
    server = ls.LitServer(api, accelerator="auto")
    server.run(port=8000)
```

Now run the server from the command line:

```bash
python server.py
```

These two minimal APIs enable enterprise scale with full control.

⚡️ LitAPI: Describes how the server handles a request.
⚡️ LitServer: Specifies optimizations such as batching, streaming, and GPU scaling (see the batching sketch below).
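
For example, server-side batching is turned on through `LitServer` arguments. A minimal sketch, assuming the `max_batch_size` and `batch_timeout` parameters as documented; with batching enabled, `predict` receives a list of decoded inputs:

```python
import litserve as ls

class BatchedLitAPI(ls.LitAPI):
    def setup(self, device):
        self.model = lambda xs: [x**2 for x in xs]

    def decode_request(self, request):
        return request["input"]

    def predict(self, batch):
        # `batch` is a list of inputs collected within `batch_timeout`.
        return self.model(batch)

    def encode_response(self, output):
        # Called once per item after the batch is split back apart.
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(
        BatchedLitAPI(),
        accelerator="auto",
        max_batch_size=4,    # group up to 4 concurrent requests per predict call
        batch_timeout=0.05,  # wait at most 50 ms to fill a batch
    )
    server.run(port=8000)
```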

Query the server

LitServe automatically generates a client when it starts. Use this client to test the server:

```bash
python client.py
```

Or query the server directly:

```python
import requests

# Send one prediction request to the running server.
response = requests.post("http://127.0.0.1:8000/predict", json={"input": 4.0})
print(response.json())  # {'output': 16.0}
```

 

Examples

Explore various examples that show different models deployed with LitServe:

| Example | Description | Deploy on Studios |
|---|---|---|
| Hello world | Hello world model | Open In Studio |
| Llama 3 (8B) (LLM) | Deploy Llama 3 | Open In Studio |
| LLM proxy server (LLM) | Routes traffic to various LLM providers for fault tolerance | Open In Studio |
| Any Hugging Face model (Text) | Deploy any Hugging Face model | Open In Studio |
| Hugging Face BERT model (Text) | Deploy a model for tasks like text generation and more | Open In Studio |
| OpenAI CLIP (Multimodal) | Deploy OpenAI CLIP for tasks like image understanding | Open In Studio |
| OpenAI Whisper (Audio) | Deploy OpenAI Whisper for tasks like speech to text | Open In Studio |
| Meta AudioCraft (Audio) | Deploy Meta's AudioCraft for music generation | Open In Studio |
| Stable Audio (Audio) | Deploy Stable Audio for audio generation | Open In Studio |
| Stable Diffusion 2 (Vision) | Deploy Stable Diffusion 2 for tasks like image generation | Open In Studio |
| Text-to-speech (XTTS V2) (Speech) | Deploy a text-to-speech voice cloning API | Open In Studio |

 

Deployment options

LitServe is developed by Lightning AI, an AI development platform that provides infrastructure for deploying AI models.
Self-manage your own deployments, or use Lightning Studios to deploy production-grade models without cloud headaches.

 


Features

LitServe supports a broad set of state-of-the-art serving features:

All model types: LLMs, vision, time series, and more.
Auto-GPU scaling.
Authentication.
Autoscaling.
Batching.
Streaming (see the sketch after this list).
All ML frameworks: PyTorch, JAX, TensorFlow, Hugging Face, and more.
OpenAI spec.
10+ additional features.
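
As one example, streaming is a matter of yielding partial results instead of returning a single value. A minimal sketch, assuming the `stream=True` flag on `LitServer` as described in the LitServe docs:

```python
import litserve as ls

class StreamingLitAPI(ls.LitAPI):
    def setup(self, device):
        # Stand-in "model" that produces ten partial results.
        self.model = lambda x: (x * i for i in range(10))

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        # Yield chunks instead of returning one value.
        yield from self.model(x)

    def encode_response(self, outputs):
        # Stream each chunk back to the client as it is produced.
        for out in outputs:
            yield {"output": out}

if __name__ == "__main__":
    server = ls.LitServer(StreamingLitAPI(), stream=True)
    server.run(port=8000)
```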

 

> [!NOTE]
> Our goal is not to jump on every hype train, but instead to support features that scale under the most demanding enterprise deployments.

Contribute

LitServe is a community project accepting contributions.
Let's make the world's most advanced AI inference engine.

License

LitServe is released under the Apache 2.0 license. See the LICENSE file for details.