✅ Batching ✅ Streaming ✅ Auto-GPU, multi-GPU ✅ Multi-modal ✅ PyTorch/JAX/TF ✅ Full control ✅ Auth ✅ Built on FastAPI ✅ Custom specs (OpenAI)

---

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/litserve) ![cpu-tests](https://github.com/Lightning-AI/litserve/actions/workflows/ci-testing.yml/badge.svg) [![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/litserve/blob/main/LICENSE) [![Discord](https://img.shields.io/discord/1077906959069626439?style=plastic)](https://discord.gg/VptPCZkGNa)
Lightning AI • Get started • Examples • Deploy • Features • Docs
LitServe is a high-throughput serving engine designed to deploy AI models at scale. It creates an API endpoint for a model and manages batching, streaming, and autoscaling across CPUs and GPUs.
✅ Supports all models: LLMs, vision, time-series, etc...
✅ Developer friendly: Focus on AI deployment not infrastructure.
✅ Minimal interface: Zero-abstraction, hackable code-base.
✅ Enterprise scale: Designed to handle large models with low latency.
✅ Auto GPU scaling: Scale to multi-GPU with zero code changes.
Think of LitServe as PyTorch Lightning for model serving (if you're familiar with Lightning), but it supports every framework: PyTorch, JAX, TensorFlow, and more.
Install LitServe via pip (or advanced installs):
```bash
pip install litserve
```
Here's a hello world example (explore real examples):
```python
# server.py
import litserve as ls

# STEP 1: DEFINE YOUR MODEL API
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Set up the model so it can be called in `predict`.
        self.model = lambda x: x**2

    def decode_request(self, request):
        # Convert the request payload to your model input.
        return request["input"]

    def predict(self, x):
        # Run the model on the input and return the output.
        return self.model(x)

    def encode_response(self, output):
        # Convert the model output to a response payload.
        return {"output": output}

# STEP 2: START THE SERVER
if __name__ == "__main__":
    api = SimpleLitAPI()
    server = ls.LitServer(api, accelerator="auto")
    server.run(port=8000)
```
Now run the server from the command line:

```bash
python server.py
```
These two minimal APIs enable enterprise-scale serving with full control.
⚡️ LitAPI: Describes how the server will handle a request.
⚡️ LitServer: Specifies server optimizations (such as batching, streaming, and GPUs).
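For each request, the server drives the LitAPI hooks in order: `decode_request` → `predict` → `encode_response`. A plain-Python sketch of that lifecycle, using the same hook logic as the hello-world example (the `handle_request` helper is illustrative, not part of litserve):

```python
# Conceptual sketch of the per-request lifecycle LitServe drives.
# The hook bodies mirror SimpleLitAPI above; `handle_request` is an
# illustration of what the server does, not a litserve function.

def setup():
    # setup(): build the model once at startup.
    return lambda x: x**2

def handle_request(model, request):
    x = request["input"]           # decode_request(): payload -> model input
    output = model(x)              # predict(): run the model
    return {"output": output}      # encode_response(): output -> payload

model = setup()
print(handle_request(model, {"input": 4.0}))  # {'output': 16.0}
```

Because each hook has a single job, you can swap any one of them (for example, a different request schema in `decode_request`) without touching the rest.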
LitServe automatically generates a client when it starts. Use this client to test the server:
```bash
python client.py
```
Or query the server directly:

```python
import requests

response = requests.post("http://127.0.0.1:8000/predict", json={"input": 4.0})
print(response.json())  # {'output': 16.0}
```
Explore various examples that show different models deployed with LitServe:
| Example | Description |
|---|---|
| Hello world | Hello world model |
| Llama 3 (8B) | (LLM) Deploy Llama 3 |
| LLM proxy server | (LLM) Route traffic to various LLM providers for fault tolerance |
| Any Hugging Face model | (Text) Deploy any Hugging Face model |
| Hugging Face BERT model | (Text) Deploy a model for tasks like text generation and more |
| OpenAI CLIP | (Multimodal) Deploy OpenAI CLIP for tasks like image understanding |
| OpenAI Whisper | (Audio) Deploy OpenAI Whisper for tasks like speech to text |
| Meta AudioCraft | (Audio) Deploy Meta's AudioCraft for music generation |
| Stable Audio | (Audio) Deploy Stable Audio for audio generation |
| Stable Diffusion 2 | (Vision) Deploy Stable Diffusion 2 for tasks like image generation |
| Text-to-speech (XTTS V2) | (Speech) Deploy a text-to-speech voice cloning API |
LitServe is developed by Lightning AI, an AI development platform that provides infrastructure for deploying AI models.
Self-manage your own deployments or use Lightning Studios to deploy production-grade models without cloud headaches.
LitServe supports many advanced, state-of-the-art features.
✅ All model types: LLMs, vision, time series, etc.
✅ Auto-GPU scaling.
✅ Authentication.
✅ Autoscaling.
✅ Batching.
✅ Streaming.
✅ All ML frameworks: PyTorch, JAX, TensorFlow, Hugging Face, and more.
✅ OpenAI spec.
10+ features...
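Streaming, for example, lets `predict` yield partial results (such as tokens from an LLM) so clients receive output as it is produced. litserve supports this through generator-style hooks with streaming enabled on the server; the pure-Python sketch below shows only the idea, not the exact litserve API (check the docs for the real streaming interface):

```python
# Illustrative sketch of streaming: predict yields chunks, and each chunk
# is encoded and can be sent as soon as it is ready, instead of waiting
# for the full response. Shown without a server so the flow is visible.

def predict(prompt):
    # Yield one "token" at a time rather than the whole output at once.
    for token in prompt.split():
        yield token

def encode_response(outputs):
    # Encode each chunk as it arrives.
    for token in outputs:
        yield {"output": token}

chunks = list(encode_response(predict("hello streaming world")))
print(chunks)  # [{'output': 'hello'}, {'output': 'streaming'}, {'output': 'world'}]
```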
> [!NOTE]
> Our goal is not to jump on every hype train, but instead to support features that scale under the most demanding enterprise deployments.
LitServe is a community project accepting contributions.
Let's make the world's most advanced AI inference engine.
LitServe is released under the Apache 2.0 license. See the LICENSE file for details.