✅ Batching ✅ Streaming ✅ Auto-GPU, multi-GPU ✅ Multi-modal ✅ PyTorch/JAX/TF ✅ Full control ✅ Auth ✅ Built on FastAPI ✅ Custom specs (OpenAI)

---

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/litserve) ![cpu-tests](https://github.com/Lightning-AI/litserve/actions/workflows/ci-testing.yml/badge.svg) [![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/litserve/blob/main/LICENSE) [![Discord](https://img.shields.io/discord/1077906959069626439?style=plastic)](https://discord.gg/VptPCZkGNa)
Lightning AI • Get started • Examples • Deploy • Features • Docs
LitServe is a high-throughput serving engine designed to deploy AI models at scale. It creates an API endpoint for a model and manages batching, streaming, and autoscaling across CPUs and GPUs.
✅ Supports all models: LLMs, vision, time-series, etc...
✅ Developer friendly: Focus on AI deployment not infrastructure.
✅ Minimal interface: Zero-abstraction, hackable code-base.
✅ Enterprise scale: Designed to handle large models with low latency.
✅ Auto GPU scaling: Scale to multi-GPU with zero code changes.
Think of LitServe as PyTorch Lightning for model serving (if you're familiar with Lightning), but it supports every framework: PyTorch, JAX, TensorFlow, and more.
Install LitServe via pip (or advanced installs):
```bash
pip install litserve
```
Here's a hello world example (explore real examples):
```python
# server.py
import litserve as ls

# STEP 1: DEFINE YOUR MODEL API
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Set up the model so it can be called in `predict`.
        self.model = lambda x: x**2

    def decode_request(self, request):
        # Convert the request payload to your model input.
        return request["input"]

    def predict(self, x):
        # Run the model on the input and return the output.
        return self.model(x)

    def encode_response(self, output):
        # Convert the model output to a response payload.
        return {"output": output}

# STEP 2: START THE SERVER
if __name__ == "__main__":
    api = SimpleLitAPI()
    server = ls.LitServer(api, accelerator="auto")
    server.run(port=8000)
```
Now run the server from the command line:

```bash
python server.py
```
These two minimal APIs enable enterprise-scale serving with full control.
⚡️ LitAPI: Describes how the server will handle a request.
⚡️ LitServer: Specifies server optimizations (such as batching, streaming, and GPUs).
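For each request, the server drives the LitAPI hooks in order: `decode_request` → `predict` → `encode_response`. A plain-Python sketch of that lifecycle, using the same hook logic as the hello-world example (the `handle_request` helper is illustrative, not part of litserve):

```python
# Conceptual sketch of the per-request lifecycle LitServe drives.
# The hook bodies mirror SimpleLitAPI above; `handle_request` is an
# illustration of what the server does, not a litserve function.

def setup():
    # setup(): build the model once at startup.
    return lambda x: x**2

def handle_request(model, request):
    x = request["input"]           # decode_request(): payload -> model input
    output = model(x)              # predict(): run the model
    return {"output": output}      # encode_response(): output -> payload

model = setup()
print(handle_request(model, {"input": 4.0}))  # {'output': 16.0}
```

Because each hook has a single job, you can swap any one of them (for example, a different request schema in `decode_request`) without touching the rest.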
LitServe automatically generates a client when it starts. Use this client to test the server:
```bash
python client.py
```
Or query the server directly:

```python
import requests

response = requests.post("http://127.0.0.1:8000/predict", json={"input": 4.0})
print(response.json())  # {'output': 16.0}
```
Explore various examples that show different models deployed with LitServe:
| Example | Description |
|---|---|
| Hello world | Hello world model |
| Llama 3 (8B) | (LLM) Deploy Llama 3 |
| LLM proxy server | (LLM) Route traffic to various LLM providers for fault tolerance |
| Any Hugging Face model | (Text) Deploy any Hugging Face model |
| Hugging Face BERT model | (Text) Deploy a model for tasks like text generation and more |
| OpenAI CLIP | (Multimodal) Deploy OpenAI CLIP for tasks like image understanding |
| OpenAI Whisper | (Audio) Deploy OpenAI Whisper for tasks like speech to text |
| Meta AudioCraft | (Audio) Deploy Meta's AudioCraft for music generation |
| Stable Audio | (Audio) Deploy Stable Audio for audio generation |
| Stable Diffusion 2 | (Vision) Deploy Stable Diffusion 2 for tasks like image generation |
| Text-to-speech (XTTS V2) | (Speech) Deploy a text-to-speech voice cloning API |
LitServe is developed by Lightning AI, an AI development platform that provides infrastructure for deploying AI models.
Self-manage your own deployments or use Lightning Studios to deploy production-grade models without cloud headaches.
LitServe supports many advanced, state-of-the-art features.
✅ All model types: LLMs, vision, time series, etc.
✅ Auto-GPU scaling.
✅ Authentication.
✅ Autoscaling.
✅ Batching.
✅ Streaming.
✅ All ML frameworks: PyTorch, JAX, TensorFlow, Hugging Face, and more.
✅ OpenAI spec.
10+ features...
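Streaming, for example, lets `predict` yield partial results (such as tokens from an LLM) so clients receive output as it is produced. litserve supports this through generator-style hooks with streaming enabled on the server; the pure-Python sketch below shows only the idea, not the exact litserve API (check the docs for the real streaming interface):

```python
# Illustrative sketch of streaming: predict yields chunks, and each chunk
# is encoded and can be sent as soon as it is ready, instead of waiting
# for the full response. Shown without a server so the flow is visible.

def predict(prompt):
    # Yield one "token" at a time rather than the whole output at once.
    for token in prompt.split():
        yield token

def encode_response(outputs):
    # Encode each chunk as it arrives.
    for token in outputs:
        yield {"output": token}

chunks = list(encode_response(predict("hello streaming world")))
print(chunks)  # [{'output': 'hello'}, {'output': 'streaming'}, {'output': 'world'}]
```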
> [!NOTE]
> Our goal is not to jump on every hype train, but instead to support features that scale under the most demanding enterprise deployments.
LitServe is a community project accepting contributions.
Let's make the world's most advanced AI inference engine.
LitServe is released under the Apache 2.0 license. See the LICENSE file for details.