🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our Slack community!
BentoML is an open-source model serving framework that simplifies how AI/ML models get into production.
Install BentoML:
# Requires Python≥3.8
pip install -U bentoml
pip install torch transformers # additional dependencies for demo purpose
Define APIs in a service.py file.
from __future__ import annotations

from typing import List

import bentoml


@bentoml.service
class Summarization:
    def __init__(self) -> None:
        # Import lazily so the model is only loaded when the service starts.
        from transformers import pipeline

        self.pipeline = pipeline('summarization')

    @bentoml.api(batchable=True)
    def summarize(self, texts: List[str]) -> List[str]:
        # The pipeline returns one dict per input text.
        results = self.pipeline(texts)
        return [item['summary_text'] for item in results]
Run the service code locally (serving at http://localhost:3000 by default):
bentoml serve service.py:Summarization
Now you can run inference from your browser at http://localhost:3000 or with a Python script:
import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    text_to_summarize: str = input("Enter text to summarize: ")
    summarized_text: str = client.summarize([text_to_summarize])[0]
    print(f"Summarized text: {summarized_text}")
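The service can also be called without the BentoML client. The sketch below uses only the Python standard library. It assumes the `summarize` API is exposed as a POST route at `/summarize` and accepts a JSON object keyed by the parameter name (`texts`); this is an assumption about the default routing, so check the page served at http://localhost:3000 if your version behaves differently. The helper names `build_summarize_request` and `summarize_over_http` are hypothetical.

```python
import json
import urllib.request
from typing import List


def build_summarize_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the summarize endpoint."""
    # Assumption: the API is routed at /summarize and takes {"texts": [...]}.
    payload = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/summarize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_over_http(base_url: str, texts: List[str]) -> List[str]:
    """Send the request to a running service and decode the JSON response."""
    request = build_summarize_request(base_url, texts)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the service from service.py to be running locally):
# print(summarize_over_http("http://localhost:3000", ["Text to summarize."]))
```

Splitting request construction from sending keeps the payload shape easy to inspect before any network call is made.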
To deploy your BentoML Service code, first create a bentofile.yaml file to define its dependencies and environments. Find the full list of bentofile options here.
service: "service:Summarization" # Entry service import path
include:
  - "*.py" # Include all .py files in current directory
python:
  lock_packages: false # option to lock versions found in current environment
  packages: # Python dependencies to include
    - torch
    - transformers
Then, choose one of the following ways for deployment:
For detailed explanations, read Quickstart.
Check out the examples folder for more sample code and usage.
See Documentation for more tutorials and guides.
Get involved and join our Community Slack 💬, where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.
To report a bug or suggest a feature request, use GitHub Issues.
There are many ways to contribute to the project:
#bentoml-contributors channel here.
Thanks to all of our amazing contributors!
The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the code used for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option:
bentoml [command] --do-not-track
Or by setting the environment variable:
export BENTOML_DO_NOT_TRACK=True
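When the service is launched from a Python script rather than a shell, the same opt-out can be passed through the child process environment. This is a minimal sketch: the helper `env_without_tracking` is hypothetical, and only the variable name and value come from the text above.

```python
import os
from typing import Dict, Mapping, Optional


def env_without_tracking(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with usage tracking disabled."""
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base is None else base)
    env["BENTOML_DO_NOT_TRACK"] = "True"
    return env


# Example (not executed here): launch the server with tracking disabled.
# import subprocess
# subprocess.run(
#     ["bentoml", "serve", "service.py:Summarization"],
#     env=env_without_tracking(),
#     check=True,
# )
```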