
feat: adding custom metrics from API service #2792

Closed ssheng closed 2 years ago

ssheng commented 2 years ago

There is increasing demand from the community for adding custom metrics to the API service. BentoML supports basic service-level metrics out of the box, including request duration, in-progress requests, and request count, via prometheus_client. However, it is not as straightforward for users to create new metrics. An ideal API would allow users to define metrics during initialization and update them upon certain events (e.g. request, response, error).
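For illustration, this is the kind of workflow users are asking for, sketched with plain prometheus_client (the metric names and handler below are hypothetical):

from prometheus_client import Counter, Histogram

# Defined once, during initialization.
error_counter = Counter("api_errors_total", "Total API errors", ["endpoint"])
latency = Histogram("api_latency_seconds", "Request latency in seconds")

# Updated upon events, e.g. per request or per error.
def handle_request(endpoint):
    with latency.time():  # observes elapsed time on exit
        try:
            ...  # actual handler logic
        except Exception:
            error_counter.labels(endpoint=endpoint).inc()
            raise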

aarnphm commented 2 years ago

Context: we currently handle Prometheus metrics via PrometheusClient, which is accessed through BentoMLContainer.metrics_client as a singleton factory. This container is currently internal, and we intend to keep it for internal use only.
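For context, the internal access pattern looks roughly like this (a sketch; the module path is BentoML-internal and may differ between versions):

from bentoml._internal.configuration.containers import BentoMLContainer

# simple_di provider; returns the singleton PrometheusClient instance
metrics_client = BentoMLContainer.metrics_client.get()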

bentoml.metrics API

Hence, an external API should expose this metrics client to the user. I propose we make bentoml.metrics our user-facing API.

This API is essentially our metrics_client and should expose the full prometheus_client API. For example:

import bentoml

# Same constructor signatures as prometheus_client's Histogram and Counter.
my_histogram = bentoml.metrics.Histogram("...", ...)
my_counter = bentoml.metrics.Counter("...", ...)

This means that using bentoml.metrics should feel as seamless as importing prometheus_client directly; hence, the following should also be possible:

from bentoml.metrics import Histogram
from bentoml.metrics.parser import text_string_to_metric_families
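For reference, this mirrors the existing prometheus_client parser, whose usage looks like the following (the exposition string is a made-up example):

from prometheus_client.parser import text_string_to_metric_families

# Parse Prometheus text exposition format into metric families.
exposition = "my_counter_total 4.0\n"
for family in text_string_to_metric_families(exposition):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)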

Register custom metrics with a given Service

Currently, if users import prometheus_client directly in their service, it fails, since BentoML has to handle Prometheus multiprocess mode to export metrics from multiple worker processes. This is done by delaying metrics initialization steps as late as possible.
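For background, plain prometheus_client multiprocess mode requires setup before the library is even imported, which is why eager metric creation breaks. A minimal sketch (the directory path is hypothetical):

import os

# prometheus_client picks its multiprocess backend from this env var at
# import time, so it must be set (and the directory must exist) first.
os.environ["PROMETHEUS_MULTIPROC_DIR"] = "/tmp/prometheus_multiproc"
os.makedirs("/tmp/prometheus_multiproc", exist_ok=True)

from prometheus_client import CollectorRegistry, Counter, multiprocess

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)  # aggregates samples across workers

requests_total = Counter("requests_total", "Total requests")  # safe to create now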

I'm proposing that the API for registering custom metrics be analogous to how we register runners.


import bentoml

runner = bentoml.pytorch.get("my_torch_model:latest").to_runner()

# buckets follows prometheus_client's Histogram keyword argument.
my_histogram = bentoml.metrics.Histogram(name="inference_duration_seconds", buckets=my_buckets)

svc = bentoml.Service("service", runners=[runner], metrics=[my_histogram])

@svc.api(input=..., output=...)
def predict(input):
    my_histogram.labels(...).observe(...)
    # my_logic_here

This means that metrics should also be initialized as late as possible.
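One way such delayed initialization could look, sketched as a proxy object (illustrative only; none of these names are BentoML internals):

class _LazyHistogram:
    # Illustrative sketch: defer creating the real prometheus_client.Histogram
    # until first use, after the server has configured multiprocess mode.

    def __init__(self, **kwargs):
        self._kwargs = kwargs
        self._metric = None

    def _ensure(self):
        if self._metric is None:
            # Imported lazily too: prometheus_client picks its multiprocess
            # value backend from the environment at import time.
            import prometheus_client

            self._metric = prometheus_client.Histogram(**self._kwargs)
        return self._metric

    def labels(self, *args, **kwargs):
        return self._ensure().labels(*args, **kwargs)

    def observe(self, value):
        self._ensure().observe(value)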