Context: we currently handle Prometheus metrics via `PrometheusClient`, which is accessed via `BentoMLContainer.metrics_client` as a singleton factory. This is an internal container, and we intend to keep it for internal use only.
### `bentoml.metrics` API

Hence, an external API should expose the metrics client to the user. I propose we make `bentoml.metrics` our user-facing API.
This API is essentially our `metrics_client`, which should expose all of the `prometheus_client` API. For example:
```python
import bentoml

my_histogram = bentoml.metrics.Histogram("...", ...)
my_counter = bentoml.metrics.Counter()
```
This means that when users use `bentoml.metrics`, it should feel as seamless as importing `prometheus_client` directly; hence, the following scenario should also be doable:
```python
from bentoml.metrics import Histogram
from bentoml.metrics.parser import text_string_to_metric_families
```
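For reference, `text_string_to_metric_families` is `prometheus_client`'s existing parser helper; the proposal is simply to re-export it under `bentoml.metrics.parser`. A minimal sketch of what it does (the metric text below is made up for illustration):

```python
from prometheus_client.parser import text_string_to_metric_families

# Parse Prometheus text exposition format into metric families and samples.
exposition = 'inference_requests{endpoint="/predict"} 42.0\n'
for family in text_string_to_metric_families(exposition):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```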
Currently, if users import `prometheus_client` directly in their service, it would fail because BentoML has to handle multiprocessing mode in order to export metrics, which it does by delaying the metrics initialization steps for as long as possible.
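For background, this is roughly what `prometheus_client`'s multiprocess mode involves and why ordering matters: the multiprocess directory must be configured through an environment variable before any metric objects are created, which is why BentoML needs to control when metrics are initialized. The sketch below uses `prometheus_client`'s documented multiprocess API; the directory path is an arbitrary example, and BentoML's actual internals may differ.

```python
import os

# Must be set before prometheus_client creates any metric values.
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc")
os.makedirs(os.environ["PROMETHEUS_MULTIPROC_DIR"], exist_ok=True)

from prometheus_client import CollectorRegistry, generate_latest, multiprocess

# Each worker process writes its samples to files under PROMETHEUS_MULTIPROC_DIR;
# a MultiProcessCollector aggregates them when /metrics is scraped.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
```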
I'm proposing that the API for registering custom metrics should be analogous to the one for registering runners.
```python
import bentoml

runner = bentoml.pytorch.get("my_torch_model:latest").to_runner()
my_histogram = bentoml.metrics.Histogram(name="asdf", buckets=my_bucket)

svc = bentoml.Service("service", runners=[runner], metrics=[my_histogram])

@svc.api(input=..., output=...)
def predict(input):
    my_histogram.labels(...).observe(...)
    # my_logic_here
```
This means that metrics should also be initialized as late as possible.
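One way to achieve that late initialization, as a rough sketch rather than a committed design: `bentoml.metrics.Histogram` could be a thin proxy that records its constructor arguments up front and only builds the underlying `prometheus_client` metric the first time it is used inside a running server. The `LazyHistogram` class below is hypothetical and only meant to illustrate the idea:

```python
from prometheus_client import Histogram

class LazyHistogram:
    """Records constructor args now; creates the real Histogram on first use."""

    def __init__(self, **kwargs):
        self._kwargs = kwargs
        self._metric = None

    def _get(self):
        # The real prometheus_client object is only created here, after the
        # server has had a chance to configure multiprocess mode.
        if self._metric is None:
            self._metric = Histogram(**self._kwargs)
        return self._metric

    def labels(self, *args, **kwargs):
        return self._get().labels(*args, **kwargs)

    def observe(self, value):
        self._get().observe(value)
```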
There is an increasing demand from the community for adding custom metrics to the API service. BentoML supports basic service-level metrics out of the box, including request duration, in-progress requests, and request count, using `prometheus_client`. However, it is not as straightforward for users to create new metrics. An ideal API lets users define metrics during initialization and update them upon certain events (e.g. request, response, error).