Open miguelalba96 opened 3 days ago
Hi @miguelalba96, to use Prometheus with LitServe you need to create a multiprocess registry so that it can collect metrics from all the inference processes. I have created the following example based on your code:
import os
import time

from prometheus_client import CollectorRegistry, Histogram, make_asgi_app, multiprocess
import litserve as ls

# Set the directory for multiprocess mode
os.environ["PROMETHEUS_MULTIPROC_DIR"] = "/tmp/prometheus_multiproc_dir"

# Ensure the directory exists
os.makedirs("/tmp/prometheus_multiproc_dir", exist_ok=True)

# Use a multiprocess registry
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)


class PrometheusLogger(ls.Logger):
    def __init__(self):
        super().__init__()
        self.function_duration = Histogram(
            "request_processing_seconds",
            "Time spent processing request",
            ["function_name"],
            registry=registry,
        )

    def process(self, key, value):
        # Called with every key/value pair passed to self.log() in the LitAPI
        print("processing", key, value)
        self.function_duration.labels(function_name=key).observe(value)


class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        start_time = time.perf_counter()
        squared = self.model1(x)
        cubed = self.model2(x)
        output = squared + cubed
        # Record the elapsed time under the "get_image_embedding" label
        self.log("get_image_embedding", time.perf_counter() - start_time)
        return {"output": output}

    def encode_response(self, output):
        return {"output": output}


if __name__ == "__main__":
    prometheus_logger = PrometheusLogger()
    prometheus_logger.mount(path="/metrics", app=make_asgi_app(registry=registry))
    api = SimpleLitAPI()
    server = ls.LitServer(api, loggers=prometheus_logger)
    server.run(port=8000)
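To exercise the server, a small client along these lines could send a couple of requests and then read back the metrics. This is only a sketch: it assumes the server is running locally on port 8000 with LitServe's default /predict route, and the helper names (build_predict_request, fetch_metrics, main) are made up for illustration.

```python
import json
import urllib.request

# Assumption: the example server is running locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_predict_request(value, base_url=BASE_URL):
    """Build a JSON POST request for the /predict route (hypothetical helper)."""
    body = json.dumps({"input": value}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_metrics(base_url=BASE_URL):
    """Return the raw Prometheus text served at the mounted /metrics path."""
    with urllib.request.urlopen(f"{base_url}/metrics") as resp:
        return resp.read().decode("utf-8")


def main():
    # Send two requests so the histogram records two observations.
    for value in (2.0, 3.0):
        with urllib.request.urlopen(build_predict_request(value)) as resp:
            print(json.loads(resp.read()))
    print(fetch_metrics())
```

Call main() once the server is up; it needs a live server, so nothing is sent at import time.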
After running this code and sending a few requests to the server, you should see output like the following at the /metrics endpoint:
# HELP request_processing_seconds Multiprocess metric
# TYPE request_processing_seconds histogram
request_processing_seconds_sum{function_name="get_image_embedding"} 4.124827682971954e-06
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.005"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.01"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.025"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.05"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.075"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.1"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.25"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.5"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="0.75"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="1.0"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="2.5"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="5.0"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="7.5"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="10.0"} 2.0
request_processing_seconds_bucket{function_name="get_image_embedding",le="+Inf"} 2.0
request_processing_seconds_count{function_name="get_image_embedding"} 2.0
# HELP http_server_requests_duration_seconds_total HTTP request latency in seconds
# TYPE http_server_requests_duration_seconds_total histogram
# HELP request_processing_seconds Time spent processing request
# TYPE request_processing_seconds histogram
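As a quick sanity check on this output, the mean observed duration is the _sum sample divided by the _count sample. The sketch below does that arithmetic with the two values shown above; parse_samples is a hypothetical helper, and it assumes the samples carry no trailing timestamps.

```python
def parse_samples(text):
    """Map each 'name{labels}' sample to its float value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


# The two samples copied from the /metrics output above.
METRICS_TEXT = """
request_processing_seconds_sum{function_name="get_image_embedding"} 4.124827682971954e-06
request_processing_seconds_count{function_name="get_image_embedding"} 2.0
"""

samples = parse_samples(METRICS_TEXT)
labels = '{function_name="get_image_embedding"}'
total = samples["request_processing_seconds_sum" + labels]
count = samples["request_processing_seconds_count" + labels]

mean_seconds = total / count
print(f"mean: {mean_seconds * 1e6:.2f} microseconds over {int(count)} calls")
```

With the values above this works out to roughly 2 microseconds per call, which is why both observations fall into even the smallest bucket (le="0.005").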
Also, please feel free to ignore the picklable warning, since we reconstruct the objects that are not picklable. It is just a warning in case something goes wrong during reconstruction.
🐛 Bug
I get the following warning when using Prometheus inside of ls.Logger:
Then the metrics that I am "observing" using self.log on the ls.LitAPI are not being tracked under the /metrics endpoint. These are the metrics I get:
This is my implementation for the Logger:
To Reproduce
I use the following running configuration:
where my monitoring.HTTPLatencyMiddleware is defined like this:
Am I running the ls.Server Logger and mounts correctly, or is there something wrong? I am following the docstrings from ls.Logger.
I use prometheus-client==0.21.0 and litserve==0.2.3.