SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

Adding more metrics to MLServer Prometheus endpoint #805

Open saeid93 opened 1 year ago

saeid93 commented 1 year ago

The metrics currently implemented in MLServer are all simple request counts:

(screenshot: mlserver-metrics — the metrics currently exposed on the Prometheus endpoint)

Compared with similar platforms like Triton Server, many other metrics could be added, some of which are important for performance monitoring. For example, model inference latency, batch queuing time, and GPU usage are metrics that are specific to the model server and impossible to collect with external scrapers like cAdvisor. The hack I'm currently using to extract such metrics inside my containers is to measure them and report them as part of the request response logs, which is not ideal.
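For illustration, here is a minimal sketch of the kind of latency metric I have in mind, using the standard `prometheus_client` library. The metric name, labels, and bucket boundaries are hypothetical, not anything MLServer currently exposes; `timed_predict` is just a stand-in wrapper around a model's predict call.

```python
import time

from prometheus_client import Histogram, generate_latest

# Hypothetical metric: per-model inference latency as a Prometheus histogram.
# Bucket boundaries are illustrative, not MLServer defaults.
MODEL_INFER_LATENCY = Histogram(
    "model_infer_latency_seconds",
    "Wall-clock latency of a single inference call",
    ["model_name"],
    buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
)


def timed_predict(model_name, predict_fn, payload):
    """Run predict_fn(payload) and record its latency under model_name."""
    start = time.perf_counter()
    try:
        return predict_fn(payload)
    finally:
        MODEL_INFER_LATENCY.labels(model_name=model_name).observe(
            time.perf_counter() - start
        )


# Once recorded, the histogram (count, sum, and per-bucket counters) shows up
# in the default registry and would be scraped from the /metrics endpoint.
result = timed_predict("my-model", lambda x: x * 2, 21)
```

With a histogram like this, Prometheus can compute quantiles (e.g. p99 latency per model) via `histogram_quantile`, which is exactly what's hard to reconstruct from count-only metrics or from log scraping.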

yaliqin commented 9 months ago

vote for this, especially latency histogram metrics