As a backends operator
I want to have Prometheus metrics for observability of the vLLM backend
So that I can monitor the performance, health, and usage of the vLLM backend
Acceptance Criteria:
[ ] Identify the key metrics to be collected for observability of the vLLM backend
[ ] Implement Prometheus metric endpoints in the vLLM backend to expose the following metrics:
  [ ] Generation tokens per second
  [ ] Total number of generated tokens
  [ ] Latency of generation requests
  [ ] Number of active generation requests
  [ ] CPU and memory utilization of the vLLM backend
  [ ] Error rates and counts
[ ] Ensure the metrics follow Prometheus naming and exposition-format conventions (for example, base units such as seconds and bytes, and a _total suffix on counters)
[ ] Configure the Prometheus server to scrape the vLLM backend's metric endpoints at a specified interval
[ ] Set up appropriate Prometheus recording and alerting rules based on the collected metrics
[ ] Create Grafana dashboards to visualize the vLLM backend metrics in real-time
[ ] Implement secure authentication and authorization mechanisms for accessing the Prometheus metrics endpoints
[ ] Minimize the overhead that collecting and exposing metrics places on the vLLM backend
[ ] Document the available metrics, their descriptions, and how to interpret them
[ ] Provide instructions for setting up and configuring Prometheus and Grafana for vLLM backend monitoring
[ ] Conduct load testing to ensure metrics collection and exposition can handle high-traffic scenarios
[ ] Integrate the Prometheus metrics with existing monitoring and alerting systems used by the organization
[ ] Regularly review and update the metrics based on operational insights and feedback from the team
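
To make the metric list above concrete, here is a minimal, dependency-free Python sketch of how those values might be tracked and rendered in the Prometheus text exposition format. It is an illustration under stated assumptions, not the vLLM implementation: the class name BackendMetrics, the metric names (backend_generation_tokens_total and so on), and the bucket boundaries are all hypothetical, and a real backend would more likely use an official Prometheus client library than hand-rolled rendering. CPU and memory utilization are left out of the sketch.

```python
import threading


class BackendMetrics:
    """Hypothetical in-process metrics holder for a single backend."""

    # Upper bounds (in seconds) of the latency histogram; illustrative values.
    LATENCY_BUCKETS = (0.1, 0.5, 1.0, 5.0, 30.0)

    def __init__(self):
        self._lock = threading.Lock()
        self.generated_tokens = 0   # counter: only ever increases
        self.errors = 0             # counter: only ever increases
        self.active_requests = 0    # gauge: goes up and down
        self.latency_sum = 0.0
        self.latency_count = 0
        self.bucket_counts = [0] * len(self.LATENCY_BUCKETS)

    def request_started(self):
        with self._lock:
            self.active_requests += 1

    def request_finished(self, tokens, seconds, error=False):
        with self._lock:
            self.active_requests -= 1
            self.generated_tokens += tokens
            self.errors += int(error)
            self.latency_sum += seconds
            self.latency_count += 1
            # Histogram buckets are cumulative: an observation counts toward
            # every bucket whose upper bound it does not exceed.
            for i, bound in enumerate(self.LATENCY_BUCKETS):
                if seconds <= bound:
                    self.bucket_counts[i] += 1

    def render(self):
        """Return all metrics in the Prometheus text exposition format."""
        name = "backend_request_latency_seconds"
        with self._lock:
            lines = [
                "# TYPE backend_generation_tokens_total counter",
                f"backend_generation_tokens_total {self.generated_tokens}",
                "# TYPE backend_request_errors_total counter",
                f"backend_request_errors_total {self.errors}",
                "# TYPE backend_active_requests gauge",
                f"backend_active_requests {self.active_requests}",
                f"# TYPE {name} histogram",
            ]
            for bound, count in zip(self.LATENCY_BUCKETS, self.bucket_counts):
                lines.append(f'{name}_bucket{{le="{bound}"}} {count}')
            # The +Inf bucket always equals the total observation count.
            lines.append(f'{name}_bucket{{le="+Inf"}} {self.latency_count}')
            lines.append(f"{name}_sum {self.latency_sum}")
            lines.append(f"{name}_count {self.latency_count}")
        return "\n".join(lines) + "\n"


# Example: one request that produced 42 tokens in 0.3 seconds.
metrics = BackendMetrics()
metrics.request_started()
metrics.request_finished(tokens=42, seconds=0.3)
print(metrics.render())
```

In this sketch, tokens per second and error rate are deliberately not exported as separate metrics. Both can be derived on the Prometheus side from the counters, for example with rate(backend_generation_tokens_total[1m]), which keeps the backend-side bookkeeping to simple increments. The rendered text would be served from the metrics endpoint with the content type text/plain; version=0.0.4.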
Extend this issue with the following additions on top of vLLM. Each addition must meet all of the acceptance criteria listed for vLLM in the original issue description:
User Story: Implement Backend Prometheus Metrics
As a backends operator
I want to have Prometheus metrics for observability of the vLLM backend
So that I can monitor the performance, health, and usage of the vLLM backend
Acceptance Criteria: