feat(backends): implement observability hooks for backends

User Story: Implement Backend Prometheus Metrics

As a backend operator, I want Prometheus metrics for the vLLM backend, so that I can monitor its performance, health, and usage.

Acceptance Criteria:

- [ ] Identify the key metrics to be collected for observability of the vLLM backend
- [ ] Implement Prometheus metric endpoints in the vLLM backend to expose the following metrics:
  - [ ] Generation tokens per second
  - [ ] Total number of generated tokens
  - [ ] Latency of generation requests
  - [ ] Number of active generation requests
  - [ ] CPU and memory utilization of the vLLM backend
  - [ ] Error rates and counts
- [ ] Ensure the Prometheus metrics are properly formatted and follow the Prometheus conventions
- [ ] Configure the Prometheus server to scrape the vLLM backend's metric endpoints at a specified interval
- [ ] Set up appropriate Prometheus recording and alerting rules based on the collected metrics
- [ ] Create Grafana dashboards to visualize the vLLM backend metrics in real-time
- [ ] Implement secure authentication and authorization mechanisms for accessing the Prometheus metrics endpoints
- [ ] Optimize the performance impact of collecting and exposing metrics to minimize overhead on the vLLM backend
- [ ] Document the available metrics, their descriptions, and how to interpret them
- [ ] Provide instructions for setting up and configuring Prometheus and Grafana for vLLM backend monitoring
- [ ] Conduct load testing to ensure the metrics collection and exposition can handle high traffic scenarios
- [ ] Integrate the Prometheus metrics with existing monitoring and alerting systems used by the organization
- [ ] Regularly review and update the metrics based on operational insights and feedback from the team
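
To make the metric list above more concrete, here is a minimal, standard-library-only sketch of how a few of these metrics could be tracked and rendered in the Prometheus text exposition format. It is an illustration, not the project's implementation: the `BackendMetrics` class, its methods, and the `vllm_backend_*` metric names are all hypothetical, and a real implementation would more likely use an existing Prometheus client library. The sketch follows two Prometheus conventions: counter names end in `_total`, and durations are reported in seconds.

```python
import threading
import time
from contextlib import contextmanager


class BackendMetrics:
    """Tiny in-memory metric store (hypothetical stand-in for a client library)."""

    PREFIX = "vllm_backend"  # assumed metric prefix, not taken from the issue

    def __init__(self):
        self._lock = threading.Lock()
        self.generated_tokens_total = 0   # counter: only ever increases
        self.request_errors_total = 0     # counter: failed generation requests
        self.active_requests = 0          # gauge: goes up and down
        self.request_seconds_sum = 0.0    # summary: total time spent in requests
        self.request_seconds_count = 0    # summary: number of finished requests

    @contextmanager
    def track_request(self):
        """Wrap one generation request: tracks in-flight count, latency, errors."""
        with self._lock:
            self.active_requests += 1
        start = time.perf_counter()
        try:
            yield
        except Exception:
            with self._lock:
                self.request_errors_total += 1
            raise
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self.active_requests -= 1
                self.request_seconds_sum += elapsed
                self.request_seconds_count += 1

    def add_tokens(self, n):
        """Record n newly generated tokens."""
        with self._lock:
            self.generated_tokens_total += n

    def render(self):
        """Return all metrics in the Prometheus text exposition format."""
        p = self.PREFIX
        with self._lock:
            lines = [
                f"# HELP {p}_generated_tokens_total Total number of generated tokens.",
                f"# TYPE {p}_generated_tokens_total counter",
                f"{p}_generated_tokens_total {self.generated_tokens_total}",
                f"# HELP {p}_request_errors_total Total number of failed generation requests.",
                f"# TYPE {p}_request_errors_total counter",
                f"{p}_request_errors_total {self.request_errors_total}",
                f"# HELP {p}_active_requests Generation requests currently in flight.",
                f"# TYPE {p}_active_requests gauge",
                f"{p}_active_requests {self.active_requests}",
                f"# HELP {p}_request_duration_seconds Latency of generation requests.",
                f"# TYPE {p}_request_duration_seconds summary",
                f"{p}_request_duration_seconds_sum {self.request_seconds_sum}",
                f"{p}_request_duration_seconds_count {self.request_seconds_count}",
            ]
        return "\n".join(lines) + "\n"
```

In this sketch a request handler would wrap each generation call in `with metrics.track_request():`, call `metrics.add_tokens(n)` as tokens are produced, and serve `metrics.render()` from the scrape endpoint. Tokens per second is deliberately not stored: with a counter exposed, it can be derived at query time with a PromQL expression such as `rate(vllm_backend_generated_tokens_total[1m])`, and the error rate can be derived from the error counter in the same way.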

defenseunicorns / leapfrogai

feat(backends): implement observability hooks for backends #297