PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Bug]: Metrics incorrect when having zero throughput #782

Open mrseeker opened 1 month ago

mrseeker commented 1 month ago

Your current environment

Docker container v0.6.2 (36d2ba5ad90b)

🐛 Describe the bug

The /metrics endpoint does not report correct values when no requests are going through the system. Our Prometheus instance scrapes this endpoint every 20 seconds, and when there is no load on the server, first_token_output stays at its last known value. I expected it to reset after each scrape; instead, it shows an average over the pod's lifespan. Other metrics (such as request latency) have the same problem.

Expected behavior: when the /metrics endpoint is scraped, the count restarts, so the reported values cover only the interval since the previous scrape instead of averaging over the whole time the metrics have been running. This would make it much easier to spot when servers are overloaded and when the HPA needs to scale up.
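
To illustrate the difference, here is a minimal sketch of how a per-scrape average could be derived on the consumer side from two consecutive scrapes of a cumulative sum/count pair. It is not taken from the aphrodite-engine code; the function name and the sample numbers are made up for illustration, and it assumes the metric is exposed as a cumulative sum and count, as Prometheus histograms and summaries usually are.

```python
from typing import Optional


def windowed_average(prev_sum: float, prev_count: int,
                     cur_sum: float, cur_count: int) -> Optional[float]:
    """Average of the observations recorded between two scrapes.

    Hypothetical helper: takes the cumulative (sum, count) pair from the
    previous scrape and from the current scrape.
    """
    delta_count = cur_count - prev_count
    if delta_count <= 0:
        # No new observations (idle server) or the counters were reset:
        # there is no meaningful average for this window.
        return None
    return (cur_sum - prev_sum) / delta_count


# Made-up scrapes of a cumulative latency metric: (sum in seconds, count).
scrape_1 = (12.0, 40)
scrape_2 = (12.0, 40)  # idle interval, nothing changed
scrape_3 = (27.0, 50)  # 10 new requests taking 15 s in total

# Lifetime average: dominated by old traffic, never "resets".
lifetime_avg = scrape_3[0] / scrape_3[1]

# Windowed averages: only what happened since the previous scrape.
idle_window = windowed_average(*scrape_1, *scrape_2)
busy_window = windowed_average(*scrape_2, *scrape_3)

print(lifetime_avg, idle_window, busy_window)
```

With these numbers the lifetime average is 0.54 s, while the window covering the last 10 requests gives 1.5 s and the idle window gives no value at all. On the Prometheus side, the same idea is usually expressed by dividing the rate() of the sum series by the rate() of the count series over a short range.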