aws / amazon-managed-service-for-prometheus-roadmap

Amazon Managed Service for Prometheus Public Roadmap

Additional metrics and logging for query performance visibility #21

Open robert-becker-hs opened 1 year ago

robert-becker-hs commented 1 year ago

The current metrics and logs provide limited visibility into how queries perform in AMP. Users cannot see query duration, queries per second, failed queries, or how many QSPs (query samples processed) a query consumed. The last of these would be especially useful for finding potential cost optimizations, since QSPs are used to calculate the cost of querying AMP.

Additional logs could help debug failing queries and identify which queries need optimization. Especially for companies where many teams write their own queries (e.g., in Grafana), this sort of logging output would be a great help in understanding how the system is used. A configurable "query time threshold" could also be advantageous: a query would be logged only if its execution time exceeds the threshold.
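
To make the "query time threshold" idea concrete, here is a rough sketch of the filtering behavior being requested. It is not an existing AMP API or log format: `QueryRecord`, its field names, `log_slow_queries`, and the choice to always log failed queries are all assumptions made for illustration.

```python
import json
import logging
from dataclasses import dataclass

logger = logging.getLogger("slow_query_log")


@dataclass
class QueryRecord:
    """Hypothetical per-query record; the field names are assumptions."""
    query: str
    duration_seconds: float
    samples_processed: int  # QSPs consumed by this query
    failed: bool = False


def should_log(record: QueryRecord, threshold_seconds: float) -> bool:
    # Log when execution time exceeds the configured threshold.
    # Assumption: failed queries are logged regardless of duration.
    return record.failed or record.duration_seconds > threshold_seconds


def log_slow_queries(records, threshold_seconds: float):
    """Emit one JSON log line per qualifying query and return the lines."""
    lines = []
    for record in records:
        if should_log(record, threshold_seconds):
            line = json.dumps(
                {
                    "query": record.query,
                    "duration_seconds": record.duration_seconds,
                    "samples_processed": record.samples_processed,
                    "failed": record.failed,
                },
                sort_keys=True,
            )
            logger.warning(line)
            lines.append(line)
    return lines


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    sample = [
        QueryRecord("up", 0.2, 100),
        QueryRecord("sum(rate(http_requests_total[5m]))", 7.5, 1_000_000),
        QueryRecord("bad_query(", 0.1, 0, failed=True),
    ]
    log_slow_queries(sample, threshold_seconds=5.0)
```

With a 5 second threshold, the fast `up` query is skipped, while the slow query and the failed query are both logged along with their sample counts, which is the information needed to spot expensive queries.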