GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0
191 stars 89 forks

feat: add samples sent error counter #1126

Closed by pintohutch 3 weeks ago

pintohutch commented 3 weeks ago

Previously, we only logged an error when a send call to GCM failed, which can be hard to debug, particularly when writing to more than one project.

This change introduces a counter, gcm_export_samples_sent_errors_total, that is incremented every time a send call to GCM returns an error. We add project_id as a label to identify the destination project that is experiencing the issue.

On a given collector or rule-evaluator, the cardinality of project_id values should be relatively small, O(100) in the more extreme cases.

pintohutch commented 3 weeks ago

cc @lyanco - this may be a nice self-observability case for customers having issues.