GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0

feat: add samples sent error counter #1126

pintohutch closed 3 months ago

pintohutch commented 3 months ago

Previously, we only logged an error when a send call to GCM failed, which can be hard to debug, particularly when we are writing to more than one project.

This change introduces a counter, gcm_export_samples_sent_errors_total, that is incremented every time a send call to GCM returns an error. The counter carries a project_id label so that the destination project experiencing the issue can be identified.

On a given collector or rule-evaluator, the cardinality of project_id values should be relatively small, O(100) in the more extreme cases.

pintohutch commented 3 months ago

cc @lyanco - this may be a nice self-observability case for customers having issues.