cloudposse / prometheus-to-cloudwatch

Utility for scraping Prometheus metrics from a Prometheus client endpoint and publishing them to CloudWatch
https://cloudposse.com/accelerate
Apache License 2.0

Long name metrics cannot be submitted to Cloudwatch #14

Open hlascelles opened 5 years ago

hlascelles commented 5 years ago

Thank you for this chart. We have it working for the most part, but many metrics are missing, and the gaps line up with the errors we are seeing in the logs from the exporter pod:

2019/06/21 11:45:10 prometheus-to-cloudwatch: error publishing to CloudWatch: InvalidParameterValue: The value for parameter MetricData.member.10.Dimensions.member.3.Value contains non-ASCII characters.
The parameter MetricData.member.10.Dimensions.member.3.Value must be shorter than 257 characters.
    status code: 400, request id: 06754578-941a-11e9-a7de-0dbbdca37bd2
2019/06/21 11:45:11 prometheus-to-cloudwatch: error publishing to CloudWatch: InvalidParameterValue: The value for parameter MetricData.member.1.Dimensions.member.4.Value contains non-ASCII characters.
The parameter MetricData.member.1.Dimensions.member.4.Value must be shorter than 257 characters.
    status code: 400, request id: 0677688e-941a-11e9-8b66-85751c18c2cd
2019/06/21 11:45:15 prometheus-to-cloudwatch: published 135 metrics to CloudWatch

To be clear, we are seeing most metrics, but are just missing the ones that have long names. I suspect that when the name is chopped, that is when the non-ASCII problems happen. We are not using non-ASCII chars so I think that problem is a red herring.

~We have retrieved the metrics from the server:~

~Side note: the one that is working (foofoob) is not scheduled, i.e. not running, due to node exhaustion during scale-up. You can see this because the attribute "node" is just "", so it is not running anywhere. Funnily enough, this means the metric line is short enough to be submitted to CloudWatch. Perversely, we are getting metrics for pods that aren't yet running anywhere, but as soon as they land on a node, their metric line gets too long and they stop reporting.~

EDIT: Apologies, the missing metrics in the original report are actually present; it was a CloudWatch graphing error!

The problem of (other) statistics failing to be sent is still occurring, however (see the error log lines above). We are now not even sure which ones are being missed. What is the best solution here? Can the metric lines be made shorter somehow, or can we log which ones are not being sent?

hlascelles commented 5 years ago

This is using both the chart as-is at 74739a45455341900c0de095303e4daab451b1fd (so version 0.2.0), and also with the image updated to 0.6.0.

edfungus commented 5 years ago

Any updates on this issue? This issue makes it hard to use this project.

osterman commented 5 years ago

@aknysh is currently on vacation. If @hlascelles has a fix and the time to contribute the PR, we'll promptly review as soon as Andriy is back.

hlascelles commented 5 years ago

I'm afraid I'm new to Go, so couldn't produce anything quickly.