[metricbeat/openmetrics, metricbeat/prometheus] histgram values of zero are filtered out on non-amd64 platforms

jonathan-albrecht-ibm commented 1 year ago

Version: 8.5.2
Operating System: Linux
Discuss Forum URL: https://discuss.elastic.co/t/kubernetes-unit-tests-failing-on-arm64-and-other-non-amd64-platforms/320909 (no response from community after a month)
Steps to Reproduce: Run the metricbeats tests for modules based on the openmetrics and prometheus parsers, eg. kubernetes, on arm64 or s390x linux. Tests such as the following fail:
github.com/elastic/beats/v7/metricbeat/module/etcd/metrics
github.com/elastic/beats/v7/metricbeat/module/kubernetes/proxy
github.com/elastic/beats/v7/metricbeat/module/kubernetes/scheduler
github.com/elastic/beats/v7/metricbeat/mb/testing/data
github.com/elastic/beats/v7/metricbeat/module/kubernetes/apiserver

because the histograms are missing buckets that have the expected value 0.

For openmetrics and prometheus histograms, some values like Bucket.CumulativeCount are stored as int64 after being parsed as float64 from the openmetrics text parser or prometheus expfmt parser. Later, these values are filtered out if they are NaN or Inf with code like:

if bucket.GetCumulativeCount() != uint64(math.NaN()) && bucket.GetCumulativeCount() != uint64(math.Inf(0)) { ...save value...}

but casting math.NaN() and math.Inf(0) to int64 are platform dependent in go. I don't think these casts really make sense since NaN and Inf are floating point special values and integers don't have the same concepts. On amd64:

uint64(math.NaN()) == uint64(math.Inf(0)) == uint64(math.Inf(-1)) == 9223372036854775808

but on arm64 and s390x:

uint64(math.NaN()) == uint64(math.Inf(-1)) == 0,
uint64(math.Inf(0)) == 18446744073709551615

so the int64 check to filter out NaN and Inf is true whenever bucket.GetCumulativeCount() == 0 on non-amd64 platforms. Even on amd64 it will filter out bucket.GetCumulativeCount() == 9223372036854775808 but that's not a common value.

For openmetrics, since the structs in metricbeat/helper/openmetrics/openmetrics.go are really just data transfer objects, if all the fields were kept as float64 (since the text parser parses numbers to float64) then the checks for NaN and Inf could be done correctly later on.

For prometheus, the expfmt parser casts these values to int64 internally so NaN and -Inf values are converted to zero on non-amd64 platforms. I think the only fix would be to not do the check for NaN on int64 values. Maybe this could be done for non-amd64 platforms if you want to keep the current amd64 behaviour.

I hope the above is helpful. Please let me know if I can provide any more info.

rposts commented 1 year ago

Hi there - this still exists in 8.6.2 release - any updates on this issue?

botelastic[bot] commented 3 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

jonathan-albrecht-ibm commented 3 months ago

👍 yes, still relevant

elastic / beats

[metricbeat/openmetrics, metricbeat/prometheus] histgram values of zero are filtered out on non-amd64 platforms #34235