The second error ([values] values must be in increasing order, got [-4.9E-324] but previous value was [0.0]) is related to this issue - https://github.com/elastic/beats/issues/36317 - and is going to be fixed soon.
The 'kubernetes-apiservers' and 'kubernetes-cadvisor' scrape jobs are the two targets that generate the histograms in my setup.
I was able to reproduce the issue on my setup as well for multiple apiserver_flowcontrol_* histograms; it is actually just 3 metrics: apiserver_flowcontrol_priority_level_request_utilization, apiserver_flowcontrol_demand_seats, apiserver_flowcontrol_read_vs_write_current_requests.
After some time, I see the histogram metric prometheus.apiserver_flowcontrol_priority_level_request_utilization.histogram, but it is empty - {"values":[],"counts":[]} - and I am not sure if that is a correct value.
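For context, this is roughly the shape I would expect that field to have (a minimal sketch; the ESHistogram struct and the sample numbers are mine, not taken from Beats or Elasticsearch). The Elasticsearch histogram field type expects values in increasing order with a counts array of the same length, so {"values":[],"counts":[]} looks like a syntactically valid histogram that just carries no observations:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ESHistogram mirrors the shape of the Elasticsearch "histogram" field type:
// "values" must be in increasing order and "counts" must have the same length.
// Empty arrays are therefore well-formed, they just carry no observations.
type ESHistogram struct {
	Values []float64 `json:"values"`
	Counts []uint64  `json:"counts"`
}

func main() {
	empty := ESHistogram{Values: []float64{}, Counts: []uint64{}}
	populated := ESHistogram{
		// Illustrative numbers only: bucket representatives (increasing)
		// and the number of observations that fell into each bucket.
		Values: []float64{1e-06, 1e-05, 1e-04},
		Counts: []uint64{0, 24575, 1179},
	}
	for _, h := range []ESHistogram{empty, populated} {
		b, _ := json.Marshal(h)
		fmt.Println(string(b)) // first line prints {"values":[],"counts":[]}
	}
}
```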
I opened an Elasticsearch issue - https://github.com/elastic/elasticsearch/issues/99820. One thing I can think of for now: add a check on the Beats side, so that the whole document with all the other metrics will not be dropped.
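As a rough illustration of what such a check could look like (a hypothetical helper of my own in Go, not the actual Beats code; validHistogram and the sample values are made up to mirror the error):

```go
package main

import (
	"fmt"
	"math"
)

// validHistogram reports whether an Elasticsearch histogram value would be
// accepted: strictly increasing values and a counts slice of matching length.
func validHistogram(values []float64, counts []uint64) bool {
	if len(values) != len(counts) {
		return false
	}
	for i := 1; i < len(values); i++ {
		if values[i] <= values[i-1] {
			return false
		}
	}
	return true
}

func main() {
	// The case from the error message: a value of about -4.9E-324 shows up
	// after a previous value of 0.0.
	values := []float64{0.0, -math.SmallestNonzeroFloat64, 1e-9}
	counts := []uint64{0, 0, 24575}

	if !validHistogram(values, counts) {
		// Instead of letting Elasticsearch reject the whole document,
		// skip (or fix) just this field and keep the other metrics.
		fmt.Println("skipping invalid histogram field, keeping the rest of the event")
	}
}
```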
Regarding the error:
"reason":"[1:2805] failed to parse field [prometheus.go_gc_pauses_seconds_total.histogram] of type [histogram]","caused_by":{"type":"document_parsing_exception","reason":"[1:2805] error parsing field [prometheus.go_gc_pauses_seconds_total.histogram], [values] values must be in increasing order, got [-4.9E-324] but previous value was [0.0]"}}, dropping event!
All similar errors seem to be coming from the kubernetes-nodes job.
The actual metric looks like:
curl -s localhost:10249/metrics | grep go_gc_pauses_seconds_total
# HELP go_gc_pauses_seconds_total Distribution individual GC-related stop-the-world pause latencies.
# TYPE go_gc_pauses_seconds_total histogram
go_gc_pauses_seconds_total_bucket{le="-5e-324"} 0
go_gc_pauses_seconds_total_bucket{le="9.999999999999999e-10"} 0
go_gc_pauses_seconds_total_bucket{le="9.999999999999999e-09"} 0
go_gc_pauses_seconds_total_bucket{le="9.999999999999998e-08"} 0
go_gc_pauses_seconds_total_bucket{le="1.0239999999999999e-06"} 0
go_gc_pauses_seconds_total_bucket{le="1.0239999999999999e-05"} 24575
go_gc_pauses_seconds_total_bucket{le="0.00010239999999999998"} 25754
go_gc_pauses_seconds_total_bucket{le="0.0010485759999999998"} 51322
go_gc_pauses_seconds_total_bucket{le="0.010485759999999998"} 51579
go_gc_pauses_seconds_total_bucket{le="0.10485759999999998"} 51628
go_gc_pauses_seconds_total_bucket{le="+Inf"} 51628
go_gc_pauses_seconds_total_sum NaN
go_gc_pauses_seconds_total_count 51628
The first bucket actually has a negative upper bound - le="-5e-324".
The same behavior occurs for some other metrics, e.g. go_sched_latencies_seconds.
This will be fixed by https://github.com/elastic/beats/pull/36647.
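Just to illustrate the shape of a possible fix (a simplified sketch of my own using the go_gc_pauses_seconds_total buckets above; I am not claiming this is exactly what the PR does): skipping buckets whose upper bound is negative keeps the resulting values list non-negative and increasing.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative Prometheus histogram bucket: the count of
// observations with a value <= le.
type bucket struct {
	le    float64
	count uint64
}

// toESHistogram de-cumulates the buckets and uses each finite upper bound as
// the representative value. Simplified sketch only: buckets with a negative
// upper bound (like le="-5e-324" above) and the +Inf bucket are skipped here.
func toESHistogram(buckets []bucket) (values []float64, counts []uint64) {
	sort.Slice(buckets, func(i, j int) bool { return buckets[i].le < buckets[j].le })
	var prev uint64
	for _, b := range buckets {
		inBucket := b.count - prev
		prev = b.count
		if b.le < 0 || math.IsInf(b.le, +1) {
			continue
		}
		values = append(values, b.le)
		counts = append(counts, inBucket)
	}
	return values, counts
}

func main() {
	buckets := []bucket{
		{le: -math.SmallestNonzeroFloat64, count: 0}, // the le="-5e-324" bucket
		{le: 1.0239999999999999e-05, count: 24575},
		{le: 0.00010239999999999998, count: 25754},
		{le: 0.10485759999999998, count: 51628},
		{le: math.Inf(+1), count: 51628},
	}
	values, counts := toESHistogram(buckets)
	fmt.Println(values) // the upper bounds, without the negative and +Inf ones
	fmt.Println(counts) // per-bucket counts: [24575 1179 25874]
}
```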
First error - Numeric value (5000945144) out of range of int (-2147483648 - 2147483647) - should be fixed by https://github.com/elastic/elasticsearch/issues/99820.
Second error - [values] values must be in increasing order, got [-4.9E-324] but previous value was [0.0] - should be fixed by https://github.com/elastic/beats/pull/36647.
Both fixes were merged and will be available in 8.11.0.
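For completeness, the first error is simply the count 5000945144 not fitting into a signed 32-bit int; a trivial check in Go, with the numbers taken from the error message:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// The rejected count vs. the int range quoted in the error
	// (-2147483648 - 2147483647, i.e. a signed 32-bit integer).
	count := int64(5000945144)
	fmt.Println(count > math.MaxInt32) // true - the value only fits in a 64-bit type
}
```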
Some documents are dropped due to:
This could be related to the fact that the datastream was actually dropped first, to empty the index.