akpw / mktxp

Prometheus Exporter for Mikrotik RouterOS devices

[PoE] Duplicate timeseries with same timestamp #159

Closed mwetterw closed 2 days ago

mwetterw commented 6 days ago

Since I upgraded mktxp to the latest stable version (stable-20240612143640), Prometheus logs a warning at almost every scrape:

ts=2024-06-24T15:02:29.645Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=mktxp target=http://mktxp:49090/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=3

A quick investigation reveals the problem:

root@test:~# wget -O- 'http://172.17.0.1:49090/metrics' 2>/dev/null | sort | uniq -c | sort -r | head -n 10
      2 mktxp_poe_out_voltage{routerboard_address="10.0.0.1",routerboard_name="Mikrotik"} 51.9
      2 # TYPE mktxp_poe_out_voltage gauge
      2 # TYPE mktxp_poe_out_power gauge
      2 # TYPE mktxp_poe_out_current gauge
      2 # HELP mktxp_poe_out_voltage POE Out Voltage
      2 # HELP mktxp_poe_out_power POE Out Power
      2 # HELP mktxp_poe_out_current POE Out Current
      1 python_info{implementation="CPython",major="3",minor="12",patchlevel="4",version="3.12.4"} 1.0
      1 python_gc_objects_uncollectable_total{generation="2"} 0.0
      1 python_gc_objects_uncollectable_total{generation="1"} 0.0

The HTTP metrics server should never serve more than one sample per time series in a single scrape. Here, the following time series appears twice, with the exact same set of labels:

mktxp_poe_out_voltage{routerboard_address="10.0.0.1",routerboard_name="Mikrotik"} 51.9
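The same check can be sketched in Python. This is a minimal sketch rather than anything from mktxp itself: `series_key` and `find_duplicate_series` are hypothetical helper names, and the parsing assumes the plain Prometheus text format, where each sample line is a metric name, an optional `{...}` label set, and a value. Unlike the `sort | uniq -c` pipeline, it groups by metric name and labels only, so it also catches duplicates whose values differ.

```python
from collections import Counter


def series_key(line: str) -> str:
    """Return the series identity (metric name plus label set) of a sample line."""
    if "{" in line:
        # Keep everything up to and including the closing brace of the label set.
        return line[: line.rfind("}") + 1]
    # No labels: the metric name is the first whitespace-separated token.
    return line.split()[0]


def find_duplicate_series(exposition: str) -> dict[str, int]:
    """Map each series that occurs more than once to its occurrence count."""
    counts = Counter(
        series_key(line)
        for line in exposition.splitlines()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if line.strip() and not line.startswith("#")
    )
    return {key: n for key, n in counts.items() if n > 1}
```

Feeding it the body returned by the `/metrics` endpoint should yield an empty dict for a healthy exporter.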

I'm also worried that mktxp doesn't return the same number of metrics on every request. This suggests there might be an issue in the metrics generation mechanism, which should normally expose a stable set of time series rather than series that appear and disappear between scrapes.
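One way to see which series come and go is to compare the series sets of two scrapes. The sketch below is illustrative only: `series_set` and `diff_series` are hypothetical helper names, and it assumes the two scrape bodies have already been fetched as strings in the plain Prometheus text format.

```python
def series_set(exposition: str) -> set[str]:
    """Collect the identity (metric name plus labels) of every sample line."""
    keys = set()
    for line in exposition.splitlines():
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        if "{" in line:
            # Keep everything up to and including the closing brace.
            keys.add(line[: line.rfind("}") + 1])
        else:
            keys.add(line.split()[0])
    return keys


def diff_series(first: str, second: str) -> tuple[set[str], set[str]]:
    """Return (series only in the first scrape, series only in the second)."""
    a, b = series_set(first), series_set(second)
    return a - b, b - a
```

Two empty sets mean both scrapes exposed the same series. A non-empty result shows which series changed but not why; the cause may lie in the data the router returns rather than in the exporter.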

mwetterw commented 6 days ago

PR #157 seems to address one of the two issues I mentioned above.

akpw commented 5 days ago

@mwetterw can you check with the PR merged?

Regarding the 'always the same number of metrics' concern: that is most likely a function of the actual data returned by the routers.

mwetterw commented 2 days ago

PR fixes the issue. Thanks!