NLnetLabs / rtrtr

An RPKI Data Proxy
https://nlnetlabs.nl/projects/routing/rtrtr/
BSD 3-Clause "New" or "Revised" License

Metric for the difference between endpoints that has persisted for t seconds #37

Status: open · ties opened this issue 2 years ago

ties commented 2 years ago

As a user, I want a metric for the difference between my various sources that has persisted for t seconds, so that I can monitor whether those sources (RTR, JSON, ...) converge.

Situation

Because VRPs are published continuously, the RPs will each have a slightly different view of which VRPs exist at any given moment. To monitor that they converge, you could alert on "the difference has been continuously non-zero for 30 minutes" (assuming it drops to zero at some point).
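A minimal sketch of that naive rule, assuming the instantaneous difference is sampled periodically as (timestamp, count) pairs. The `naive_alert` function and the sample format are hypothetical, not part of rtrtr or rtrmon. It shows why the rule only sees the size of the difference, never which VRPs make it up.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (unix_timestamp, instantaneous_difference),
# e.g. periodic readings of "number of VRPs in A that are not in B".
Sample = Tuple[float, int]


def naive_alert(samples: Sequence[Sample], now: float, window: float = 30 * 60) -> bool:
    """Fire if every sample in the last `window` seconds is non-zero.

    Hypothetical helper, not an rtrtr or rtrmon API. Only the size of the
    difference is considered, not which VRPs it consists of, so a steady
    stream of short-lived differences looks the same as one difference
    that never resolves.
    """
    start = now - window
    recent = [diff for ts, diff in samples if start <= ts <= now]
    # Only alert once the data actually covers the whole window.
    covered = any(ts <= start for ts, _ in samples)
    return covered and bool(recent) and all(diff > 0 for diff in recent)


# Every minute, a few freshly published VRPs have reached A but not yet B;
# each of them reaches B shortly after, yet the count never hits zero.
samples = [(t, 3) for t in range(0, 2401, 60)]
print(naive_alert(samples, now=2400))  # True: a false positive
```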

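In practice, alerting on the raw difference causes false positives when updates are frequent enough: new VRPs keep reaching one source before the other, so the difference is almost never zero even though each individual difference resolves quickly. Another approach is to check which objects are in A but not in B, and were first seen in A at least visibility_seconds ago. That way you can alert only on differences that have persisted for at least visibility_seconds.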
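A minimal sketch of that check, assuming each source is polled periodically and its VRPs are represented as hashable (prefix, max_length, asn) tuples. `FirstSeenTracker` and `aged_diff` are hypothetical names, not rtrtr or rtrmon APIs. One design choice is labeled in the code: a VRP that disappears from A is forgotten, so if it reappears its clock starts again.

```python
from typing import Dict, Hashable, Iterable, Set


class FirstSeenTracker:
    """Remember when each VRP was first observed in one source.

    Hypothetical helper, not an rtrtr or rtrmon API.
    """

    def __init__(self) -> None:
        self.first_seen: Dict[Hashable, float] = {}

    def update(self, vrps: Iterable[Hashable], now: float) -> None:
        current = set(vrps)
        # Assumption: forget withdrawn VRPs so a re-announcement restarts the clock.
        for vrp in list(self.first_seen):
            if vrp not in current:
                del self.first_seen[vrp]
        for vrp in current:
            self.first_seen.setdefault(vrp, now)


def aged_diff(a: FirstSeenTracker, b_vrps: Set[Hashable], now: float,
              visibility_seconds: float) -> int:
    """Count VRPs in A that are missing from B and were first seen in A
    at least `visibility_seconds` ago."""
    return sum(
        1
        for vrp, seen in a.first_seen.items()
        if vrp not in b_vrps and now - seen >= visibility_seconds
    )


a = FirstSeenTracker()
a.update([("192.0.2.0/24", 24, 64496)], now=0)
a.update([("192.0.2.0/24", 24, 64496), ("198.51.100.0/24", 24, 64497)], now=1000)
b = {("192.0.2.0/24", 24, 64496)}  # B has not picked up the new VRP yet
print(aged_diff(a, b, now=1000, visibility_seconds=0))    # 1
print(aged_diff(a, b, now=1000, visibility_seconds=600))  # 0
```

The freshly published VRP counts towards the instantaneous difference (threshold 0) but not towards the 600-second one, which is exactly the noise this approach filters out.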

This is similar to what I added to rtrmon, which exports a vrp_diff metric for objects that were first seen in the source at least visibility_seconds ago.

A real set of metrics may make this clearer:

# HELP rpki_vrps Total number of VRPS/amount of differents.
# TYPE rpki_vrps gauge
rpki_vrps{server="primary",type="diff",url="http://routinator-1:9556/json"} 1110
rpki_vrps{server="primary",type="total",url="http://routinator-1:9556/json"} 143981
rpki_vrps{server="secondary",type="diff",url="https://ca-software/api/monitoring/roa-prefixes"} 1
rpki_vrps{server="secondary",type="total",url="https://ca-software/api/monitoring/roa-prefixes"} 142872
# HELP rtr_serial Serial of the RTR session.
# TYPE rtr_serial gauge
rtr_serial{server="primary",url="http://routinator-1:9556/json"} 0
rtr_serial{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP rtr_session ID of the RTR session.
# TYPE rtr_session gauge
rtr_session{server="primary",url="http://routinator-1:9556/json"} 0
rtr_session{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP update Timestamp of last update.
# TYPE update gauge
update{server="primary",url="http://routinator-1:9556/json"} 1.637752522e+09
update{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 1.63775261e+09
# HELP vrp_diff Number of VRPS in [lhs_url] that are not in [rhs_url] that were first seen [visibility_seconds] ago in lhs.
# TYPE vrp_diff gauge
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="0"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1024"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1706"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="256"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="3411"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="56"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="596"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="851"} 1110
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="0"} 1
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1024"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1706"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="256"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="3411"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="56"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="596"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="851"} 0
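The vrp_diff lines above could be produced by a small renderer along these lines. This is a sketch, not rtrmon's actual code: it assumes first-seen timestamps are already tracked per VRP for the left-hand source, and `render_vrp_diff` and `escape_label` are hypothetical names. Calling it a second time with the two URLs swapped gives the other direction.

```python
from typing import Dict, Hashable, Iterable, List, Set

# HELP/TYPE lines copied from the rtrmon output quoted above.
HEADER = [
    "# HELP vrp_diff Number of VRPS in [lhs_url] that are not in [rhs_url] "
    "that were first seen [visibility_seconds] ago in lhs.",
    "# TYPE vrp_diff gauge",
]


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_vrp_diff(
    lhs_url: str,
    rhs_url: str,
    lhs_first_seen: Dict[Hashable, float],
    rhs_vrps: Set[Hashable],
    now: float,
    thresholds: Iterable[int],
) -> List[str]:
    """Emit one vrp_diff sample per visibility threshold (hypothetical helper)."""
    lines = []
    for threshold in sorted(thresholds):
        # VRPs in lhs, missing from rhs, first seen at least `threshold` seconds ago.
        count = sum(
            1
            for vrp, seen in lhs_first_seen.items()
            if vrp not in rhs_vrps and now - seen >= threshold
        )
        labels = (
            f'lhs_url="{escape_label(lhs_url)}",'
            f'rhs_url="{escape_label(rhs_url)}",'
            f'visibility_seconds="{threshold}"'
        )
        lines.append(f"vrp_diff{{{labels}}} {count}")
    return lines


first_seen = {("192.0.2.0/24", 24, 64496): 0.0, ("198.51.100.0/24", 24, 64497): 900.0}
rhs = {("192.0.2.0/24", 24, 64496)}
for line in HEADER + render_vrp_diff(
    "http://routinator-1:9556/json",
    "https://ca-software/api/monitoring/roa-prefixes",
    first_seen,
    rhs,
    now=1000.0,
    thresholds=[0, 56, 256],
):
    print(line)
```

Because a VRP only counts once it is at least as old as the threshold, the value can only stay the same or drop as visibility_seconds grows. That matches the secondary-to-primary lines in the output above: 1 at 0 seconds, 0 from 56 seconds on.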

In the graph, you can see that the instantaneous difference grows, but the long-term difference never does:

[Screenshot 2021-11-24 at 12:23:15]

partim commented 2 years ago

Am I understanding you correctly that you want to track each individual difference and how long it has been around, and count those that have been around for more than t seconds?
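A minimal sketch of that reading, assuming both sources are polled together and compared as sets of hashable VRP tuples; `DifferenceAges` is a hypothetical name, not an rtrtr API. Here the clock starts when a difference appears rather than when the VRP first appears in A. The two usually coincide, but differ when, for example, a VRP that both sources had is later withdrawn from B only.

```python
from typing import Dict, Hashable, Set


class DifferenceAges:
    """Track when each individual difference (a VRP in A but not in B) appeared.

    Hypothetical helper, not an rtrtr API.
    """

    def __init__(self) -> None:
        self.since: Dict[Hashable, float] = {}

    def observe(self, a: Set[Hashable], b: Set[Hashable], now: float) -> None:
        current = a - b
        # Resolved differences are dropped; if one recurs, its clock restarts.
        self.since = {vrp: self.since.get(vrp, now) for vrp in current}

    def count_older_than(self, t: float, now: float) -> int:
        """Count differences that have existed for more than t seconds."""
        return sum(1 for start in self.since.values() if now - start > t)


ages = DifferenceAges()
v1, v2 = ("192.0.2.0/24", 24, 64496), ("198.51.100.0/24", 24, 64497)
ages.observe({v1, v2}, {v1}, now=0)     # v2 is missing from B
ages.observe({v1, v2}, {v1}, now=1200)  # still missing
print(ages.count_older_than(1800, now=1200))  # 0
ages.observe({v1, v2}, {v1}, now=2400)
print(ages.count_older_than(1800, now=2400))  # 1
```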