As a user, I want a metric for the difference between various sources that has persisted for at least time t, so that I can monitor that my various sources (rtr, json, ...) converge.
Situation
Two different RPs
Refresh at different times
Because the publication of VRPs is continuous, the RPs will have a slightly different view of what VRPs exist. If you want to monitor that they converge, you could alert on: "the difference is continuously non-zero for 30 minutes" (and assume it drops to zero at some point in time).
In practice, this causes false positives if updates are frequent enough. Another way to go is to check which objects in A are not in B and were first seen in A at least visibility_seconds ago. That way you can have
This is similar to what I added to rtrmon, where there is a vrp_diff metric that counts objects first seen in the source at least visibility_seconds ago.
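To make the idea concrete, here is a rough sketch of a visibility-thresholded diff. It is not the rtrmon implementation; the class name `VisibilityDiff`, its methods, and the tuple representation of a VRP are all made up for illustration, and it assumes each source can be read as a set of hashable VRPs.

```python
import time


class VisibilityDiff:
    """Hypothetical helper: remembers when each VRP was first seen in lhs."""

    def __init__(self):
        # Maps VRP -> timestamp of its first sighting in the lhs source.
        self.first_seen = {}

    def update(self, lhs, now=None):
        """Record a fresh snapshot of lhs (an iterable of hashable VRPs)."""
        now = time.time() if now is None else now
        # Keep the old timestamp for known VRPs, stamp new ones with `now`,
        # and forget VRPs that are no longer present in lhs.
        self.first_seen = {v: self.first_seen.get(v, now) for v in lhs}

    def diff(self, rhs, visibility_seconds, now=None):
        """VRPs in lhs, not in rhs, first seen at least visibility_seconds ago."""
        now = time.time() if now is None else now
        return {
            v
            for v, seen in self.first_seen.items()
            if v not in rhs and now - seen >= visibility_seconds
        }


tracker = VisibilityDiff()
old = ("AS64496", "192.0.2.0/24", 24)      # documentation-range example values
new = ("AS64497", "198.51.100.0/24", 24)
tracker.update({old}, now=0)
tracker.update({old, new}, now=1000)
# With a threshold of 0 both VRPs count; with 256 only the older one does.
instant = tracker.diff(set(), 0, now=1000)
settled = tracker.diff(set(), 256, now=1000)
```

With visibility_seconds set to 0 this is the plain instantaneous difference, so a freshly published VRP shows up immediately; with a larger threshold it only counts once the other source has had that long to pick it up.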
Maybe a real set of metrics is clearer:
# HELP rpki_vrps Total number of VRPS/amount of differents.
# TYPE rpki_vrps gauge
rpki_vrps{server="primary",type="diff",url="http://routinator-1:9556/json"} 1110
rpki_vrps{server="primary",type="total",url="http://routinator-1:9556/json"} 143981
rpki_vrps{server="secondary",type="diff",url="https://ca-software/api/monitoring/roa-prefixes"} 1
rpki_vrps{server="secondary",type="total",url="https://ca-software/api/monitoring/roa-prefixes"} 142872
# HELP rtr_serial Serial of the RTR session.
# TYPE rtr_serial gauge
rtr_serial{server="primary",url="http://routinator-1:9556/json"} 0
rtr_serial{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP rtr_session ID of the RTR session.
# TYPE rtr_session gauge
rtr_session{server="primary",url="http://routinator-1:9556/json"} 0
rtr_session{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP update Timestamp of last update.
# TYPE update gauge
update{server="primary",url="http://routinator-1:9556/json"} 1.637752522e+09
update{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 1.63775261e+09
# HELP vrp_diff Number of VRPS in [lhs_url] that are not in [rhs_url] that were first seen [visibility_seconds] ago in lhs.
# TYPE vrp_diff gauge
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="0"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1024"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1706"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="256"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="3411"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="56"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="596"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="851"} 1110
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="0"} 1
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1024"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1706"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="256"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="3411"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="56"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="596"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="851"} 0
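As a rough illustration of how these series could be consumed, the sketch below picks out the vrp_diff samples whose visibility_seconds is at least a chosen threshold and whose value is non-zero. The function name `persistent_diffs` and the regex-based parsing are assumptions for this example only; it handles just the simple label syntax shown above (no escaped quotes) and is not a full Prometheus exposition parser.

```python
import re

# Matches lines such as: vrp_diff{a="x",b="y"} 12
LINE = re.compile(r'^vrp_diff\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

# A few samples copied from the metrics above.
SAMPLE = '''
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="0"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="3411"} 1110
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="0"} 1
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="3411"} 0
'''


def persistent_diffs(text, min_visibility):
    """Return (lhs_url, rhs_url, visibility_seconds, value) for every
    vrp_diff sample with visibility_seconds >= min_visibility and value > 0."""
    found = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        visibility = int(labels["visibility_seconds"])
        value = float(match.group("value"))
        if visibility >= min_visibility and value > 0:
            found.append((labels["lhs_url"], labels["rhs_url"], visibility, value))
    return found


# Differences that have been visible for at least 30 minutes.
long_lived = persistent_diffs(SAMPLE, 1800)
```

On this sample, only the routinator-to-ca-software direction is still non-zero at the 3411-second threshold; the single VRP in the other direction appears only at threshold 0.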
In the diagram you see that the instantaneous difference grows, but the long-term difference never grows:
Am I understanding you right that you want to track any single difference and how long it’s been around and count those that have been around for more than t seconds?