bgp / stayrtr

RPKI-To-Router server implementation in Go
BSD 3-Clause "New" or "Revised" License

Track objects that differ for a longer period separately #9

Closed ties closed 3 years ago

ties commented 3 years ago

Use case: track how long it takes for VRPs in one endpoint (which could be an internal source of truth or an RP) to appear in another.

For example:

# HELP vrp_diff Number of VRPS in [lhs_url] that are not in [rhs_url] that were first seen [visibility_seconds] ago in lhs.
# TYPE vrp_diff gauge
vrp_diff{lhs_url="https://routinator-demo.aws.nlnetlabs.nl/json",rhs_url="tcp://rtr.rpki.cloudflare.com:8282",visibility_seconds="0"} 2241
vrp_diff{lhs_url="https://routinator-demo.aws.nlnetlabs.nl/json",rhs_url="tcp://rtr.rpki.cloudflare.com:8282",visibility_seconds="256"} 2240
vrp_diff{lhs_url="https://routinator-demo.aws.nlnetlabs.nl/json",rhs_url="tcp://rtr.rpki.cloudflare.com:8282",visibility_seconds="56"} 2241
vrp_diff{lhs_url="tcp://rtr.rpki.cloudflare.com:8282",rhs_url="https://routinator-demo.aws.nlnetlabs.nl/json",visibility_seconds="0"} 5
vrp_diff{lhs_url="tcp://rtr.rpki.cloudflare.com:8282",rhs_url="https://routinator-demo.aws.nlnetlabs.nl/json",visibility_seconds="256"} 5
vrp_diff{lhs_url="tcp://rtr.rpki.cloudflare.com:8282",rhs_url="https://routinator-demo.aws.nlnetlabs.nl/json",visibility_seconds="56"} 5

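What you see here is that one of the VRPs present in the routinator endpoint but missing from the Cloudflare RTR endpoint is "recent": it is counted at the 0 and 56 second thresholds (2241) but not at 256 seconds (2240).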

When you monitor multiple instances (for example, one using rsync vs. one using RRDP, or stayrtr vs. its input), some time lag between instances is expected. This lag is not guaranteed to go to 0 when there are continuous updates. In my experience, alerting only on objects that have been visible for a while prevents spurious alerts.

This is available in a container for testing at https://hub.docker.com/repository/docker/tiesdekock/rtrmon