bgp / stayrtr

RPKI-To-Router server implementation in Go
BSD 3-Clause "New" or "Revised" License
85 stars 13 forks source link

Enable remote_ip dimension in rtr_clients #120

Open netixx opened 4 months ago

netixx commented 4 months ago

This makes the rtr_clients track the number of connections per remote IP, enabling us to see if a client flaps, and what client it is.

I am not 100% sure that this is the right implementation, and ready to discuss better ideas :)

benjojo commented 4 months ago

This is something that should be enabled behind a CLI flag, since there will be users who will not want the cardinality in their prometheus setups that this would cause. Overall I can see the use case for it though, just we should be mindful of large users of stayrtr and the impact that a change like this might have on their existing setups

netixx commented 4 months ago

For cardinality issues, unless it's a public server, it should more or less be fine, since the IPs of the routers will not change too often. There is also a possibility to do label_replace/drop on prometheus server. Nonetheless, maybe a flag is the way to go to make sure we avoid breaking people stuff :)

Also, I was thinking that a Gauge may not be the best approach here. I think two counters, one "connects" and one for "disconnects" would be better, since it would allow us to see flaps that occur between 2 prometheus polling. Those counters could be registered/Inc by a feature flag.

Could also be a single "flap" counters, that increments whenever a router connects or disconnects.

What are your thoughts on that ?

sumkincpp commented 4 months ago

Another +1 for that.

Tracking remote ips is really useful for operational needs: different client implementations behave differently, for example how many sessions they hold active.

I've ended up using customly written connections_exporter for all connections - when it is a private server, new connections, flaps and restarts are rare. Yet when they happen, it may of course flood prometheus database.

For cardinality issues one should consider dropping labels in prometheus as mentioned above.

Nevertheless something that is possible but not trivial to implement within prometheus - tracking of flapping/resets from remote clients.

netixx commented 4 months ago

for resets, there is already

PDUsRecv = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "rtr_pdus",
            Help: "PDU received.",
        },
        []string{"type"},
    )

which is counter, but doesn't have the remote_ip label to it. I guess I could add it there.