bgp / stayrtr

RPKI-To-Router server implementation in Go
BSD 3-Clause "New" or "Revised" License

Interval of update loop varies and can slow down on slow backend #53

Closed: ties closed this issue 1 year ago

ties commented 2 years ago

As a stayrtr operator, I want stayrtr to keep fetching updates on schedule even when the backend system is slow or unresponsive.

If I want updates every 10 minutes and an update takes 5 minutes, I want the next update to start 10 minutes after the previous one started, not 15 minutes after it started (that is, 10 minutes after the previous one finished).

Context

When running stayrtr over a slow connection (4G was not cooperating), I noticed that the update loop does not run at a fixed interval but with a fixed delay between runs. If the SLURM or JSON responses are slow, each iteration of the loop takes (much) longer.

Root cause

Handling slow responses is a hard problem. It ends up being a tradeoff between the liveness of the whole system and getting all the information.

For example, in my rpki-client wrapper I found that some repositories were so slow that they prevented me from updating on time. I decided to add a utility to time out and abort fetching from slow repos. In that case I decided that finishing an update was more important than having all the information.

Desired behaviour

first of all:

then:

ties commented 2 years ago

It could also be that starting the update interval after the previous update has finished is the desired behaviour. In that case this one can be closed (and I'll make a separate issue for the http metrics part).

randomthingsandstuff commented 2 years ago

I agree with your view on the matter and noticed this while working on the VRP expiry stuff.

That whole refresh/VRP expiry piece (in #15) needs to be broken out to accomplish this and test it properly. So I should be able to address this as part of that work.

When I push them, we should discuss the default timer values.

benjojo commented 1 year ago

I split two of the subpoints into their own tickets, since they are worth their own investigations for now.

But the update loop now happens consistently, even if the backend is slow.

And VRP+SLURM updates are done in parallel.