MetPX / sarracenia

https://MetPX.github.io/sarracenia
GNU General Public License v2.0

a sort of "standby mode & catchup mode" for subscribers to offload brokers during long outages. #1275

Open petersilva opened 2 hours ago

petersilva commented 2 hours ago

This is for prolonged outages on the client side (a subscriber's disk goes down, as an example), where a subscriber (or sender) knows that it won't be able to resume transfers any time soon.

The subscriber will try to download each file (3 times), and then put the message in the retry_queue. Those download attempts are slow, are wasted load, and might not keep up with the rate at which messages are being posted.

When you know you won't be able to complete transfers for some time, there should be a way to just put the messages directly in the retry_queue without trying the download.

When returned to service, we would need to always replay that retry_queue first, and let new messages queue up... or even add new messages to the retry_queue instead of processing them right away.... a "catch-up" mode.
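To make the idea concrete, here is a minimal sketch of how the two modes could treat incoming messages. Everything in it is hypothetical: `Subscriber`, `handle`, `replay`, and the in-memory `deque` are illustrative stand-ins, not the sarracenia API or its real retry_queue, and the mode is switched by hand.

```python
from collections import deque


class Subscriber:
    """Hypothetical subscriber used only for illustration (not sarracenia code)."""

    def __init__(self, download):
        self.download = download    # callable(msg) -> bool, True on success
        self.mode = "running"       # "running", "standby" or "catchup"
        self.retry_queue = deque()  # stand-in for the persistent retry_queue

    def handle(self, msg):
        """Process one newly received message according to the current mode."""
        if self.mode == "standby":
            # Outage is known: queue the message without attempting a download.
            self.retry_queue.append(msg)
        elif self.mode == "catchup":
            # New messages go behind the backlog, which is replayed oldest first.
            self.retry_queue.append(msg)
            self.replay(batch=2)
        elif not self.download(msg):
            # Normal operation: only failed downloads land in the retry queue.
            self.retry_queue.append(msg)

    def replay(self, batch):
        """Retry up to `batch` of the oldest queued messages; return what is left."""
        for _ in range(min(batch, len(self.retry_queue))):
            msg = self.retry_queue.popleft()
            if not self.download(msg):
                self.retry_queue.append(msg)
        return len(self.retry_queue)
```

In standby, `handle()` never calls `download`, so no effort is spent on transfers that are known to fail. In catch-up, the backlog drains faster than it grows as long as `batch` is larger than one.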

Clarification:

The above describes what happens if you leave the subscriber running when you know downloads will fail. An alternate strategy would be to stop the subscribers, and let the queue build up on the brokers. All tuning/performance advice for message brokers recommends keeping queues short. Large queues slow down the entire broker, compromising other flows. It is considered far better to persist the queues to local storage rather than have the backlog build up (into the millions) on the broker.
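As a rough illustration of persisting the backlog locally instead of on the broker, the sketch below appends messages to a JSON-lines file and drains them later. `DiskRetryQueue`, its file layout, and the message fields are assumptions made for this example; it is not how sarracenia's retry_queue is implemented, and it assumes the queue file lives on storage that is still writable during the outage.

```python
import json
import os


class DiskRetryQueue:
    """Hypothetical append-only queue stored as one JSON document per line."""

    def __init__(self, path):
        self.path = path

    def put(self, msg):
        """Append one message (a JSON-serializable dict) to the queue file."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(msg) + "\n")

    def drain(self):
        """Yield queued messages oldest first, then delete what was replayed."""
        if not os.path.exists(self.path):
            return
        # Move the file aside so that put() during the replay starts a new file.
        replay_path = self.path + ".replay"
        os.replace(self.path, replay_path)
        with open(replay_path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)
        os.remove(replay_path)
```

Reading line by line keeps memory use flat even when the backlog runs into millions of messages. The replay file is only removed once the generator has been fully consumed.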

petersilva commented 2 hours ago

One way of doing it... could perhaps have an automatic mechanism...

At some point, we decide we have caught up (e.g. the retry queue is empty) and return to running normally ("running").

So analyst intervention would be zero. They just start up the subscriber, it automatically goes into standby when it realizes it cannot write... and when it detects things are back, it catches up, and then goes back to normal.
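The automatic switching described here could be sketched as a small state function plus a write probe. Both `can_write` and `next_mode` are hypothetical helpers invented for this example, and probing by creating a temporary file is just one assumed way to detect that the destination is back.

```python
import tempfile


def can_write(directory):
    """Probe the destination by creating (and auto-deleting) a temporary file."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


def next_mode(mode, writable, retry_len):
    """Return the next mode from the probe result and the retry queue length."""
    if not writable:
        return "standby"   # cannot write: stop attempting downloads
    if mode == "standby":
        return "catchup"   # destination is back: replay the backlog first
    if mode == "catchup" and retry_len == 0:
        return "running"   # backlog is empty: we have caught up
    return mode
```

A subscriber loop would call `next_mode(mode, can_write(dest), len(retry_queue))` periodically, for example once per batch of messages, so no analyst has to flip the mode by hand.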

petersilva commented 2 hours ago

@junhu3 had a different idea? Now is the time to kick around different ones... the standby/catchup thing is what I was thinking... I was originally thinking it had to be manual, but maybe automatic works... other ideas?