SNEWS2 / SNEWS_Coincidence_System

Coincidence System backend for snews alert trigger
BSD 2-Clause "Simplified" License

Coordination between multiple CS instances #29

Open habig opened 2 years ago

habig commented 2 years ago

So, it would be nice to have SNEWS servers running redundantly in different places, to dodge network or power problems.

The way it is right now, this would work out of the box for the decision-making algorithms: each server can subscribe to the same hopskotch stream and would make the same decisions independently.

But we don't want multiple copies of the alerts going out and confusing people, be they emails, Slack alerts, or whatever. So, at any point in time, only one server should push alerts.

How best to do this? I could imagine something where the first server to reach a decision puts out its alert. Other servers, if they see an existing alert that matches the one they are about to push, would not send a duplicate.

While adding the logic to do this, we would need to be careful about making the operation an "atomic" one, to avoid race conditions. That will take some thought.
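A minimal sketch of the "first server to reach a decision pushes, the others stay quiet" idea, with the check and the claim done as one atomic step. Everything here is an assumption for illustration: `alert_key`, `AlertClaims`, `run_servers`, and the detector names are hypothetical, and the in-process `threading.Lock` only stands in for whatever shared, replicated store the real servers would use. Threads play the role of the redundant servers.

```python
import hashlib
import threading


def alert_key(detector_times):
    """Derive a deterministic key from the coincidence contents.

    Hypothetical scheme: servers that reach the same decision from the same
    messages compute the same key, regardless of message arrival order.
    """
    canonical = "|".join(f"{name}@{t}" for name, t in sorted(detector_times))
    return hashlib.sha256(canonical.encode()).hexdigest()


class AlertClaims:
    """In-process stand-in for a store shared by all servers (assumption)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}

    def try_claim(self, key, server_id):
        """Record the first claimant atomically; return True only for it."""
        with self._lock:  # the check and the set happen as one step
            if key in self._owners:
                return False
            self._owners[key] = server_id
            return True

    def owner(self, key):
        with self._lock:
            return self._owners.get(key)


def run_servers(n_servers, detector_times):
    """Simulate n servers racing to publish the same alert."""
    claims = AlertClaims()
    published = []
    published_lock = threading.Lock()
    start = threading.Barrier(n_servers)  # release all servers together

    def server(server_id):
        key = alert_key(detector_times)
        start.wait()
        if claims.try_claim(key, server_id):
            # Only the winning server would send the email or Slack alert.
            with published_lock:
                published.append(server_id)

    threads = [
        threading.Thread(target=server, args=(i,)) for i in range(n_servers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return published, claims


if __name__ == "__main__":
    coincidence = [("DetA", "2030-01-01T00:00:00"), ("DetB", "2030-01-01T00:00:03")]
    winners, _ = run_servers(5, coincidence)
    print("servers that pushed the alert:", winners)
```

Separating "check whether an alert exists" from "record that I am sending it" is where the race would creep in, which is why `try_claim` does both under one lock. Across real machines the same role would have to be played by something that gives a cluster-wide atomic claim, not a local lock.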

mlinvill commented 1 year ago

I have made some progress on this issue. I've been looking into implementations of the Raft consensus algorithm. There is a Python implementation called pysyncobj. As a starting point, I have a toy implementation of a distributed lock using pysyncobj running stand-alone. However, its reliability needs to be tested more thoroughly under real-world failure scenarios before it is considered for incorporation into the coincidence system.

KaraMelih commented 3 months ago

Just to keep track, I think this is being worked on here: https://github.com/SNEWS2/SNEWS_Coincidence_System/tree/mlinvill_feature_redundancy-in-alerting