ITRS-Group / monitor-merlin

Module for Effortless Redundancy and Loadbalancing In Naemon
https://itrs-group.github.io/monitor-merlin/
GNU General Public License v2.0

[Question] New peer unsynced #2

Closed oneingan closed 7 years ago

oneingan commented 8 years ago

Hi, I finally got an automated, load-balanced Naemon core working with Merlin. But now I have a problem (maybe it is working as expected).

If I add a comment (or downtime) with two peers running, the comment spreads to all the peers. But if I add a new third peer, it doesn't get the old information from the other peers.

Is this expected? Maybe some option is needed to get it in sync?

Thanks in advance.

pengi commented 8 years ago

It's a known issue, since the goal from the beginning was more or less just to distribute check execution and notifications. For most practical scenarios it should still be possible to work around it, and it's also possible to sync manually if something happens later.

We usually recommend using one peer as the "main" peer and doing all user interactions on it. However, that doesn't solve the problem for downtimes, or for custom notifications.

It should be possible to copy the state retention file from an existing peer to the new peer while setting it up, to manually sync the initial data. See the state_retention_file setting in the Naemon configuration.
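As a rough illustration of that manual sync, here is a minimal Python sketch. It assumes the retention file has already been fetched from an existing peer (for example with scp) and that Naemon is stopped on the new peer while the file is put in place, so the copy is not overwritten. The helper names `find_state_retention_file` and `seed_retention` are hypothetical and not part of Merlin or Naemon; only the `state_retention_file` setting comes from the discussion above.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_state_retention_file(cfg_text: str) -> Optional[str]:
    """Return the state_retention_file value from naemon.cfg-style text.

    Hypothetical helper: scans simple key=value lines and skips comments.
    """
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "state_retention_file":
            return value.strip()
    return None


def seed_retention(fetched_copy: Path, target: Path) -> Path:
    """Put a retention file fetched from an existing peer in place.

    Assumes Naemon is NOT running on this (new) peer while this runs.
    Any existing target file is kept next to it with a .bak suffix.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(fetched_copy, target)
    return target
```

On the new peer you would read the local configuration, look up the target path with `find_state_retention_file`, call `seed_retention` with the file copied over from the existing peer, and then start Naemon again.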

For shorter outages later, the existing nodes keep a log (binlog) of all messages that should have been sent to each peer while it was offline, and then play them back when the peer reconnects, so missed events are propagated anyway. For longer outages, when the binlog gets full, you can resync by copying the retention data manually.
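To make the binlog behaviour concrete, here is a toy Python model of a bounded per-peer backlog. It is not Merlin's implementation: the class name `PeerBacklog`, the `max_events` limit, and the choice to discard the backlog once it overflows are all assumptions made for illustration.

```python
from collections import deque


class PeerBacklog:
    """Toy model of a bounded backlog kept for one offline peer."""

    def __init__(self, max_events: int):
        self.max_events = max_events  # hypothetical size limit
        self.events = deque()
        self.overflowed = False

    def record(self, event) -> None:
        """Queue an event that could not be delivered to the offline peer."""
        if self.overflowed:
            return  # backlog already given up; a manual resync is needed
        if len(self.events) >= self.max_events:
            # Assumption of this sketch: a full backlog is discarded.
            self.events.clear()
            self.overflowed = True
            return
        self.events.append(event)

    def replay(self, send) -> bool:
        """Replay queued events in order once the peer reconnects.

        Returns True if the backlog was replayed, or False if it had
        overflowed and the peer needs a manual resync instead.
        """
        if self.overflowed:
            return False
        while self.events:
            send(self.events.popleft())
        return True
```

A short outage fits within the limit and `replay` delivers everything in order. A long outage trips the overflow flag, which corresponds to the case above where you copy the retention data manually.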