flapjack / flapjack

Monitoring notification routing + event processing system. For issues with the Flapjack packages, please see https://github.com/flapjack/omnibus-flapjack/
http://flapjack.io
MIT License

When multiple Flapjack instances are used, all bots respond in Jabber #908

Open CVTJNII opened 8 years ago

CVTJNII commented 8 years ago

In 1.6.0, if three Flapjacks are set up for HA, all with the Jabber gateway enabled, then every command is answered in triplicate:

flapjack tell me about datagen-1
so you'd like details on entity: datagen-1 hmm? ... OK!

---
datagen-1:Test Data
In scheduled maintenance: 2015-10-22 23:44:08 +0000 -> 2115-10-23 23:44:08 +0000 (36524 days, 21 hours, 45 minutes, 58 seconds remaining)
Not in unscheduled maintenance.
so you'd like details on entity: datagen-1 hmm? ... OK!

---
datagen-1:Test Data
In scheduled maintenance: 2015-10-22 23:44:08 +0000 -> 2115-10-23 23:44:08 +0000 (36524 days, 21 hours, 45 minutes, 58 seconds remaining)
Not in unscheduled maintenance.
so you'd like details on entity: datagen-1 hmm? ... OK!

---
datagen-1:Test Data
In scheduled maintenance: 2015-10-22 23:44:08 +0000 -> 2115-10-23 23:44:08 +0000 (36524 days, 21 hours, 45 minutes, 58 seconds remaining)
Not in unscheduled maintenance.

The Jabber gateway can be enabled on just one host instead, but that poses a bit of an HA problem, as that host then becomes a single point of failure. Some sort of HA solution would be nice.

Perhaps some sort of master election where only one node will respond to commands, but another will take over if a response isn't seen within a timeout?
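The timeout-based takeover suggested above could look roughly like the following minimal Python sketch. It assumes each node is given a unique, fixed rank and a shared failover timeout (both hypothetical settings, not Flapjack options) and that every node can see replies posted in the room. It models only the "who replies" decision, not the XMPP side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TIMEOUT_S = 5.0  # assumed failover timeout in seconds; not a Flapjack setting


@dataclass
class Node:
    name: str
    rank: int           # 0 = preferred responder, 1 = first standby, ...
    alive: bool = True

    def deadline(self) -> float:
        # The primary answers at once; a node of rank N waits N timeouts
        # before concluding that nobody ahead of it has replied.
        return self.rank * TIMEOUT_S


def who_answers(nodes: List[Node]) -> Optional[Tuple[str, float]]:
    """Return (name, seconds until reply) for the single node that replies."""
    # Visit nodes in the order their deadlines expire.
    for node in sorted(nodes, key=Node.deadline):
        if node.alive:
            # The first live node to reach its deadline replies; every node
            # behind it sees that reply in the room and stays silent.
            return node.name, node.deadline()
    return None  # every node is down
```

For example, with `nodes = [Node("flapjack-1", 0), Node("flapjack-2", 1), Node("flapjack-3", 2)]`, `who_answers(nodes)` returns `('flapjack-1', 0.0)`; after `nodes[0].alive = False` it returns `('flapjack-2', 5.0)`. The ranks must be unique, or two nodes with the same deadline could both reply; and a primary that is slow rather than dead can still produce a duplicate answer.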

ghost commented 8 years ago

I would think this would work if they were configured with different JIDs? You'd lose HA, but you would be able to address them individually. (I could be wrong; it's been a while since I've looked at this code.) A way around that might be an intermediate proxy bot that does what you suggest, though that may be a bit outside Flapjack's ambit, as it sounds like something that could be useful in a wider context.

CVTJNII commented 8 years ago

(Edit: bad wording) I don't think multiple JIDs are needed, as the identifiers are what matter. I set unique identifiers and was able to work around this, since each bot then only responded to its own unique ID.
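A minimal Python sketch of the unique-identifier workaround, assuming each bot is configured with its own name and only acts on messages that begin with that name. The helper `addressed_command` and the names `flapjack-a` / `flapjack-b` are hypothetical, not Flapjack's actual parsing code.

```python
import re
from typing import Optional


def addressed_command(message: str, my_identifier: str) -> Optional[str]:
    """Return the command text if `message` is addressed to this bot, else None."""
    # Accept "<identifier> <command>", optionally "<identifier>: <command>"
    # or "<identifier>, <command>", at the start of the message, ignoring case.
    pattern = rf"^\s*{re.escape(my_identifier)}[:,]?\s+(.+)$"
    match = re.match(pattern, message, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None
```

Here `addressed_command("flapjack-a: tell me about datagen-1", "flapjack-a")` yields `"tell me about datagen-1"`, while the bot configured as `flapjack-b` gets `None` and stays quiet. The bots still share one JID, so this removes the triplicate replies but leaves the shutdown problem untouched.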

However, I did find a larger bug. When trying to stop the Flapjacks:

Flapjack 3:

^C2015-10-23T02:41:17.614251+00:00 [INFO] :: flapjack-email :: stopping
2015-10-23T02:41:17.614562+00:00 [INFO] :: flapjack-coordinator :: web: stopping -> stopped
2015-10-23T02:41:17.614592+00:00 [INFO] :: flapjack-coordinator :: jsonapi: stopping -> stopped
2015-10-23T02:41:17.615807+00:00 [INFO] :: flapjack-processor :: Exiting main loop.
2015-10-23T02:41:17.864772+00:00 [INFO] :: flapjack-coordinator :: processor: stopping -> stopped

Flapjack 1:

2015-10-23T02:41:17.615506+00:00 [DEBUG] :: flapjack-jabber :: jabber notification event received: {"notification_type"=>"shutdown"}
2015-10-23T02:41:17.615607+00:00 [DEBUG] :: flapjack-jabber :: @should_quit:
2015-10-23T02:41:17.615656+00:00 [DEBUG] :: flapjack-jabber :: jabber is connected so commencing blpop on jabber_notifications
2015-10-23T02:42:03.384447+00:00 [DEBUG] :: flapjack-jabber :: calling keepalive on the jabber connection

With all three logged in on the same account, I'm not always able to shut them down. One appears to ack the shutdown meant for another, leaving instances hanging and no longer responding to Ctrl-C / SIGTERM.
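The hang can be modelled in a few lines of Python. This is a simplified sketch, not Flapjack's code: it assumes all three gateways block on the single `jabber_notifications` list seen in the log, and that a gateway only acts on a shutdown event when its own quit flag is set (an inference from the empty `@should_quit:` line). The `Gateway` class and `target_stops` helper are hypothetical.

```python
from collections import deque
from typing import Dict, List


class Gateway:
    """Toy stand-in for one flapjack-jabber gateway process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.should_quit = False  # set only when *this* process is told to stop
        self.stopped = False

    def handle(self, event: Dict[str, str]) -> None:
        # Assumed behaviour: a shutdown event stops the gateway only if its
        # own flag is set; otherwise it is dropped and the gateway goes back
        # to blocking on the list.
        if event.get("notification_type") == "shutdown" and self.should_quit:
            self.stopped = True


def target_stops(pop_order: List[str]) -> bool:
    """Stop flapjack-3 only; return whether its gateway actually stopped."""
    queue: deque = deque()  # stands in for the shared jabber_notifications list
    gateways = {n: Gateway(n) for n in ("flapjack-1", "flapjack-2", "flapjack-3")}
    gateways["flapjack-3"].should_quit = True
    queue.append({"notification_type": "shutdown"})
    # A pushed item goes to exactly one blocked consumer; pop_order says
    # whose blocking pop happens to be served first.
    for name in pop_order:
        if queue:
            gateways[name].handle(queue.popleft())
    return gateways["flapjack-3"].stopped
```

`target_stops(["flapjack-3", "flapjack-1"])` returns `True`, but `target_stops(["flapjack-1", "flapjack-3"])` returns `False`: flapjack-1 swallows the event, as in the log above, and flapjack-3's gateway keeps waiting.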

Also, while I'm sure this has to do with using the same JID, I'd strongly prefer not to use multiple JIDs, as I'd then have three bots in the chat rooms. I'd prefer to present a unified front to the users.

jessereynolds commented 8 years ago

The second issue you've raised would not be addressed by using multiple Jabber IDs, because Flapjack currently assumes you're running only a single Jabber gateway instance. I think we'd need to register a separate queue for each Jabber gateway instance...
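A rough Python sketch of the per-instance queue idea, using an in-memory stand-in for Redis so it runs on its own. The key scheme `jabber_notifications:<instance_id>` and the helpers `queue_for` and `send_shutdown` are hypothetical; Flapjack 1.6 uses one shared `jabber_notifications` list.

```python
from collections import defaultdict, deque
from typing import Any, DefaultDict, Deque, Optional


class FakeRedis:
    """In-memory stand-in for the two Redis list commands used here."""

    def __init__(self) -> None:
        self.lists: DefaultDict[str, Deque[Any]] = defaultdict(deque)

    def rpush(self, key: str, item: Any) -> None:
        self.lists[key].append(item)

    def lpop(self, key: str) -> Optional[Any]:
        items = self.lists[key]
        return items.popleft() if items else None


def queue_for(instance_id: str) -> str:
    # Hypothetical per-instance key naming.
    return f"jabber_notifications:{instance_id}"


def send_shutdown(redis: FakeRedis, instance_id: str) -> None:
    # Only the gateway that reads queue_for(instance_id) will see this event.
    redis.rpush(queue_for(instance_id), {"notification_type": "shutdown"})
```

With one key per gateway, `send_shutdown(r, "flapjack-3")` leaves `queue_for("flapjack-1")` empty, so another instance can no longer swallow the event. How ordinary alert notifications would then be spread across the gateways is a separate design question this sketch leaves open.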