flightstats / hub

fault tolerant, highly available service for data storage and distribution
http://www.flightstats.com
MIT License

Add a super-basic internal endpoint to more easily get list of webhooks that nodes are leading #1238

Closed: lkemmerer closed this 4 years ago

lkemmerer commented 4 years ago

One of the longstanding problems that we've had with webhooks is that rolling restarts sometimes cause multiple nodes to run/lead the same webhook. The affected webhooks then send duplicate items, which is annoying to debug right now.

It's not elegant, but this introduces a debug endpoint which proxies to a random node and returns the webhooks running there. Refreshing a webpage and doing a search seems easier than SSHing into some servers and grepping their logs. If this ends up being useful but still kind of a pain, we can do something more robust that returns webhook info for all nodes + ZK state in a single request.
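For readers skimming the thread, here is a rough sketch of the idea, not the code in this PR. It assumes each node can report the webhooks it is running; the class name, method names, and the `fetchRunning` callback (a stand-in for the HTTP call to a node's internal endpoint) are all hypothetical. The last method illustrates what the "more robust" all-nodes version might check: which webhooks have more than one leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the hub codebase.
class WebhookLeaderDebug {

    // Pick one node at random, mirroring "proxies to a random node".
    static String pickRandomNode(List<String> nodes, Random random) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        return nodes.get(random.nextInt(nodes.size()));
    }

    // Ask a randomly chosen node which webhooks it is running.
    // fetchRunning is an assumed stand-in for the call to that node.
    static Map<String, List<String>> runningOnRandomNode(
            List<String> nodes,
            Random random,
            Function<String, List<String>> fetchRunning) {
        String node = pickRandomNode(nodes, random);
        Map<String, List<String>> result = new TreeMap<>();
        result.put(node, fetchRunning.apply(node));
        return result;
    }

    // Given node -> webhooks it leads, return webhook -> leading nodes
    // for every webhook that has more than one leader.
    static Map<String, List<String>> findDuplicateLeaders(
            Map<String, List<String>> webhooksByNode) {
        Map<String, List<String>> leadersByWebhook = new TreeMap<>();
        // Iterate nodes in sorted order so the leader lists are stable.
        for (Map.Entry<String, List<String>> entry
                : new TreeMap<>(webhooksByNode).entrySet()) {
            for (String webhook : entry.getValue()) {
                List<String> leaders =
                        leadersByWebhook.computeIfAbsent(webhook, k -> new ArrayList<>());
                if (!leaders.contains(entry.getKey())) {
                    leaders.add(entry.getKey());
                }
            }
        }
        // Keep only the webhooks that are led by two or more nodes.
        leadersByWebhook.values().removeIf(leaders -> leaders.size() < 2);
        return leadersByWebhook;
    }
}
```

With the single-random-node approach you have to refresh until every node has been sampled; aggregating per-node results as in `findDuplicateLeaders` would surface a doubly-led webhook in one pass.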

This was helpful in debugging a different webhook issue, but it's not tested because there are no existing Java tests for our web resources. Getting that set up seemed like a lot of work when I was in the middle of something else. I'm ok with having this declined because of that, but figured I'd see if its usefulness in debugging production bugs seemed to outweigh the new tech debt.

lkemmerer commented 4 years ago

Yeah, that too. If it had been easy to add tests on any of those changes, I absolutely would have.

chriskessel commented 4 years ago

Yay!

lkemmerer commented 4 years ago

@chriskessel There's a PR to fix some (very rare...) batch channel data loss issues coming up that might also help with the duplicate webhook thing. 🤞