flightstats / hub

fault tolerant, highly available service for data storage and distribution
http://www.flightstats.com
MIT License
103 stars 35 forks source link

HUB-994: Don't try to start a paused webhook in our periodically-run webhook coordinator #1249

Closed lkemmerer closed 4 years ago

lkemmerer commented 4 years ago

The bug fix (in the last commit) addresses the following scenario:

  1. A user pauses a webhook
  2. The WebhookCoordinator scheduled task runs, sees that the paused webhook is not currently running, and sends off an internal request to start it up.
  3. Somewhere down the stack in that internal start resource call, the run request fails (because the webhook is paused) and returns a 400 error.
  4. This happens on each node every x (x=5?) minutes. When I paused all the batch webhooks, this caused havoc on the cluster.

It fixes it by, y'know, taking webhook paused state into consideration when the WebhookCoordinator decides what actions to take re: running the webhook.

Commits 2, 4, 6, and 7 contain actual refactoring and changes to behavior. Those commit messages include the main class to look at and the responsibility of that class.

lkemmerer commented 4 years ago

By log statements, where do you mean, exactly? Each log statement includes the thread it's running in, which I've found mostly to be sufficient in debugging issues. (Also, depending on your answer, I'd rather not have request ids permeate outside the resource or client classes as that seems like it'd break encapsulation.)