Yelp / sensu_handlers

Custom Sensu Handlers to support a multi-tenant environment, allowing checks themselves to emit the type of handler behavior they need in the event json
Apache License 2.0

Develop watchdog handler #18

Closed solarkennedy closed 8 years ago

solarkennedy commented 9 years ago

The Sensu watchdog needs handler support: the handler should detect the presence of a watchdog_timer key in the event and, if it is present, add or update a stash that records the last event. The key's value is an int that represents seconds. The deployed stash path should probably be something like watchdog/$fqdn/$check_name, to be similar to silences.

This is all that it needs to do. A separate process is in charge of detecting stale watchdog checks and spawning new events based on them.
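A minimal sketch of the stash-building step described above, kept free of any Sensu library so it runs on its own. The helper name `watchdog_stash_for`, the `client.name` / `check.name` lookups, and the stash content layout are assumptions for illustration; actually writing the stash to the Sensu API is left out.

```ruby
require 'json'

# Hypothetical helper (not part of sensu_handlers): given a parsed event,
# return [stash_path, stash_content], or nil when the check did not opt in
# by setting an integer watchdog_timer (seconds).
def watchdog_stash_for(event, now = Time.now.to_i)
  check = event.fetch('check', {})
  timer = check['watchdog_timer']
  return nil unless timer.is_a?(Integer) && timer > 0

  # Assumption: the client name stands in for $fqdn.
  fqdn = event.fetch('client', {})['name']
  name = check['name']
  return nil if fqdn.nil? || name.nil?

  path = "watchdog/#{fqdn}/#{name}"
  # Assumed content layout: when the event was last seen and the timer value.
  content = { 'timestamp' => now, 'watchdog_timer' => timer }
  [path, content]
end

event = JSON.parse(
  '{"client":{"name":"test_fqdn"},' \
  '"check":{"name":"test_check","status":0,"watchdog_timer":300}}'
)
path, content = watchdog_stash_for(event, 1421348189)
puts path
puts JSON.generate(content)
```

Events without a watchdog_timer key return nil, so the caller can skip them without touching any stash.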

The acceptance criteria: there is a watchdog.rb file in our default handler array that does this, and you can see stashes show up after an event that uses the watchdog_timer key.

Use the send-test-sensu-alert script to aid in troubleshooting. Use hiera to selectively deploy the handler in production.

There should be tests that go with it. I want to see tests that show that:

solarkennedy commented 9 years ago

cc @georgebashi @somic @trumpcard @keymone @lmovsesjan

solarkennedy commented 9 years ago

The deployed stash should look like this I think:

$ sensu-cli stash list
-------
path:  watchdog/test_fqdn/test_check
content:  {
  "timestamp"=>1421348189,
  "watchdog_timer" => $watchdog_timer,
  "source" => "watchdog_handler"
}
expire:  (30days?)

Then the consuming side can do the math to find expired stashes?
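The "math" on the consuming side could look roughly like the sketch below. It assumes stashes arrive as hashes with `path` and `content` keys shaped like the example above, and that a watchdog counts as expired once more than watchdog_timer seconds have passed since its timestamp; the function name `stale_watchdogs` is made up for illustration.

```ruby
# Hypothetical consumer-side check: return the paths of watchdog stashes
# whose last update is older than their own watchdog_timer (in seconds).
def stale_watchdogs(stashes, now = Time.now.to_i)
  stale = stashes.select do |stash|
    content = stash['content']
    stash['path'].start_with?('watchdog/') &&
      now - content['timestamp'] > content['watchdog_timer']
  end
  stale.map { |stash| stash['path'] }
end

stashes = [
  { 'path' => 'watchdog/test_fqdn/test_check',
    'content' => { 'timestamp' => 1421348189, 'watchdog_timer' => 300 } },
  { 'path' => 'watchdog/test_fqdn/fresh_check',
    'content' => { 'timestamp' => 1421348600, 'watchdog_timer' => 300 } }
]

# 511 seconds have passed for test_check (stale), 100 for fresh_check (fine).
p stale_watchdogs(stashes, 1421348700)
```

The process that iterates over stashes would then spawn a new event for each returned path.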

solarkennedy commented 9 years ago

@trumpcard hold on, stop development.

This can't be a handler :(

It has to be an extension, or the responsibilities have to be shoved to the server-check that iterates.

The reason is that watchdog updates have to happen on every event, good ones included, not just the failing ones. Normal handlers only activate on state changes and on non-0 events.
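To make the problem concrete, here is a deliberately simplified model of the rule just stated. It is not Sensu's actual filtering code; `handler_fires?` and its two-status signature are assumptions for illustration only.

```ruby
# Simplified model (assumption): a normal handler runs when the current
# status is non-zero, or when the status has just changed back to 0.
def handler_fires?(previous_status, status)
  status != 0 || previous_status != 0
end

# A healthy check that keeps reporting 0 never reaches the handler,
# so a handler-based watchdog stash would never be refreshed.
puts handler_fires?(0, 0) # repeated OK
puts handler_fires?(0, 2) # OK -> critical
puts handler_fires?(2, 0) # critical -> OK (resolve)
```

Under this model the repeated-OK case is the only one that returns false, and it is exactly the case the watchdog depends on.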

Probably easier to explain in person why this makes it more complex than I was hoping, but there might be workarounds.

solarkennedy commented 9 years ago

@trumpcard at this point I don't know if this is realistically going to be upstream any time soon. I support you pushing forward on this as a sensu extension (to create stashes) + sensu meta-check (to create new events).

This code needs to be solid though, as the extension can break the sensu server.

If you want, we can work on it together, or call on help from a Rubyist, or just do it! (with code review)

bobtfish commented 9 years ago

I'm in.

solarkennedy commented 8 years ago

Closing this as we have TTLs now.