Closed solarkennedy closed 8 years ago
cc @georgebashi @somic @trumpcard @keymone @lmovsesjan
I think the deployed stash should look like this:
$ sensu-cli stash list
-------
path: watchdog/test_fqdn/test_check
content: {
"timestamp" => 1421348189,
"watchdog_timer" => $watchdog_timer,
"source" => "watchdog_handler"
}
expire: (30days?)
Then the consuming side can do the math to find expired stashes?
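A rough sketch of that math, assuming the stash content carries the `timestamp` and `watchdog_timer` keys shown above. The `watchdog_expired?` helper is a made-up name, not part of Sensu:

```ruby
# Hypothetical helper: decide whether a watchdog stash is stale.
# Assumes "timestamp" is the epoch second of the last event and
# "watchdog_timer" is the allowed silence in seconds.
def watchdog_expired?(content, now = Time.now.to_i)
  timer = content["watchdog_timer"]
  return false if timer.nil? # no timer recorded, nothing to compare against

  now - content["timestamp"] > timer
end

stash = {
  "timestamp" => 1421348189,
  "watchdog_timer" => 60,
  "source" => "watchdog_handler"
}

watchdog_expired?(stash, 1421348189 + 30)  # still inside the timer window
watchdog_expired?(stash, 1421348189 + 120) # past the timer window
```

The stash's own expire value (30 days?) would then only be garbage collection; staleness is decided from the content.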
@trumpcard hold on, stop development.
This can't be a handler :(
It has to be an extension or responsibilities have to be shoved to the server-check that iterates.
The reason is that watchdog updates have to happen on every good event, not just the failing ones. Normal handlers only activate on state changes and on non-0 events.
Probably easier to explain in person why this makes it more complex than I was hoping, but there might be workarounds.
@trumpcard at this point I don't know if this is realistically going to be upstream any time soon. I support you pushing forward on this as a sensu extension (to create stashes) + sensu meta-check (to create new events).
This code needs to be solid though, as the extension can break the sensu server.
If you want, we can work on it together, or call on help from a Rubyist, or just do it! (with code review)
I'm in.
Closing this as we have TTLs now.
The sensu watchdog requires handler help to detect the presence of a watchdog_timer key; if present, it should update/add a stash to record the last event. The key is an int that represents seconds. The deployed stash should probably be something like watchdog/$fqdn/$check_name to be similar to silences.
This is all that it needs to do. A separate process is in charge of detecting stale watchdog checks and spawning new events based on them.
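A minimal sketch of that handler-side logic, written as plain Ruby with no Sensu dependency. It assumes the event is a Hash with the fqdn at `event["client"]["name"]` and the `watchdog_timer` key on `event["check"]`; both locations and the `watchdog_stash_for` name are assumptions for illustration:

```ruby
# Hypothetical sketch: build the stash path and content for one event.
# Returns nil when the check has no watchdog_timer key.
def watchdog_stash_for(event, now = Time.now.to_i)
  timer = event["check"]["watchdog_timer"]
  return nil if timer.nil? # no watchdog_timer key: nothing to do

  fqdn = event["client"]["name"]
  check_name = event["check"]["name"]

  {
    "path" => "watchdog/#{fqdn}/#{check_name}",
    "content" => {
      "timestamp" => now,                 # when the last event was seen
      "watchdog_timer" => Integer(timer), # seconds, per the key definition
      "source" => "watchdog_handler"
    }
  }
end

event = {
  "client" => { "name" => "test_fqdn" },
  "check" => { "name" => "test_check", "watchdog_timer" => 60 }
}
stash = watchdog_stash_for(event, 1421348189)
```

Actually writing the stash to the Sensu API is left out here; this only covers deciding whether to write and what to write.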
Acceptance criteria is when there is some watchdog.rb file in our default handler array that does this, and you can see stashes show up after an event that uses the watchdog_timer key.
Use the send-test-sensu-alert script to aid in troubleshooting. Use hiera to selectively deploy the handler in production.
There should be tests that go with it. I want to see tests that show that:
watchdog_timer => nil: nothing happens
watchdog_timer => 60: expect it to receive create_stash
or something like that.
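Something like the following could be a starting point for those two cases. It is plain Ruby with a hand-rolled test double rather than a real test framework, and `WatchdogHandler`, `FakeStashStore`, and the `create_stash(path, content)` signature are all hypothetical:

```ruby
# Hypothetical handler core: calls create_stash on an injected store only
# when the check carries a watchdog_timer key.
class WatchdogHandler
  def initialize(store)
    @store = store
  end

  def handle(event)
    timer = event["check"]["watchdog_timer"]
    return if timer.nil?

    path = "watchdog/#{event["client"]["name"]}/#{event["check"]["name"]}"
    content = {
      "timestamp" => Time.now.to_i,
      "watchdog_timer" => timer,
      "source" => "watchdog_handler"
    }
    @store.create_stash(path, content)
  end
end

# Test double that records every create_stash call.
class FakeStashStore
  attr_reader :calls

  def initialize
    @calls = []
  end

  def create_stash(path, content)
    @calls << [path, content]
  end
end

def event_with(timer)
  { "client" => { "name" => "test_fqdn" },
    "check" => { "name" => "test_check", "watchdog_timer" => timer } }
end

# watchdog_timer => nil: nothing happens
store = FakeStashStore.new
WatchdogHandler.new(store).handle(event_with(nil))
raise "expected no stash" unless store.calls.empty?

# watchdog_timer => 60: expect it to receive create_stash
WatchdogHandler.new(store).handle(event_with(60))
raise "expected one create_stash call" unless store.calls.length == 1
```

In a real spec the same two cases would map onto a mock expectation on `create_stash`.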