Yelp / sensu_handlers

Custom Sensu Handlers to support a multi-tenant environment, allowing checks themselves to emit the type of handler behavior they need in the event json
Apache License 2.0
75 stars 31 forks source link

Added a timeout_and_retry block for pagerduty #20

Closed solarkennedy closed 9 years ago

solarkennedy commented 9 years ago

@keymone @bobtfish @georgebashi

I need some ruby help. I want to be able to retry pagerduty attempts 3 times, sleep a bit, and catch timeouts.

I'm pretty sure this is harder than it sounds. My ruby code almost certainly doesn't work, and I don't know enough rspec to even begin to test this.

Summary: I have no idea what I'm doing

Internal ticket: OPS-4661

bobtfish commented 9 years ago

Does this look ok? I'm pretty sure it does what you want, and I wrote tests :)

https://github.com/Yelp/sensu_handlers/commits/tdoran_timeout_and_retry

solarkennedy commented 9 years ago

@bobtfish I think it looks better :) Can you get rid of the big begin/rescue in the main loop? We shouldn't need it anymore.

Also, can you make the retry block go again, if the block returns something false? (not just timeout)

I don't know what the code would look like, but the tests would be something like:

 it "Will try a second time if the first time doesn't work, not timeouts" do
        subject.timeout_count = 0
        expect(subject).to receive(:trigger_incident).and_return(false)
        expect(subject).to receive(:trigger_incident).and_return(true)
        subject.handle
end

? I.. don't know what I'm typing, but I'm trying to make it so it retries on a 500 or something.

Also, can it be made in such a way that it can be in the base.rb so we can reuse this code with other handlers?

bobtfish commented 9 years ago

The begin / rescue in the main loop is needed, as we still throw an exception if we timeout that needs to be caught. (I did this so that we get to keep the logging of ok / failed / timeout, rather than just having a non-specific failure to cover both failed and timed out)

I believe the retry block should go again if we return false already, but you're correct that we need a test to verify that behavior, will add!

And yeah - we can just move the retry_with_timeout method up to base.rb.

bobtfish commented 9 years ago

Bump. I've pushed amendments to my branch, thanks for the feedback!