brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License
2.65k stars 114 forks source link

address exceptions and timeouts in the readme #121

Closed grosser closed 9 years ago

grosser commented 9 years ago

@brandonhilkert

brandonhilkert commented 9 years ago

Happy to add the exception details, but the Ruby Timeout lib isn't recommended: https://github.com/mperham/sidekiq/wiki/Problems-and-Troubleshooting#writing-thread-safe-code

grosser commented 9 years ago

The proper writeup is: http://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api

The Timeout bug discussed there only affects persistent connections to datastores.

We are using timeout for a large sidekiq + celluloid installation that runs 10+k jobs per minute and have not seen anything like that and saw 2 different runs-forever cases in this installation (processor too slow to finish an inbox that kept getting refilled + hanging on writing a file) + 1 in the new project I just started (bad code in a loop).

So I think not mentioning timeouts would let more people run into trouble then mentioning them. Updated readme to explain a bit more / link to the article.

brandonhilkert commented 9 years ago

I try to keep the documentation for Sucker Punch inline with Sidekiq. If the use of Timeout isn't recommended in Sidekiq, I'd like to keep the same in Sucker Punch.

Ping @mperham for his thoughts

brandonhilkert commented 9 years ago

BTW, the link I posted above links to the same article you linked :) Just easier access from the Wiki since there's other great pointers in there too.

grosser commented 9 years ago

I added the link to the article into the wiki too ;)

On Mon, Aug 10, 2015 at 10:00 AM, Brandon Hilkert notifications@github.com wrote:

BTW, the link I posted above links to the same article you linked :) Just easier access from the Wiki since there's other great pointers in there too.

— Reply to this email directly or view it on GitHub https://github.com/brandonhilkert/sucker_punch/pull/121#issuecomment-129526829 .

grosser commented 9 years ago

@mperham do you think this readme change is good / how would you suggest timing out code that could potentially run forever ?

mperham commented 9 years ago

@grosser You are an exception, an expert Rubyist. My choice was: I can try to educate hundreds or thousands of users on the subtleties and complexities of timeout usage or I can simply proclaim it unsafe for use. I decided on the latter because the former leads to unlimited support costs.

grosser commented 9 years ago

So yeah we can make the default be 'do not use/rely on Timeout', but there is still the problem of having jobs just run forever because of bad code, isn't that equally bad ? An alternative solution could be to crash the whole process on Timeout, so connections get closed / people do not use it for 'normal' workflows / do not open issues about mysterious bugs :trollface:

mperham commented 9 years ago

Here's what Google does in Go to handle Timeouts. tl;dr It's not easy and requires major changes to your API. There's no easy way to handle timeouts in a complex block of code with lots of I/O.

grosser commented 9 years ago

So you'd prefer to advise against timeouts and have people run into endless jobs ?

mperham commented 9 years ago

Said another way, would you prefer mysterious random corruption or explicit problems?

grosser commented 9 years ago

I'd prefer Timeout -> explosion -> I know I have a problem vs Infinite job -> this is slow/queue is stuck/idk what's going on.

Something like: Using Timeout causes persistent connections to randomly get corrupted, do not use it as control flow, use it as last resort to know something went very wrong. Ideally restart the worker process after every timeout.

mperham commented 9 years ago

I see more reports of timeout corruption than endless jobs but YMMV. Both options are miserable. ¯_(ツ)_/¯

grosser commented 9 years ago

Yeah ... no nice solution :D

I'd like to at least mention it in some way, since it's not very intuitive ... maybe also for the readme of sidekiq ... any of these options look good to you @brandonhilkert ?

Option A:

## Timeouts

Using `Timeout` causes persistent connections to [randomly get corrupted](http://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api),
do not use it as control flow, use builtin connection timeouts. If you decide to use Timeout, 
only use them as last resort to know something went very wrong and 
ideally restart the worker process after every timeout.

Option B:

## Timeouts

Do not use `Timeout`, it causes persistent connections to [randomly get corrupted](http://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api).

Option C:

## Timeouts

Do not use `Timeout` [it's evil](http://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api).
brandonhilkert commented 9 years ago

I like the explanation of A.


_Build a Ruby Gem is available! http://brandonhilkert.com/books/build-a-ruby-gem/?utm_source=gmail-sig&utm_medium=email&utm_campaign=gmail_

http://brandonhilkert.com

On Tue, Aug 11, 2015 at 12:38 AM, Michael Grosser notifications@github.com wrote:

Yeah ... no nice solution :D

I'd like to at least mention it in some way, since it's not very intuitive ... maybe also for the readme of sidekiq ... any of these options look good to you @brandonhilkert https://github.com/brandonhilkert ?

Option A:

Timeouts

Using Timeout causes persistent connections to randomly get corrupted, do not use it as control flow, use builtin connection timeouts. If you decide to use Timeout, only use them as last resort to know something went very wrong and ideally restart the worker process after every timeout.

Option B:

Timeouts

Do not use Timeout, it causes persistent connections to randomly get corrupted.

Option C:

Timeouts

Do not use Timeout it's evil.

— Reply to this email directly or view it on GitHub https://github.com/brandonhilkert/sucker_punch/pull/121#issuecomment-129700910 .

grosser commented 9 years ago

updated the readme

brandonhilkert commented 9 years ago

Thanks!