brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License
2.65k stars 114 forks source link

Counter tests more deterministic. #157

Closed jdantonio closed 8 years ago

jdantonio commented 8 years ago

Race condition in counter tests

The test SuckerPunch::JobTest#test_processed_jobs_is_incremented_when_enqueued_with_perform_in occasionally fails on parallel runtimes like JRuby (see here, here, and here). The problem is that the counter gets updated after the job processes, but the test triggers off a latch which is set within the job itself. So on failure the root cause is that, thanks to context switching, the counter is checked before it's updated. The flow is:

This PR makes the tests more deterministic (but more complicated). It uses more complex locking in some cases. In others it shuts down the thred pools before checking counters. Each of the tests runs in about 0.0005 seconds on MRI.

jdantonio commented 8 years ago

And there appears to be a similar race condition in this test. On JRuby it seems that we may grab the SuckerPunch::Queue.stats object before the final job updates the counter. This one is tougher. If we shut down the thread pool before we grab the stats then the numbers are wrong. :-(

Another option (which I do on c-r) is to tag tests like this that we know are problematic then exclude them from Travis. Another option is to inject an ImmediateExecutor for this test. Or just leave it. We know the code is good. Concurrency is just really hard to test.

brandonhilkert commented 8 years ago

:heart: :+1:

brandonhilkert commented 8 years ago

@jdantonio I'm really struggling with these JRuby tests. I'd like to include them, but they've been nothing but a pain. I'd previously never included them in the test suite and have never gotten a bug around lack of support. Not only do they seem fragile, but they significantly slow down the suite as a whole.

I'm all for including them if they make the gem more robust, but I'm struggling to see the value right now. Do you have any suggestions?

jdantonio commented 8 years ago

I hear you. When I suggested including them it was because c-r supports JRuby so I thought it would be an easy win for you. I didn't anticipate that we'd have these difficult tests. So it's not the easy win I thought it would be. :-( Sorry about that.

I can say with a high degree of certainty that sucker_punch is stable and reliable on JRuby, but if no one is asking for JRuby support then I think you're right that it's probably not worth the effort.

brandonhilkert commented 8 years ago

All good. I, too, was hoping they'd be reliable so we could rest comfortably knowing it was tested on all platforms. I'll remove it for now and we can revisit if/when the request comes up for JRuby support, not that it wouldn't work just fine now...

jdantonio commented 8 years ago

But FTR, the tests are not correct on MRI, either. We just get lucky. This is what I referred to as "accidental correctness" in my RubyConf presentation. We only get away with this on MRI because of the GIL. Ruby does not have a formal memory model, nor does it have a written specification. And the official test suite does not formally tests the GIL. This is why JRuby can pass the Ruby test suite without a GIL. Strictly speaking, when we say "it passes on MRI so we're good" we're actually depending on undocumented behavior of MRI. That behavior could change with any release. WRT concurrency it's highly unlikely that it will change, but it could. This is what Matz is referring to when he says "if I remove the GIL a bunch of programs will break." It's possible to write formally incorrect concurrent programs in MRI but get lucky because of the GIL.

In this case the code is correct, it's the test that's wrong. The intermittent failures on JRuby don't indicate a problem with JRuby. They reveal a problem with the tests.

That being said, I still think your fine removing JRuby from Travis. I just don't want your users to think that there is a problem with JRuby or that sucker_punch is unsafe on JRuby. concurrent-ruby provides strong guarantees on JRuby (which has a formally specified, written memory model) and includes many runtime optimizations on JRuby. Subsequently, sucker_punch will be awesome on JRuby.

brandonhilkert commented 8 years ago

👍🏼 thanks for the great details!

On Saturday, January 23, 2016, jdantonio notifications@github.com wrote:

But FTR, the tests are not correct on MRI, either. We just get lucky. This is what I referred to as "accidental correctness" in my RubyConf presentation https://www.youtube.com/watch?v=dP4U1yI1WZ0. We only get away with this on MRI because of the GIL. Ruby does not have a formal memory model, nor does it have a written specification. And the official test suite does not formally tests the GIL. This is why JRuby can pass the Ruby test suite without a GIL. Strictly speaking, when we say "it passes on MRI so we're good" we're actually depending on undocumented behavior of MRI. That behavior could change with any release. WRT concurrency it's highly unlikely that it will change, but it could. This is what Matz is referring to when he says "if I remove the GIL a bunch of programs will break." It's possible to write formally incorrect concurrent programs in MRI but get lucky because of the GIL.

In this case the code is correct, it's the test that's wrong. The intermittent failures on JRuby don't indicate a problem with JRuby. They reveal a problem with the tests.

That being said, I still think your fine removing JRuby from Travis. I just don't want your users to think that there is a problem with JRuby or that sucker_punch is unsafe on JRuby. concurrent-ruby provides strong guarantees on JRuby (which has a formally specified, written memory model https://github.com/jruby/jruby/wiki/Concurrency-in-jruby) and includes many runtime optimizations on JRuby. Subsequently, sucker_punch will be awesome on JRuby.

— Reply to this email directly or view it on GitHub https://github.com/brandonhilkert/sucker_punch/pull/157#issuecomment-174183799 .


_Build a Ruby Gem is available! http://brandonhilkert.com/books/build-a-ruby-gem/?utm_source=gmail-sig&utm_medium=email&utm_campaign=gmail_

http://brandonhilkert.com