Greater determinism on Job tests

jdantonio commented 8 years ago

See #152

I've submit this PR as two commits in case you don't like the second of the two. It changes the nature of the testing in a way you may not like.

The first commit increases the latch wait time from 0.2 seconds to 1.0 seconds on many tests. Ruby isn't a systems language so it's timers are horribly inaccurate. Additionally, the free Travis service runs the tests on very slow machines. In my experience a 0.1 second wait simply isn't enough. (Trust me on this, I've written/debugged more concurrency tests than most Rubyists! :-) ) It is impossible to make some of these tests 100% deterministic so the best we can hope for is very close. The 1.0 second wait won't slow down the tests when they pass (the latch waits the minimum time necessary) but it will reduce the possibility of false negatives. 1.0 seconds is my go-to for tests like this in c-r.

The second commit shifts a few tests to immediate (synchronous) execution. This makes these tests 100% deterministic, but it shifts some responsibility to c-r. It makes the assumption that c-r does what we promise it does and then only tests the behavior of sucker_punch.

Passing a timeout value of less than 0.1 seconds to Concurrent::ScheduledTask causes the task to be immediately post to the thread pool (see my comments about Ruby inaccurate timers above).

Concurrent::ImmediateExecutor is a fake thread pool that we use extensively for testing. It supports the full set of method calls that real thread pool expose but it runs jobs immediately on the current thread. It's part of our public API so it's safe to use--it won't go away and its API will always match that of our thread pools.

Depending on how you feel about the stubbing I've added, the test_perform_in_runs_job_in_future could use the same stubbing approach as well. The timing checks I've added should make it less susceptible to false negatives, but it's really testing c-r. I think it would be fine to trust that ScheduledTask does what we say it does and in your test simply assert that the job is post.

brandonhilkert commented 8 years ago

@jdantonio Thanks for the explanation :) Really helpful...I'm a little hesitant about the 2nd one. Since the test suite is so small, and frankly, so is the gem, I think I'd rather set the scheduled time to something small and actually wait for it to go out. That work?

jdantonio commented 8 years ago

Interestingly, that same test failed on two platforms. So my "fix" isn't really a fix. :-(

WRT the perform_in tests: Yes, those tests can be done with longer timers and will likely be very consistent. It may add a second or two to the test run, but that's not a big deal here. Unlike, say, the c-r test suite which takes several minutes to run because of all the timing/locking/waiting we do.

I'm a little puzzled by those intermittently failing tests. This could indicate an actual error and not a problem with the test. I'll take a look at it from that perspective. It'll be a good project to work on at a hack night I'm going to on Thursday.

brandonhilkert commented 8 years ago

FYI - I was adding some more to the test suite this AM and took your advice and bumped up the latch wait time, so that's covered. I'll close this one for now, feel free to open another with whatever you find tomorrow :)

brandonhilkert / sucker_punch

Greater determinism on Job tests #155