Factual / skuld

Distributed task tracking system.
300 stars 13 forks source link

Random test failure: claim: slow running tests can cause failures #106

Open eric opened 10 years ago

eric commented 10 years ago

https://travis-ci.org/eric/skuld/jobs/31091778

The reclaim tests are time-dependent and use Thread/sleep, which causes problems if the system that the tests are being run on is overloaded.

2014-07-28T22:57:10.056 INFO  skuld.node: 127.0.0.1:13002: enqueue-local: enqueued id #<Bytes 000001477f3119a1800000010000000000000821> on vnode 127.0.0.1:13002/skuld_1 for task: {:claims [], :id #<Bytes 000001477f3119a1800000010000000000000821>, :data maus}
2014-07-28T22:57:10.056 INFO  skuld.node: 127.0.0.1:13003: enqueue-local: enqueued id #<Bytes 000001477f3119a1800000010000000000000821> on vnode 127.0.0.1:13003/skuld_1 for task: {:claims [], :id #<Bytes 000001477f3119a1800000010000000000000821>, :data maus}
2014-07-28T22:57:10.058 INFO  skuld.node: 127.0.0.1:13000: enqueue-local: enqueued id #<Bytes 000001477f3119a1800000010000000000000821> on vnode 127.0.0.1:13000/skuld_1 for task: {:claims [], :id #<Bytes 000001477f3119a1800000010000000000000821>, :data maus}
2014-07-28T22:57:10.074 INFO  skuld.vnode: 127.0.0.1:13002/skuld_1: claim: claiming task: {:data maus, :id #<Bytes 000001477f3119a1800000010000000000000821>, :claims [{:start 1406588230071, :end 1406588231071, :completed nil}]}
2014-07-28T22:57:10.081 INFO  skuld.node: 127.0.0.1:13002: claim-local: claim from 127.0.0.1:13002/skuld_1 returned task: {:data maus, :id #<Bytes 000001477f3119a1800000010000000000000821>, :claims [{:start 1406588230071, :end 1406588231071, :completed nil}]}
2014-07-28T22:57:11.752 INFO  skuld.vnode: 127.0.0.1:13002/skuld_1: claim: claiming task: {:data maus, :id #<Bytes 000001477f3119a1800000010000000000000821>, :claims [{:start 1406588230071, :end 1406588231071, :completed nil} {:start 1406588231749, :end 1406588232749, :completed nil}]}
2014-07-28T22:57:11.760 INFO  skuld.node: 127.0.0.1:13002: claim-local: claim from 127.0.0.1:13002/skuld_1 returned task: {:data maus, :id #<Bytes 000001477f3119a1800000010000000000000821>, :claims [{:start 1406588230071, :end 1406588231071, :completed nil} {:start 1406588231749, :end 1406588232749, :completed nil}]}
lein test :only skuld.claim-test/reclaim-test
FAIL in (reclaim-test) (claim_test.clj:44)
expected: (nil? (client/claim! *client* 1000))
  actual: (not (nil? {:data "maus", :id #<Bytes 000001477f3119a1800000010000000000000821>, :claims [{:start 1406588230071, :end 1406588231071, :completed nil} {:start 1406588231749, :end 1406588232749, :completed nil}]}))
2014-07-28T22:57:14.337 INFO  skuld.vnode: 127.0.0.1:13002/skuld_1: claim: claiming task: {:data maus, :id #<Bytes 000001477f3119a1800000010000000000000821>, :claims [{:start 1406588230071, :end 1406588231071, :completed nil} {:start 1406588231749, :end 1406588232749, :completed nil} {:start 1406588234334, :end 1406588235334, :completed nil}]}
2014-07-28T22:57:14.345 INFO  skuld.node: 127.0.0.1:13002: claim-local: claim from 127.0.0.1:13002/skuld_1 returned task: {:data maus, :id #<Bytes 000001477f3119a1800000010000000000000821>, :claims [{:start 1406588230071, :end 1406588231071, :completed nil} {:start 1406588231749, :end 1406588232749, :completed nil} {:start 1406588234334, :end 1406588235334, :completed nil}]}
lein test :only skuld.claim-test/reclaim-test
FAIL in (reclaim-test) (claim_test.clj:54)
expected: (= 2 (count (:claims t)))
  actual: (not (= 2 3))
eric commented 10 years ago

@aphyr: Do you think there's anything we should do to solve this?

The (Thread/sleep 1001) on line 43 took 3 seconds to complete instead of 1 second and caused the claim to be successful.

I'm trying to get to 100% consistently passing tests, but I am not sure how to best solve this one.

aphyr commented 10 years ago

Best way is probably to virtualize time using the Riemann.time scheduler. I should extract that into its own library.

eric commented 10 years ago

I love that idea. :+1: