Factual / skuld

Distributed task tracking system.

Random test failure: stress-test: task already claimed #74

Closed eric closed 10 years ago

eric commented 10 years ago

https://travis-ci.org/Factual/skuld/jobs/29787355#L1682

2014-07-12T19:30:54.036 INFO  skuld.node: 127.0.0.1:13003: claim-local: claiming id from queue: #<Bytes 000001472c0e7edd800000010000000000000828>
2014-07-12T19:30:54.039 WARN  skuld.node: 127.0.0.1:13003: caught while claiming #<Bytes 000001472c0e7edd800000010000000000000828> from vnode 127.0.0.1:13003/skuld_1
java.lang.IllegalStateException: task already claimed
    at skuld.task$request_claim.invoke(task.clj:69) ~[na:na]
    at skuld.task$claim.invoke(task.clj:78) ~[na:na]
    at skuld.db.level.Level.claim_task_BANG_(level.clj:27) ~[na:na]
    at skuld.vnode$claim_BANG_.invoke(vnode.clj:490) ~[na:na]
    at skuld.node$claim_local_BANG_$fn__4892.invoke(node.clj:305) ~[na:na]
    at skuld.node$claim_local_BANG_.invoke(node.clj:304) [na:na]
    at skuld.node$claim_BANG_.invoke(node.clj:320) [na:na]
    at skuld.node$handler$handler__4933.invoke(node.clj:420) [na:na]
    at skuld.net$compile_handler$compiled_handler__4077.invoke(net.clj:297) [na:na]
    at skuld.net$handler$fn__4027$fn__4028$fn__4029.invoke(net.clj:110) [na:na]
    at skuld.net$handler$fn__4027$fn__4028.invoke(net.clj:110) [na:na]
    at clojure.core$binding_conveyor_fn$fn__4107.invoke(core.clj:1836) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.call(AFn.java:18) [clojure-1.5.1.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
eric commented 10 years ago

This actually wasn't a fatal exception. During a leadership shift for a partition, the node that was previously the leader still has all of the task IDs in its queue, but it can no longer serve claims for those tasks.
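To illustrate the race described above, here is a minimal sketch (hypothetical names, not skuld's actual API): a former leader still holds a task ID in its local queue, but the task has already been claimed under the new leader. The claim attempt throws `IllegalStateException`, and the node catches it and logs a warning rather than treating it as fatal.

```clojure
;; Hypothetical in-memory task store; "task-1" was already claimed
;; under the new leader before the former leader's claim arrives.
(def tasks (atom {"task-1" {:claimed true}}))

(defn claim-task!
  "Claims a task, throwing if it is already claimed (mirrors the
  'task already claimed' error in the stack trace above)."
  [id]
  (when (get-in @tasks [id :claimed])
    (throw (IllegalStateException. "task already claimed")))
  (swap! tasks assoc-in [id :claimed] true))

(defn claim-local
  "Attempts a local claim. A stale queue entry left over from before
  a leadership change is expected here, so the exception is logged as
  a warning and nil is returned instead of propagating the failure."
  [id]
  (try
    (claim-task! id)
    (catch IllegalStateException e
      (println "WARN caught while claiming" id ":" (.getMessage e))
      nil)))

(claim-local "task-1")   ; prints a warning, returns nil
```

The key design point is that a stale claim is a normal consequence of leadership handoff, so it belongs at WARN level, not as an unhandled error.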

I've cleaned up how these exceptions are logged in #85.