Closed cockroach-teamcity closed 4 years ago
SHA: https://github.com/cockroachdb/cockroach/commits/18519949df8e717c31a636bb153e526f45375ca8
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=jepsen/1/bank-multitable PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=919791&tab=buildLog
The test failed on master:
SHA: https://github.com/cockroachdb/cockroach/commits/18519949df8e717c31a636bb153e526f45375ca8
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=jepsen/1 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=919791&tab=buildLog
The test failed on master:
Another rare variant of the interrupted-at-shutdown problem (triple-posted). This one's on the nemesis thread, which doesn't really make sense because the nemesis is the thread that interrupts the others; I'm not sure where this interruption is coming from.
java.lang.InterruptedException: null
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[na:1.8.0_181]
at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[na:1.8.0_181]
at clojure.core$deref_future.invokeStatic(core.clj:2208) ~[clojure-1.8.0.jar:na]
at clojure.core$future_call$reify__6962.deref(core.clj:6688) ~[clojure-1.8.0.jar:na]
at clojure.core$deref.invokeStatic(core.clj:2228) ~[clojure-1.8.0.jar:na]
at clojure.core$deref.invoke(core.clj:2214) ~[clojure-1.8.0.jar:na]
at clojure.core$map$fn__4785.invoke(core.clj:2644) ~[clojure-1.8.0.jar:na]
at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.8.0.jar:na]
at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.8.0.jar:na]
at clojure.lang.RT.seq(RT.java:521) ~[clojure-1.8.0.jar:na]
at clojure.core$seq__4357.invokeStatic(core.clj:137) ~[clojure-1.8.0.jar:na]
at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24) ~[clojure-1.8.0.jar:na]
at clojure.core.protocols$fn__6738.invokeStatic(protocols.clj:75) ~[clojure-1.8.0.jar:na]
at clojure.core.protocols$fn__6738.invoke(protocols.clj:75) ~[clojure-1.8.0.jar:na]
at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13) ~[clojure-1.8.0.jar:na]
at clojure.core$reduce.invokeStatic(core.clj:6545) ~[clojure-1.8.0.jar:na]
at clojure.core$into.invokeStatic(core.clj:6610) ~[clojure-1.8.0.jar:na]
at clojure.core$into.invoke(core.clj:6604) ~[clojure-1.8.0.jar:na]
at jepsen.control$on_nodes.invokeStatic(control.clj:373) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.control$on_nodes.invoke(control.clj:357) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.control$on_nodes.invokeStatic(control.clj:362) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.control$on_nodes.invoke(control.clj:357) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.cockroach.auto$reset_clocks_BANG_.invokeStatic(auto.clj:223) ~[classes/:na]
at jepsen.cockroach.auto$reset_clocks_BANG_.invoke(auto.clj:220) ~[classes/:na]
at jepsen.cockroach.nemesis$bump_time$reify__1252.teardown_BANG_(nemesis.clj:255) ~[classes/:na]
at jepsen.cockroach.nemesis$restarting$reify__1240.teardown_BANG_(nemesis.clj:199) ~[classes/:na]
at jepsen.nemesis$teardown_compat_BANG_.invokeStatic(nemesis.clj:50) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.nemesis$teardown_compat_BANG_.invoke(nemesis.clj:44) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.nemesis$compose$reify__4401$fn__4407.invoke(nemesis.clj:193) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.util$map_vals$fn__1319.invoke(util.clj:503) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at clojure.core.reducers$map$fn__554$fn__555.invoke(reducers.clj:167) ~[knossos-0.3.3.jar:na]
at clojure.core.protocols$iter_reduce.invokeStatic(protocols.clj:49) ~[clojure-1.8.0.jar:na]
at clojure.core.protocols$fn__6742.invokeStatic(protocols.clj:75) ~[clojure-1.8.0.jar:na]
at clojure.core.protocols$fn__6742.invoke(protocols.clj:75) ~[clojure-1.8.0.jar:na]
at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13) ~[clojure-1.8.0.jar:na]
at clojure.core.reducers$folder$reify__537.coll_reduce(reducers.clj:126) ~[knossos-0.3.3.jar:na]
at clojure.core$reduce.invokeStatic(core.clj:6545) ~[clojure-1.8.0.jar:na]
at clojure.core$into.invokeStatic(core.clj:6610) ~[clojure-1.8.0.jar:na]
at clojure.core$into.invoke(core.clj:6604) ~[clojure-1.8.0.jar:na]
at jepsen.util$map_kv.invokeStatic(util.clj:498) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.util$map_kv.invoke(util.clj:494) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.util$map_vals.invokeStatic(util.clj:503) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.util$map_vals.invoke(util.clj:500) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.nemesis$compose$reify__4401.teardown_BANG_(nemesis.clj:193) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.nemesis$teardown_compat_BANG_.invokeStatic(nemesis.clj:50) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.nemesis$teardown_compat_BANG_.invoke(nemesis.clj:44) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.core.NemesisWorker.teardown_worker_BANG_(core.clj:452) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.core$do_worker_BANG_$fn__4574.invoke(core.clj:192) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.core$do_worker_BANG_.invokeStatic(core.clj:191) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.core$do_worker_BANG_.invoke(core.clj:162) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at jepsen.core$run_workers_BANG_$fn__4597$fn__4598.invoke(core.clj:228) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.8.0.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.8.0.jar:na]
at clojure.core$apply.invokeStatic(core.clj:646) ~[clojure-1.8.0.jar:na]
at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1881) ~[clojure-1.8.0.jar:na]
at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1881) ~[clojure-1.8.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.8.0.jar:na]
at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.8.0.jar:na]
at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.8.0.jar:na]
at clojure.core$apply.invokeStatic(core.clj:650) ~[clojure-1.8.0.jar:na]
at clojure.core$bound_fn_STAR_$fn__4671.doInvoke(core.clj:1911) ~[clojure-1.8.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.8.0.jar:na]
at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.8.0.jar:na]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_181]
I'm seeing two types of barrier exceptions coming from Jepsen. Just checking - are these both known and equally ignorable? Cause I want to make our roachtest ignore them.
138 ERROR [2018-10-22 20:25:35,556] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
139 java.lang.InterruptedException: null
140 at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:211) ~[na:1.8.0_181]
141 at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) ~[na:1.8.0_181]
142 at jepsen.generator.Synchronize.op(generator.clj:667) ~[jepsen-0.1.9-SNAPSHOT.jar:na]
ERROR [2018-10-20 03:44:39,113] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.util.concurrent.BrokenBarrierException: null
at java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:250) ~[na:1.8.0_181]
at java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:362) ~[na:1.8.0_181]
Ben said they're both the same thing.
Yeah, they're basically the same thing. When one thread is interrupted, any other threads waiting on the same barrier get BrokenBarrierExceptions. These exceptions then race up to the top-level exception logger. Usually the InterruptedException wins because it's the root cause of all the BrokenBarrierExceptions, but sometimes it doesn't.
Note for the future: the Jepsen roachtest now has specific code for ignoring these errors that should go away once this issue is fixed: https://github.com/cockroachdb/cockroach/blob/b2ce21d89d8ad3c9fd58e260e011f5ca2eed91f8/pkg/cmd/roachtest/jepsen.go#L230-L231
Closing as stale.
SHA: https://github.com/cockroachdb/cockroach/commits/18519949df8e717c31a636bb153e526f45375ca8
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=919791&tab=buildLog