Closed bdarnell closed 6 years ago
Looking back further in the logs, we see that this was a "silent for too long" failure:
[10:39:04][/majority-ring] Jepsen test was silent for too long, aborting
[10:40:11][/majority-ring] ssh: connect to host 104.196.219.45 port 22: Connection timed out
[10:40:11][/majority-ring] Test failed: exit code 142. Grabbing artifacts from controller...
[10:40:11][/majority-ring] +(teamcity-jepsen-run-one.sh) progress 'Test failed: exit code 142. Grabbing artifacts from controller...'
[10:40:11][/majority-ring] +(teamcity-jepsen-run-one.sh) printf '##%s[progressMessage '\''%s'\'']\n' teamcity 'Test failed: exit code 142. Grabbing artifacts from controller...'
We don't have complete logs available for this run, but the jepsen controller appears to have finished one minute before the "silent too long" failure.
[10:40:11][/majority-ring] INFO [2017-11-20 10:38:03,267] main - jepsen.core {:timeline {:valid? true},
[10:40:11][/majority-ring] :perf
[10:40:11][/majority-ring] {:latency-graph {:valid? true},
[10:40:11][/majority-ring] :rate-graph {:valid? true},
[10:40:11][/majority-ring] :valid? true},
[10:40:11][/majority-ring] :g2
[10:40:11][/majority-ring] {:valid? true,
[10:40:11][/majority-ring] :key-count 2833,
[10:40:11][/majority-ring] :legal-count 328,
[10:40:11][/majority-ring] :illegal-count 0,
[10:40:11][/majority-ring] :illegal {}},
[10:40:11][/majority-ring] :valid? true}
[10:40:11][/majority-ring]
[10:40:11][/majority-ring]
[10:40:11][/majority-ring] Everything looks good! ヽ(‘ー`)ノ
The ssh timeout at 10:40:11 must be this line; the following scp
to grab the logs worked. So this looks like a network fluke. Since it hasn't recurred, I think this issue can be closed.
Jepsen tests are occasionally reporting failure after an apparently successful run, with no apparent cause of failure: