Closed cockroach-teamcity closed 5 years ago
This is a real failure:
INFO [2019-01-29 08:42:48,388] jepsen results - jepsen.store Wrote /mnt/data1/jepsen/cockroachdb/store/cockroachdb-bank:majring/20190129T083549.000Z/results.edn
INFO [2019-01-29 08:42:49,270] main - jepsen.core {:perf
{:latency-graph {:valid? true},
:rate-graph {:valid? true},
:valid? true},
:timeline {:valid? true},
:details
{:valid? false,
:bad-reads
[{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 452,
:time 233308728429,
:value [0 11 20 9 13],
:index 12746}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 387,
:time 233308992089,
:value [0 11 20 9 13],
:index 12748}}
...
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 431,
:time 296967625647,
:value [15 0 18 3 17],
:index 14615}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 541,
:time 296967653011,
:value [15 0 18 3 17],
:index 14616}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 500,
:time 296967702596,
:value [15 0 18 3 17],
:index 14617}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 529,
:time 296967709008,
:value [25 7 6 3 12],
:index 14618}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 670,
:time 296967733680,
:value [24 2 17 3 7],
:index 14620}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 462,
:time 296986725618,
:value [25 2 9 3 14],
:index 14632}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 542,
:time 296986915503,
:value [4 9 25 3 12],
:index 14634}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 510,
:time 296986999786,
:value [2 7 27 3 14],
:index 14635}}
{:type :wrong-total,
:expected 50,
:found 53,
:op
{:type :ok,
:f :read,
:process 547,
:time 339949928998,
:value [0 7 7 14 25],
:index 14695}}]},
:valid? false}
Analysis invalid! (ノಥ益ಥ)ノ ┻━┻
Possibly has the same cause as the TPC-C errors @nvanbenschoten is looking at (#34228?)
This is a bank test with 5 accounts (and network partitions). At some point just prior to operation 12746 one of the transfers created $3 out of thin air, and all subsequent reads report this inconsistency. The nearest transfers of $3 were at indexes 12723 and 12725, just before a nemesis activation. The client ultimately got a timeout error on these connections, but not until 12898 (after the inconsistent reads, and long after the network partition healed at 12733)
{:type :invoke, :f :transfer, :value {:from 1, :to 0, :amount 1}, :process 426, :time 226181944237, :index 12721}
{:type :ok, :f :transfer, :value {:from 2, :to 3, :amount 4}, :process 298, :time 226194131568, :index 12722}
{:type :invoke, :f :transfer, :value {:from 4, :to 0, :amount 3}, :process 298, :time 226194266856, :index 12723}
{:type :ok, :f :read, :process 395, :time 226284989087, :value [4 7 17 9 13], :index 12724}
{:type :invoke, :f :transfer, :value {:from 1, :to 4, :amount 3}, :process 395, :time 226285168556, :index 12725}
{:type :info, :f ["majring" :start], :process :nemesis, :time 226399731031, :index 12726}
{:type :info, :f ["majring" :start], :process :nemesis, :time 226501818010, :value [:isolated {"10.128.0.29" #{"10.128.0.70" "10.128.0.78"}, "10.128.0.92" #{"10.128.0.88" "10.128.0.78"}, "10.128.0.70" #{"10.128.0.29" "10.128.0.88"}, "10.128.0.78" #{"10.128.0.29" "10.128.0.92"}, "10.128.0.88" #{"10.128.0.92" "10.128.0.70"}}], :index 12727}
{:type :info, :f :transfer, :value {:from 4, :to 0, :amount 2}, :process 379, :time 228551244509, :error :timeout, :index 12728}
{:type :info, :f :transfer, :value {:from 0, :to 2, :amount 4}, :process 279, :time 228551450787, :error :timeout, :index 12729}
{:type :invoke, :f :transfer, :value {:from 3, :to 1, :amount 2}, :process 409, :time 228553132511, :index 12730}
{:type :invoke, :f :transfer, :value {:from 3, :to 0, :amount 4}, :process 309, :time 228553133655, :index 12731}
{:type :info, :f ["majring" :stop], :process :nemesis, :time 231502289665, :index 12732}
{:type :info, :f ["majring" :stop], :process :nemesis, :time 231705236185, :value :network-healed, :index 12733}
{:type :info, :f :read, :process 342, :time 232676248887, :error [:psql-exception "ERROR: communication error: rpc error: code = Canceled desc = context canceled"], :index 12734}
{:type :info, :f :read, :process 357, :time 232676303850, :error [:psql-exception "ERROR: communication error: rpc error: code = Canceled desc = context canceled"], :index 12735}
{:type :info, :f :read, :process 422, :time 232677871811, :error [:psql-exception "ERROR: communication error: rpc error: code = Canceled desc = context canceled"], :index 12736}
{:type :invoke, :f :read, :process 387, :time 232677734349, :index 12737}
{:type :invoke, :f :transfer, :value {:from 1, :to 0, :amount 2}, :process 372, :time 232677783855, :index 12738}
{:type :invoke, :f :read, :process 452, :time 232679451919, :index 12739}
{:type :info, :f :read, :process 321, :time 233248572734, :error [:psql-exception "ERROR: communication error: rpc error: code = Canceled desc = context canceled"], :index 12740}
{:type :invoke, :f :transfer, :value {:from 3, :to 4, :amount 4}, :process 351, :time 233250075018, :index 12741}
{:type :ok, :f :transfer, :value {:from 3, :to 4, :amount 4}, :process 351, :time 233267410923, :index 12742}
{:type :invoke, :f :transfer, :value {:from 4, :to 0, :amount 5}, :process 351, :time 233267588090, :index 12743}
{:type :ok, :f :transfer, :value {:from 1, :to 0, :amount 4}, :process 435, :time 233301067125, :index 12744}
{:type :invoke, :f :read, :process 435, :time 233301231876, :index 12745}
{:type :ok, :f :read, :process 452, :time 233308728429, :value [0 11 20 9 13], :index 12746}
SHA: https://github.com/cockroachdb/cockroach/commits/e10fb557b11b5ff1b8609aa963da23c37a1143c8
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1113854&tab=buildLog