cockroachdb / cockroach

roachtest: jepsen-batch3/multi-register/start-stop-2 failed #37482

Closed cockroach-teamcity closed 5 years ago

cockroach-teamcity commented 5 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/c25518b4e9a723d8de0dba30a95ce0ade7963aed

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=jepsen-batch3/multi-register/start-stop-2 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1286188&tab=buildLog

The test failed on branch=master, cloud=gce:
    jepsen.go:260,jepsen.go:322,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1286188-jepsen-batch3:6 -- bash -e -c "\
        cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
         ~/lein run test \
           --tarball file://${PWD}/cockroach.tgz \
           --username ${USER} \
           --ssh-private-key ~/.ssh/id_rsa \
           --os ubuntu \
           --time-limit 300 \
           --concurrency 30 \
           --recovery-time 25 \
           --test-count 1 \
           -n 10.142.0.23 -n 10.142.0.6 -n 10.142.0.26 -n 10.142.0.29 -n 10.142.0.25 \
           --test multi-register --nemesis start-stop-2 \
        > invoke.log 2>&1 \
        " returned:
        stderr:

        stdout:
        Error:  exit status 1
        : exit status 1

nvanbenschoten commented 5 years ago

([{:op
     {:process 25,
      :type :ok,
      :f :txn,
      :value [[:write 4 8] [:write 0 8]],
      :index 75358,
      :time 278788273877},
     :model {3 7, 1 4, 0 8, 2 3, 4 8}}
    {:op
     {:process 26,
      :type :ok,
      :f :txn,
      :value [[:read 0 9]],
      :index 75361,
      :time 278792446000},
     :model {:msg "8≠9"}}]
   [{:op
     {:process 25,
      :type :ok,
      :f :txn,
      :value [[:write 4 8] [:write 0 8]],
      :index 75358,
      :time 278788273877},
     :model {3 7, 1 4, 0 8, 2 2, 4 8}}
    {:op
     {:process 26,
      :type :ok,
      :f :txn,
      :value [[:read 0 9]],
      :index 75361,
      :time 278792446000},
     :model {:msg "8≠9"}}]),

Duplicate of https://github.com/cockroachdb/cockroach/issues/37394#issuecomment-491087404. Notice the stale read of register 0 by process 26 (index 75361), which returns 9 instead of 8. Just like in #37394, the 8 was written as the second write of a transaction (process 25, index 75358) that had finished just before the read.
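
To make the failure easier to follow, here is a minimal sketch that replays the two operations from the history excerpt above against a plain dictionary of registers. It is not the Jepsen/Knossos checker: `apply_txn` is a hypothetical helper, the starting model only contains the registers 1, 2, and 3 as they appear in the first reported model, and the `expected≠observed` message format is assumed from the `"8≠9"` string in the output.

```python
# Simplified, hypothetical replay of the failing history; not the real checker.

def apply_txn(model, txn):
    """Apply a list of (op, register, value) micro-ops to a register model.

    Returns (new_model, None) if every read matches the model, or
    (model, message) for the first read that does not.
    """
    state = dict(model)
    for op, register, value in txn:
        if op == "write":
            state[register] = value
        elif op == "read":
            expected = state.get(register)
            if expected != value:
                # Assumed message format: model value, then observed value.
                return model, f"{expected}≠{value}"
        else:
            raise ValueError(f"unknown micro-op: {op!r}")
    return state, None


# Registers 1, 2 and 3 as shown in the first model of the report.
before = {1: 4, 2: 3, 3: 7}

# Process 25, index 75358: [[:write 4 8] [:write 0 8]], completed with :ok.
after_write, err = apply_txn(before, [("write", 4, 8), ("write", 0, 8)])
print(after_write)  # register 0 now holds 8

# Process 26, index 75361: [[:read 0 9]], also :ok, so it observed 9.
_, err = apply_txn(after_write, [("read", 0, 9)])
print(err)  # 8≠9
```

The second candidate model in the report differs only in register 2 (2 instead of 3), so starting from `{1: 4, 2: 2, 3: 7}` fails the same way: once the write to register 0 has been acknowledged, no ordering lets a later read return 9.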