cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

jepsen: apt failures #31944

Closed cockroach-teamcity closed 3 years ago

cockroach-teamcity commented 6 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/d07351b2d1a3e9b5519aa8bc662db0ceb7b7ef48

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=jepsen-batch3/register/majority-ring PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=988969&tab=buildLog

The test failed on 31663:
    test.go:639,cluster.go:1110,jepsen.go:87,jepsen.go:127,jepsen.go:313: /home/agent/work/.go/bin/roachprod run teamcity-988969-jepsen-batch3:1-6 -- sh -c "sudo apt-get -y update > logs/apt-upgrade.log 2>&1" returned:
        stderr:

        stdout:
        teamcity-988969-jepsen-batch3: sh -c "sudo apt-get -y upda...........
           1: 
           2: 
           3: 
        exit status 100
           4: 
           5: 
           6: 
        Error:  exit status 100
        : exit status 1
    test.go:639,cluster.go:1110,jepsen.go:74,asm_amd64.s:573,panic.go:377,test.go:640,cluster.go:1110,jepsen.go:87,jepsen.go:127,jepsen.go:313: test already failed

cockroach-teamcity commented 6 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/5ef4d2c8621fc5465f73a96221b0bd0bc5cd27aa

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=jepsen-batch3/register/majority-ring PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=990073&tab=buildLog

The test failed on master:
    test.go:639,jepsen.go:243,jepsen.go:304: /home/agent/work/.go/bin/roachprod run teamcity-990073-jepsen-batch3:6 -- bash -e -c "\
        cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
         ~/lein run test \
           --tarball file://${PWD}/cockroach.tgz \
           --username ${USER} \
           --ssh-private-key ~/.ssh/id_rsa \
           --os ubuntu \
           --time-limit 300 \
           --concurrency 30 \
           --recovery-time 25 \
           --test-count 1 \
           -n 10.128.0.19 -n 10.128.0.36 -n 10.128.0.18 -n 10.128.0.24 -n 10.128.0.21 \
           --test register --nemesis majority-ring \
        > invoke.log 2>&1 \
        " returned:
        stderr:

        stdout:
        Error:  exit status 255
        : exit status 1

bdarnell commented 6 years ago

These are two different apt failures. The first is a failure in apt-get update:

Get:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [869 kB]
Ign:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
Get:21 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [698 kB]
Get:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [1,123 kB]
Ign:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
Err:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
  Writing more data than expected (1123337 > 1123271) [IP: 35.184.213.5 80]
Get:31 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
Get:32 http://security.ubuntu.com/ubuntu xenial-security/main Sources [136 kB]
Get:33 http://security.ubuntu.com/ubuntu xenial-security/restricted Sources [2,116 B]
Get:34 http://security.ubuntu.com/ubuntu xenial-security/universe Sources [78.8 kB]
Get:35 http://security.ubuntu.com/ubuntu xenial-security/multiverse Sources [2,088 B]
Get:36 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [573 kB]
Get:37 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [393 kB]
Get:38 http://security.ubuntu.com/ubuntu xenial-security/universe Translation-en [151 kB]
Get:39 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [3,460 B]
Get:40 http://security.ubuntu.com/ubuntu xenial-security/multiverse Translation-en [1,744 B]
Fetched 24.5 MB in 3s (6,669 kB/s)
Reading package lists...
E: Failed to fetch http://us-central1.gce.archive.ubuntu.com/ubuntu/dists/xenial-updates/main/binary-amd64/Packages  Writing more data than expected (1123337 > 1123271) [IP: 35.184.213.5 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.

The second is #31780: apt-get install failed, but the logs show no evidence of a failure.
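
For the first failure, a size mismatch like this is often transient (for example, a mirror serving an index file while it is being refreshed), so one possible mitigation is to retry the update a few times before failing the test. This is a minimal sketch, not taken from the roachtest code: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 0 on the first success, otherwise
# the exit status of the last failed attempt.
# Hypothetical helper for illustration; not part of roachtest or roachprod.
retry() {
  local attempts="$1" delay="$2" i=1 status=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    status=$?  # exit status of the command that just failed
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i/$attempts failed (exit $status); retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return "$status"
}
```

Hypothetical usage on a node: `retry 3 10 sudo apt-get -y update > logs/apt-upgrade.log 2>&1`. This only helps when apt-get exits non-zero on the fetch error, as it did here (exit status 100).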

tbg commented 5 years ago

These are now "caught" by grepping them out in the jepsen nightly script. I'll leave this issue open but move it out of the test-failure label.

andreimatei commented 5 years ago

> These are now "caught" by grepping them out in the jepsen nightly script.

That wasn't true, but I'm making it true in #37430.
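
The idea of grepping apt failures out of the logs could look roughly like the sketch below. It is not the actual nightly script: `is_apt_flake` is a hypothetical name, and the patterns are simply the two `E:` lines from the apt-get update output quoted earlier in this issue.

```shell
# is_apt_flake LOGFILE: succeed (exit 0) if LOGFILE contains one of the apt
# download-error lines seen in this issue, so the caller can treat the run as
# an infrastructure flake rather than a real test failure.
# Hypothetical helper; the patterns are assumptions based on the log above.
is_apt_flake() {
  grep -Eq 'E: (Failed to fetch|Some index files failed to download)' "$1"
}
```

For example, `if is_apt_flake logs/apt-upgrade.log; then echo "apt flake"; fi` would flag the first failure reported here. It would not catch the second one, since those logs show no error output.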

github-actions[bot] commented 3 years ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!