cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: jepsen subcritical-skews tests are skipped due to NTP rate limiting #35599

Closed cockroach-teamcity closed 1 year ago

cockroach-teamcity commented 5 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/a119a3a158725c9e3f9b8084d9398601c0e67007

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=jepsen-batch1/bank-multitable/subcritical-skews PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1170795&tab=buildLog

The test failed on master:
    jepsen.go:247,jepsen.go:308,test.go:1214: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1170795-jepsen-batch1:6 -- bash -e -c "\
        cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
         ~/lein run test \
           --tarball file://${PWD}/cockroach.tgz \
           --username ${USER} \
           --ssh-private-key ~/.ssh/id_rsa \
           --os ubuntu \
           --time-limit 300 \
           --concurrency 30 \
           --recovery-time 25 \
           --test-count 1 \
           -n 10.142.0.38 -n 10.142.0.9 -n 10.142.0.41 -n 10.142.0.27 -n 10.142.0.26 \
           --test bank-multitable --nemesis subcritical-skews \
        > invoke.log 2>&1 \
        " returned:
        stderr:

        stdout:
        Error:  exit status 255
        : exit status 1

Jira issue: CRDB-4573

cockroach-teamcity commented 5 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/5ebfeec052f9cee4e63757defe7c9120643293db

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=jepsen-batch1/bank-multitable/subcritical-skews PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1174810&tab=buildLog

The test failed on release-2.1:
    jepsen.go:247,jepsen.go:308,test.go:1214: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1174810-jepsen-batch1:6 -- bash -e -c "\
        cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
         ~/lein run test \
           --tarball file://${PWD}/cockroach.tgz \
           --username ${USER} \
           --ssh-private-key ~/.ssh/id_rsa \
           --os ubuntu \
           --time-limit 300 \
           --concurrency 30 \
           --recovery-time 25 \
           --test-count 1 \
           -n 10.142.0.47 -n 10.142.0.38 -n 10.142.0.44 -n 10.142.0.36 -n 10.142.0.41 \
           --test bank-multitable --nemesis subcritical-skews \
        > invoke.log 2>&1 \
        " returned:
        stderr:

        stdout:
        Error:  exit status 255
        : exit status 1

cockroach-teamcity commented 5 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/7ce9188c6e64465d9dcb9f0ca0f113dd0e584da0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=jepsen-batch1/bank-multitable/subcritical-skews PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1178908&tab=buildLog

The test failed on release-2.1:
    jepsen.go:247,jepsen.go:308,test.go:1214: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1178908-jepsen-batch1:6 -- bash -e -c "\
        cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
         ~/lein run test \
           --tarball file://${PWD}/cockroach.tgz \
           --username ${USER} \
           --ssh-private-key ~/.ssh/id_rsa \
           --os ubuntu \
           --time-limit 300 \
           --concurrency 30 \
           --recovery-time 25 \
           --test-count 1 \
           -n 10.142.0.39 -n 10.142.0.159 -n 10.142.0.38 -n 10.142.0.36 -n 10.142.0.160 \
           --test bank-multitable --nemesis subcritical-skews \
        > invoke.log 2>&1 \
        " returned:
        stderr:

        stdout:
        Error:  exit status 255
        : exit status 1

bdarnell commented 5 years ago

The subcritical-skews nemesis resynchronizes with NTP frequently. This has recently started failing because we're getting rate-limited by the NTP server (the nemesis hard-codes ntp.ubuntu.com).

We need to either

cucaroach commented 2 years ago

Clearing the milestone so this gets re-triaged.

aliher1911 commented 2 years ago

While looking at other issues connected to the jepsen tests, I found that recent jepsen packages use pool.ntp.org instead of ntp.ubuntu.com.

I changed it and gave it a try and, surprise, we are not throttled by the pool, and I see no more complaints in the log.

Since we have the server address hard-coded into our tests, changing it should be a quick win that lets us re-enable the tests.
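
As a rough illustration of the idea only (not the actual jepsen change), the resync command could take its server from a variable instead of a literal. In this sketch the `NTP_SERVER` variable and the `ntp_resync_cmd` helper are hypothetical names, and the assumption is that the nemesis resyncs with something like `ntpdate -b`; the helper only prints the command and does not touch the clock.

```shell
#!/usr/bin/env bash
# Hypothetical helper: NTP_SERVER and ntp_resync_cmd are illustrative names,
# not something the jepsen or roachtest code is known to use.
ntp_resync_cmd() {
  # Fall back to the pool when no server is configured.
  local server="${NTP_SERVER:-pool.ntp.org}"
  # ntpdate -b steps the clock rather than slewing it.
  printf 'ntpdate -b %s\n' "$server"
}

ntp_resync_cmd
```

Setting `NTP_SERVER=ntp.ubuntu.com` before calling the helper would reproduce the old behavior, which makes it easy to compare the two servers without editing the test.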

aliher1911 commented 2 years ago

With the jepsen change in place, I'll make a diff and see whether it works. Running those tests with roachtest from dev looked fine.

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/test-eng