cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: tpcc/mixed-headroom/n5cpu16 failed #83079

Closed: cockroach-teamcity closed this issue 2 years ago

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 8d34ef1ea15850ee1c70470610b6652df4c317de:

          |  1766.0s        0            6.0           18.5  28991.0  34359.7  34359.7  34359.7 stockLevel
          |  1767.0s        0           11.0           18.6  38654.7  38654.7  40802.2  40802.2 delivery
          |  1767.0s        0           41.1          186.3  33286.0  40802.2  40802.2  40802.2 newOrder
          |  1767.0s        0            0.0           18.6      0.0      0.0      0.0      0.0 orderStatus
          |  1767.0s        0           14.0          186.1  32212.3  33286.0  42949.7  42949.7 payment
          |  1767.0s        0            2.0           18.5  27917.3  28991.0  28991.0  28991.0 stockLevel
          |  1768.0s        0            3.0           18.6  40802.2  40802.2  40802.2  40802.2 delivery
          |  1768.0s        0           51.9          186.2  38654.7  42949.7  45097.2  47244.6 newOrder
          |  1768.0s        0            1.0           18.6  38654.7  38654.7  38654.7  38654.7 orderStatus
          |  1768.0s        0            7.0          186.0  28991.0  40802.2  40802.2  40802.2 payment
          |  1768.0s        0            1.0           18.5  36507.2  36507.2  36507.2  36507.2 stockLevel
        Wraps: (8) COMMAND_PROBLEM
        Wraps: (9) Node 5. Command with error:
          | ``````
          | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
          | ``````
        Wraps: (10) exit status 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

    mixed_version_jobs.go:73,versionupgrade.go:178,tpcc.go:427,test_runner.go:897: monitor failure: monitor task failed: t.Fatal() was called
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:178
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:427
          | main.(*testRunner).runTest.func2
          |     main/pkg/cmd/roachtest/test_runner.go:897
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     main/pkg/cmd/roachtest/monitor.go:80
          | runtime.doInit
          |     GOROOT/src/runtime/proc.go:6498
          | runtime.main
          |     GOROOT/src/runtime/proc.go:238
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)
See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]

/cc @cockroachdb/kv-triage

Jira issue: CRDB-16849

Epic CRDB-19172
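
For anyone triaging the workload output quoted in these reports, here is a minimal Python sketch that parses the per-second TPC-C rows and flags degraded intervals. It assumes the column order from the header printed by the workload (`_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)`). The names `Sample`, `parse_line`, and `is_degraded` are hypothetical helpers, not part of roachtest, and the 10 s p50 threshold is an arbitrary choice for illustration.

```python
import re
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One per-second row of `cockroach workload run tpcc` output."""
    elapsed_s: float
    errors: int
    ops_inst: float
    ops_cum: float
    p50_ms: float
    p95_ms: float
    p99_ms: float
    pmax_ms: float
    op: str


# Optional leading "|" (as quoted in the failure report), then the eight
# numeric columns and the operation name.
LINE_RE = re.compile(
    r"^\s*\|?\s*(\d+(?:\.\d+)?)s\s+(\d+)\s+"
    r"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(\w+)\s*$"
)


def parse_line(line: str) -> Optional[Sample]:
    """Return a Sample for a data row, or None for headers and other lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    elapsed, errors, inst, cum, p50, p95, p99, pmax, op = m.groups()
    return Sample(float(elapsed), int(errors), float(inst), float(cum),
                  float(p50), float(p95), float(p99), float(pmax), op)


def is_degraded(s: Sample, p50_limit_ms: float = 10_000.0) -> bool:
    """Assumed heuristic: no throughput this second, or a very high median latency."""
    return s.ops_inst == 0.0 or s.p50_ms >= p50_limit_ms
```

Running `parse_line` over the rows above and filtering with `is_degraded` separates the healthy runs (p50 in the tens of milliseconds) from the ones where latencies climb into tens of seconds or throughput drops to zero.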

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 8d34ef1ea15850ee1c70470610b6652df4c317de:

          |   664.0s        0            1.0           18.3  13421.8  13421.8  13421.8  13421.8 stockLevel
          |   665.0s        0            0.0           18.4      0.0      0.0      0.0      0.0 delivery
          |   665.0s        0            4.0          184.8  17179.9  19327.4  19327.4  19327.4 newOrder
          |   665.0s        0            0.0           18.4      0.0      0.0      0.0      0.0 orderStatus
          |   665.0s        0            0.0          182.9      0.0      0.0      0.0      0.0 payment
          |   665.0s        0            0.0           18.3      0.0      0.0      0.0      0.0 stockLevel
          |   666.0s        0            0.0           18.3      0.0      0.0      0.0      0.0 delivery
          |   666.0s        0            0.0          184.5      0.0      0.0      0.0      0.0 newOrder
          |   666.0s        0            0.0           18.4      0.0      0.0      0.0      0.0 orderStatus
          |   666.0s        0            0.0          182.6      0.0      0.0      0.0      0.0 payment
          |   666.0s        0            0.0           18.3      0.0      0.0      0.0      0.0 stockLevel
        Wraps: (8) COMMAND_PROBLEM
        Wraps: (9) Node 5. Command with error:
          | ``````
          | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
          | ``````
        Wraps: (10) exit status 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

    mixed_version_jobs.go:73,versionupgrade.go:178,tpcc.go:427,test_runner.go:897: monitor failure: monitor task failed: t.Fatal() was called
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:178
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:427
          | main.(*testRunner).runTest.func2
          |     main/pkg/cmd/roachtest/test_runner.go:897
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     main/pkg/cmd/roachtest/monitor.go:80
          | runtime.doInit
          |     GOROOT/src/runtime/proc.go:6498
          | runtime.main
          |     GOROOT/src/runtime/proc.go:238
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]

lidorcarmel commented 2 years ago

Node 3 was OOM-killed (node 2 in the second failure):

[ 7780.081918] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cockroach.service,task=cockroach,pid=2867347,uid=1000
[ 7780.082013] Out of memory: Killed process 2867347 (cockroach) total-vm:21357384kB, anon-rss:13339668kB, file-rss:1236kB, shmem-rss:0kB, UID:1000 pgtables:38656kB oom_score_adj:0
[ 7780.734170] oom_reaper: reaped process 2867347 (cockroach), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
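
To read the memory figures out of a kernel line like the one above, a small Python sketch follows. It assumes the `Out of memory: Killed process ...` message format exactly as quoted here (other kernel versions may print different fields), and that the kernel's `kB` means 1024-byte units. `parse_oom_kill` and `kib_to_gib` are hypothetical helper names.

```python
import re
from typing import Dict, Optional, Union

# Matches the "Out of memory: Killed process" line as quoted in this issue.
KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB, shmem-rss:(?P<shmem_rss>\d+)kB"
)


def parse_oom_kill(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract pid, command name, and memory counters (in kB) from one dmesg line."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    comm = fields.pop("comm")
    result: Dict[str, Union[int, str]] = {k: int(v) for k, v in fields.items()}
    result["comm"] = comm
    return result


def kib_to_gib(kib: int) -> float:
    """Convert kernel 'kB' (assumed to be KiB) to GiB."""
    return kib / (1024 * 1024)
```

Applied to the second line of the log above, this gives an anonymous RSS of roughly 12.7 GiB for the `cockroach` process at the time it was killed.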

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 13cb2f6c40e3146fed8d931f65f89da9b42ce2c1:

          |    17.0s        0            0.0            8.2      0.0      0.0      0.0      0.0 stockLevel
          |    18.0s        0            3.0            6.9   8053.1  10737.4  10737.4  10737.4 delivery
          |    18.0s        0           82.1           65.6  10737.4  13958.6  15569.3  18253.6 newOrder
          |    18.0s        0            5.0            8.0   6442.5   7247.8   7247.8   7247.8 orderStatus
          |    18.0s        0           48.1           74.9  10737.4  11811.2  12884.9  12884.9 payment
          |    18.0s        0            0.0            7.8      0.0      0.0      0.0      0.0 stockLevel
          |    19.0s        0            5.0            6.8  10200.5  10200.5  10200.5  10200.5 delivery
          |    19.0s        0           22.0           63.3  11811.2  14495.5  15032.4  15032.4 newOrder
          |    19.0s        0            0.0            7.6      0.0      0.0      0.0      0.0 orderStatus
          |    19.0s        0           52.0           73.7  10737.4  11811.2  11811.2  12348.0 payment
          |    19.0s        0            0.0            7.4      0.0      0.0      0.0      0.0 stockLevel
        Wraps: (8) COMMAND_PROBLEM
        Wraps: (9) Node 5. Command with error:
          | ``````
          | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
          | ``````
        Wraps: (10) exit status 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

    mixed_version_jobs.go:73,versionupgrade.go:178,tpcc.go:427,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:178
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:427
          | main.(*testRunner).runTest.func2
          |     main/pkg/cmd/roachtest/test_runner.go:896
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     main/pkg/cmd/roachtest/monitor.go:80
          | runtime.doInit
          |     GOROOT/src/runtime/proc.go:6498
          | runtime.main
          |     GOROOT/src/runtime/proc.go:238
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 457d724622e4fa2e62d6f4e7926509dbc7d18511:

          |   785.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 stockLevel
          |   786.0s        0            0.0           18.8      0.0      0.0      0.0      0.0 delivery
          |   786.0s        0            0.0          189.1      0.0      0.0      0.0      0.0 newOrder
          |   786.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 orderStatus
          |   786.0s        0            0.0          188.5      0.0      0.0      0.0      0.0 payment
          |   786.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 stockLevel
          |   787.0s        0            0.0           18.8      0.0      0.0      0.0      0.0 delivery
          |   787.0s        0            0.0          188.8      0.0      0.0      0.0      0.0 newOrder
          |   787.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 orderStatus
          |   787.0s        0            0.0          188.2      0.0      0.0      0.0      0.0 payment
          |   787.0s        0            0.0           18.9      0.0      0.0      0.0      0.0 stockLevel
        Wraps: (8) COMMAND_PROBLEM
        Wraps: (9) Node 5. Command with error:
          | ``````
          | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          | ``````
        Wraps: (10) exit status 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

    mixed_version_jobs.go:73,versionupgrade.go:188,tpcc.go:433,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:188
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:433
          | main.(*testRunner).runTest.func2
          |     main/pkg/cmd/roachtest/test_runner.go:896
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     main/pkg/cmd/roachtest/monitor.go:80
          | runtime.doInit
          |     GOROOT/src/runtime/proc.go:6498
          | runtime.main
          |     GOROOT/src/runtime/proc.go:238
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 773f7d4445ce3e0e806b7a182adba70a0f270f19:

          |   298.0s        0          168.0           93.8     31.5     44.0     48.2     60.8 newOrder
          |   298.0s        0           13.0           10.1      6.0      9.4     10.5     10.5 orderStatus
          |   298.0s        0          201.0          100.4     18.9     32.5     50.3     56.6 payment
          |   298.0s        0           15.0           10.0     27.3     65.0     92.3     92.3 stockLevel
          |   299.0s        0           12.0           10.0     58.7     62.9     62.9     62.9 delivery
          |   299.0s        0          203.8           94.2     33.6     65.0     79.7     83.9 newOrder
          |   299.0s        0           27.0           10.2      6.8      8.1     13.6     13.6 orderStatus
          |   299.0s        0          214.8          100.8     21.0     50.3     67.1     75.5 payment
          |   299.0s        0           24.0           10.0     33.6     62.9     88.1     88.1 stockLevel
          |   300.0s        0           18.0           10.0     60.8     92.3    100.7    100.7 delivery
          |   300.0s        0          195.1           94.5     32.5     41.9     50.3     52.4 newOrder
          |   300.0s        0           20.0           10.2      6.0      7.1      7.1      7.1 orderStatus
          |   300.0s        0          174.1          101.0     21.0     32.5     39.8     46.1 payment
          |   300.0s        0           15.0           10.1     26.2     37.7     50.3     50.3 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |     1.0s        0           20.1           20.1     60.8     79.7    109.1    109.1 delivery
          |     1.0s        0          211.3          211.4     35.7     52.4     62.9     67.1 newOrder
          |     1.0s        0           26.2           26.2      6.6      8.9      8.9      8.9 orderStatus
          |     1.0s        0          166.0          166.1     21.0     33.6     41.9     48.2 payment
          |     1.0s        0           15.1           15.1     26.2     44.0     46.1     46.1 stockLevel
          |     2.0s        0           20.0           20.1     58.7     67.1     79.7     79.7 delivery
          |     2.0s        0          166.0          188.6     31.5     44.0     46.1     54.5 newOrder
          |     2.0s        0           14.0           20.1      6.0      7.6      8.9      8.9 orderStatus
          |     2.0s        0          214.0          190.1     19.9     28.3     33.6     33.6 payment
          |     2.0s        0           17.0           16.1     32.5     48.2     71.3     71.3 stockLevel
          |     3.0s        0           23.0           21.0     58.7     83.9     83.9     83.9 delivery
          |     3.0s        0          175.1          184.1     32.5     39.8     46.1     46.1 newOrder
          |     3.0s        0           12.0           17.4      6.3      7.6      8.1      8.1 orderStatus
          |     3.0s        0          214.1          198.1     21.0     29.4     46.1     58.7 payment
          |     3.0s        0           19.0           17.0     31.5     48.2     54.5     54.5 stockLevel
          |     4.0s        0           14.0           19.3     60.8     88.1     92.3     92.3 delivery
          |     4.0s        0          220.8          193.3     33.6     52.4     65.0     83.9 newOrder
          |     4.0s        0           20.0           18.0      6.6      8.9     10.5     10.5 orderStatus
          |     4.0s        0          168.9          190.8     22.0     39.8     50.3     62.9 payment
          |     4.0s        0           22.0           18.3     33.6     46.1     62.9     62.9 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |     5.0s        0           15.0           18.4     79.7     96.5    117.4    117.4 delivery
          |     5.0s        0          196.1          193.9     35.7     52.4     71.3     83.9 newOrder
          |     5.0s        0           11.0           16.6      7.6     11.0     11.5     11.5 orderStatus
          |     5.0s        0          164.1          185.4     23.1     46.1     65.0     65.0 payment
          |     5.0s        0           10.0           16.6     41.9     50.3     50.3     50.3 stockLevel
        Wraps: (8) COMMAND_PROBLEM
        Wraps: (9) Node 5. Command with error:
          | ``````
          | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          | ``````
        Wraps: (10) exit status 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

    versionupgrade.go:502,versionupgrade.go:188,tpcc.go:433,test_runner.go:896: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ f4042d47fa8062a612c38d4696eb6bee9cee7c21:

          |   255.0s        0          150.9           87.0     19.9     31.5     41.9     46.1 payment
          |   255.0s        0           19.0            8.8     25.2     48.2     52.4     52.4 stockLevel
          |   256.0s        0           19.0            8.6     65.0     83.9     83.9     83.9 delivery
          |   256.0s        0          196.1           80.1     35.7     46.1     50.3     56.6 newOrder
          |   256.0s        0           18.0            8.9      7.1     11.0     15.7     15.7 orderStatus
          |   256.0s        0          158.1           87.3     21.0     31.5     35.7     41.9 payment
          |   256.0s        0           20.0            8.8     26.2     41.9     56.6     56.6 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   257.0s        0           21.0            8.6     56.6     65.0     67.1     67.1 delivery
          |   257.0s        0          159.0           80.4     33.6     44.0     44.0     48.2 newOrder
          |   257.0s        0           17.0            8.9      7.6      8.9      9.4      9.4 orderStatus
          |   257.0s        0          177.0           87.7     22.0     32.5     35.7     35.7 payment
          |   257.0s        0           14.0            8.9     26.2     41.9     46.1     46.1 stockLevel
          |   258.0s        0           15.0            8.6     58.7     65.0     75.5     75.5 delivery
          |   258.0s        0          159.0           80.7     33.6     44.0     50.3     56.6 newOrder
          |   258.0s        0           18.0            8.9      6.8      8.9      9.4      9.4 orderStatus
          |   258.0s        0          163.0           88.0     19.9     28.3     33.6     35.7 payment
          |   258.0s        0           17.0            8.9     25.2     46.1     46.1     46.1 stockLevel
          |   259.0s        0           14.0            8.7     58.7     65.0     71.3     71.3 delivery
          |   259.0s        0          154.8           81.0     33.6     41.9     46.1     54.5 newOrder
          |   259.0s        0           18.0            9.0      6.0     10.5     13.1     13.1 orderStatus
          |   259.0s        0          156.8           88.2     21.0     28.3     30.4     44.0 payment
          |   259.0s        0           15.0            8.9     33.6     52.4     62.9     62.9 stockLevel
          |   260.0s        0           15.0            8.7     62.9     79.7    151.0    151.0 delivery
          |   260.0s        0          163.1           81.3     35.7     44.0     48.2     52.4 newOrder
          |   260.0s        0           15.0            9.0      6.6      8.9     10.0     10.0 orderStatus
          |   260.0s        0          164.1           88.5     21.0     29.4     33.6     35.7 payment
          |   260.0s        0           21.0            9.0     31.5     48.2     60.8     60.8 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   261.0s        0           12.0            8.7     58.7     65.0     71.3     71.3 delivery
          |   261.0s        0          165.9           81.6     35.7     44.0     54.5     58.7 newOrder
          |   261.0s        0           20.0            9.0      7.3      8.1     12.1     12.1 orderStatus
          |   261.0s        0          184.9           88.9     22.0     30.4     37.7     50.3 payment
          |   261.0s        0           30.0            9.0     26.2     46.1     48.2     48.2 stockLevel
          |   262.0s        0           22.0            8.8     58.7     79.7     83.9     83.9 delivery
          |   262.0s        0          159.1           81.9     35.7     48.2     54.5     54.5 newOrder
          |   262.0s        0           19.0            9.1      6.8      8.1      8.4      8.4 orderStatus
          |   262.0s        0          162.1           89.2     21.0     27.3     29.4     35.7 payment
          |   262.0s        0           13.0            9.1     24.1     39.8     48.2     48.2 stockLevel
        Wraps: (8) secondary error attachment
          | UNCLASSIFIED_PROBLEM: context canceled
          | (1) UNCLASSIFIED_PROBLEM
          | Wraps: (2) Node 5. Command with error:
          |   | ``````
          |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          |   | ``````
          | Wraps: (3) context canceled
          | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
        Wraps: (9) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ a0d8839aa6164af81a9ebb140147d3baf5321287:

          |    73.0s        0            4.0            3.2      8.9     10.0     10.0     10.0 orderStatus
          |    73.0s        0           54.0           31.2     18.9     50.3     52.4     52.4 payment
          |    73.0s        0            5.0            3.2     33.6     48.2     48.2     48.2 stockLevel
          |    74.0s        0            4.0            3.5     71.3     88.1     88.1     88.1 delivery
          |    74.0s        0           40.0           20.2     32.5     52.4     60.8     60.8 newOrder
          |    74.0s        0            5.0            3.2      8.9     10.5     10.5     10.5 orderStatus
          |    74.0s        0           48.0           31.4     16.3     30.4     37.7     37.7 payment
          |    74.0s        0            5.0            3.2     37.7     41.9     41.9     41.9 stockLevel
          |    75.0s        0            5.0            3.5     96.5    167.8    167.8    167.8 delivery
          |    75.0s        0           46.0           20.6     48.2     92.3    109.1    109.1 newOrder
          |    75.0s        0            4.0            3.2      9.4     10.5     10.5     10.5 orderStatus
          |    75.0s        0           58.0           31.7     24.1     60.8     67.1     67.1 payment
          |    75.0s        0            8.0            3.3     33.6     79.7     79.7     79.7 stockLevel
          |    76.0s        0            5.0            3.5     83.9    121.6    121.6    121.6 delivery
          |    76.0s        0           39.0           20.8     52.4     75.5     92.3     92.3 newOrder
          |    76.0s        0            7.0            3.2      7.9     24.1     24.1     24.1 orderStatus
          |    76.0s        0           61.0           32.1     18.9     60.8     62.9     62.9 payment
          |    76.0s        0            5.0            3.3     33.6     41.9     41.9     41.9 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |    77.0s        0            5.0            3.5     71.3    104.9    104.9    104.9 delivery
          |    77.0s        0           46.0           21.1     79.7    134.2    167.8    167.8 newOrder
          |    77.0s        0            2.0            3.2      7.6      9.4      9.4      9.4 orderStatus
          |    77.0s        0           55.0           32.4     37.7     75.5     88.1    109.1 payment
          |    77.0s        0            7.0            3.4     32.5     54.5     54.5     54.5 stockLevel
          |    78.0s        0            4.0            3.6     71.3    134.2    134.2    134.2 delivery
          |    78.0s        0           45.0           21.4     75.5    117.4    121.6    121.6 newOrder
          |    78.0s        0            9.0            3.3      7.6     11.0     11.0     11.0 orderStatus
          |    78.0s        0           50.0           32.7     25.2     79.7    104.9    104.9 payment
          |    78.0s        0            3.0            3.4     30.4     54.5     54.5     54.5 stockLevel
          |    79.0s        0            8.0            3.6    117.4    121.6    121.6    121.6 delivery
          |    79.0s        0           45.0           21.7     75.5    121.6    159.4    159.4 newOrder
          |    79.0s        0            5.0            3.3      8.9     11.0     11.0     11.0 orderStatus
          |    79.0s        0           44.0           32.8     21.0     62.9     88.1     88.1 payment
          |    79.0s        0           10.0            3.5     27.3     35.7     35.7     35.7 stockLevel
          |    80.0s        0            6.0            3.6    104.9    117.4    117.4    117.4 delivery
          |    80.0s        0           55.0           22.2     67.1    121.6    130.0    159.4 newOrder
          |    80.0s        0            4.0            3.3      7.3     12.1     12.1     12.1 orderStatus
          |    80.0s        0           64.0           33.2     18.9     79.7    113.2    113.2 payment
          |    80.0s        0            5.0            3.5     44.0     58.7     58.7     58.7 stockLevel
        Wraps: (8) secondary error attachment
          | UNCLASSIFIED_PROBLEM: context canceled
          | (1) UNCLASSIFIED_PROBLEM
          | Wraps: (2) Node 5. Command with error:
          |   | ``````
          |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          |   | ``````
          | Wraps: (3) context canceled
          | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
        Wraps: (9) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-bulkio branch-release-21.2]


cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ aaf50e920ceff3c2863ab96b9e3614b8434b70a8:

          |   285.0s        0           12.0            9.7      7.3      7.9     16.8     16.8 orderStatus
          |   285.0s        0          181.8           96.5     19.9     41.9     48.2     65.0 payment
          |   285.0s        0           25.0            9.6     24.1     48.2     60.8     60.8 stockLevel
          |   286.0s        0           11.0            9.6     60.8     79.7     92.3     92.3 delivery
          |   286.0s        0          196.2           90.0     32.5     48.2     56.6     67.1 newOrder
          |   286.0s        0           25.0            9.7      7.1      9.4      9.4      9.4 orderStatus
          |   286.0s        0          192.2           96.8     19.9     30.4     41.9     52.4 payment
          |   286.0s        0           24.0            9.7     24.1     50.3     54.5     54.5 stockLevel
          |   287.0s        0           20.0            9.6     58.7     71.3     92.3     92.3 delivery
          |   287.0s        0          172.9           90.3     32.5     52.4     58.7     92.3 newOrder
          |   287.0s        0           18.0            9.7      6.3      7.9      8.4      8.4 orderStatus
          |   287.0s        0          171.9           97.1     18.9     27.3     37.7     39.8 payment
          |   287.0s        0           21.0            9.7     24.1     54.5     58.7     58.7 stockLevel
          |   288.0s        0           14.0            9.6     56.6     71.3     75.5     75.5 delivery
          |   288.0s        0          193.0           90.7     32.5     46.1     56.6     60.8 newOrder
          |   288.0s        0           18.0            9.8      6.8      8.9      9.4      9.4 orderStatus
          |   288.0s        0          193.0           97.4     18.9     26.2     28.3     30.4 payment
          |   288.0s        0           18.0            9.7     25.2     44.0     44.0     44.0 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   289.0s        0           16.0            9.6     65.0     71.3     71.3     71.3 delivery
          |   289.0s        0          173.0           91.0     31.5     41.9     50.3     79.7 newOrder
          |   289.0s        0           17.0            9.8      7.6      9.4     10.5     10.5 orderStatus
          |   289.0s        0          213.0           97.8     19.9     31.5     44.0     50.3 payment
          |   289.0s        0           22.0            9.8     25.2     48.2     48.2     48.2 stockLevel
          |   290.0s        0           18.0            9.7     60.8     71.3     75.5     75.5 delivery
          |   290.0s        0          183.0           91.3     32.5     50.3     60.8     65.0 newOrder
          |   290.0s        0           20.0            9.8      6.3      8.1      8.1      8.1 orderStatus
          |   290.0s        0          195.0           98.1     19.9     32.5     46.1     52.4 payment
          |   290.0s        0           10.0            9.8     27.3     44.0     44.0     44.0 stockLevel
          |   291.0s        0           20.0            9.7     60.8     83.9     96.5     96.5 delivery
          |   291.0s        0          172.0           91.6     31.5     46.1     54.5     58.7 newOrder
          |   291.0s        0           22.0            9.9      7.3     12.1     12.6     12.6 orderStatus
          |   291.0s        0          173.0           98.4     19.9     28.3     37.7     41.9 payment
          |   291.0s        0           17.0            9.8     21.0     56.6     60.8     60.8 stockLevel
          |   292.0s        0           20.0            9.7     60.8     75.5    109.1    109.1 delivery
          |   292.0s        0          186.9           91.9     31.5     44.0     50.3     50.3 newOrder
          |   292.0s        0           14.0            9.9      6.3      8.4     12.1     12.1 orderStatus
          |   292.0s        0          188.9           98.7     19.9     29.4     33.6     37.7 payment
          |   292.0s        0           23.0            9.8     22.0     48.2     48.2     48.2 stockLevel
        Wraps: (8) secondary error attachment
          | UNCLASSIFIED_PROBLEM: context canceled
          | (1) UNCLASSIFIED_PROBLEM
          | Wraps: (2) Node 5. Command with error:
          |   | ``````
          |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          |   | ``````
          | Wraps: (3) context canceled
          | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
        Wraps: (9) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0


cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 80c274877a917580af62be6eb0cd48c8c7ae9c08:

          |    94.0s        0          169.8          184.4     16.8     23.1     26.2     27.3 payment
          |    94.0s        0           15.0           18.3     18.9     35.7     37.7     37.7 stockLevel
          |    95.0s        0           16.0           18.8     56.6     62.9     65.0     65.0 delivery
          |    95.0s        0          202.0          194.8     28.3     44.0     48.2     56.6 newOrder
          |    95.0s        0           32.0           18.6      6.8      9.4     10.0     10.0 orderStatus
          |    95.0s        0          214.0          184.7     16.3     24.1     32.5     35.7 payment
          |    95.0s        0           22.0           18.4     19.9     35.7     37.7     37.7 stockLevel
          |    96.0s        0           31.0           18.9     54.5    201.3    209.7    209.7 delivery
          |    96.0s        0          200.2          194.9     32.5    151.0    184.5    192.9 newOrder
          |    96.0s        0           21.0           18.6      6.6     10.5     10.5     10.5 orderStatus
          |    96.0s        0          186.2          184.8     18.9    104.9    130.0    176.2 payment
          |    96.0s        0           20.0           18.4     15.7     41.9     46.1     46.1 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |    97.0s        0           25.0           19.0     52.4     62.9     83.9     83.9 delivery
          |    97.0s        0          214.0          195.1     29.4     39.8     44.0     48.2 newOrder
          |    97.0s        0           15.0           18.6      6.6      7.9      8.9      8.9 orderStatus
          |    97.0s        0          190.0          184.8     17.8     26.2     28.3     31.5 payment
          |    97.0s        0           22.0           18.4     18.9     44.0     54.5     54.5 stockLevel
          |    98.0s        0           18.0           19.0     52.4     56.6     58.7     58.7 delivery
          |    98.0s        0          190.9          195.0     29.4     37.7     44.0     46.1 newOrder
          |    98.0s        0           13.0           18.5      6.0      7.9      7.9      7.9 orderStatus
          |    98.0s        0          192.9          184.9     16.3     23.1     25.2     27.3 payment
          |    98.0s        0           17.0           18.4     23.1     37.7     41.9     41.9 stockLevel
          |    99.0s        0           17.0           19.0     54.5     56.6     96.5     96.5 delivery
          |    99.0s        0          187.1          194.9     27.3     37.7     50.3     52.4 newOrder
          |    99.0s        0           30.0           18.7      6.3      9.4     10.0     10.0 orderStatus
          |    99.0s        0          205.1          185.1     16.3     23.1     30.4     35.7 payment
          |    99.0s        0           13.0           18.4     19.9     37.7     44.0     44.0 stockLevel
          |   100.0s        0           32.0           19.1     54.5     65.0     67.1     67.1 delivery
          |   100.0s        0          196.0          195.0     29.4     44.0     58.7     58.7 newOrder
          |   100.0s        0           16.0           18.6      5.5     10.0     12.1     12.1 orderStatus
          |   100.0s        0          184.0          185.1     17.8     24.1     39.8     44.0 payment
          |   100.0s        0           19.0           18.4     17.8     27.3     37.7     37.7 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   101.0s        0           21.0           19.1     54.5     71.3     79.7     79.7 delivery
          |   101.0s        0          183.0          194.8     29.4     44.0     52.4     56.6 newOrder
          |   101.0s        0           21.0           18.7      6.3      8.1      8.4      8.4 orderStatus
          |   101.0s        0          174.0          185.0     17.8     29.4     35.7     39.8 payment
          |   101.0s        0           15.0           18.3     16.8     22.0     39.8     39.8 stockLevel
        Wraps: (8) secondary error attachment
          | UNCLASSIFIED_PROBLEM: context canceled
          | (1) UNCLASSIFIED_PROBLEM
          | Wraps: (2) Node 5. Command with error:
          |   | ``````
          |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          |   | ``````
          | Wraps: (3) context canceled
          | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
        Wraps: (9) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0


cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 524fd14da3fefcd849f44a835cc5f88f5dbdadcc:

          |   286.0s        0          184.0           97.3     21.0     35.7     39.8     48.2 payment
          |   286.0s        0           20.0            9.5     29.4     52.4     65.0     65.0 stockLevel
          |   287.0s        0           13.0            9.6     60.8     62.9     65.0     65.0 delivery
          |   287.0s        0          179.0           89.9     32.5     44.0     50.3     58.7 newOrder
          |   287.0s        0           16.0            9.6      7.6      8.4     10.0     10.0 orderStatus
          |   287.0s        0          183.0           97.6     19.9     30.4     37.7     41.9 payment
          |   287.0s        0           15.0            9.6     26.2     41.9     52.4     52.4 stockLevel
          |   288.0s        0            9.0            9.5     67.1     79.7     79.7     79.7 delivery
          |   288.0s        0          174.0           90.2     35.7     60.8     75.5     83.9 newOrder
          |   288.0s        0           18.0            9.7      6.3      8.9     39.8     39.8 orderStatus
          |   288.0s        0          183.0           97.9     21.0     41.9     60.8     67.1 payment
          |   288.0s        0           24.0            9.6     24.1     41.9     92.3     92.3 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   289.0s        0           26.0            9.6     60.8     88.1    100.7    100.7 delivery
          |   289.0s        0          170.0           90.5     32.5     44.0     56.6     62.9 newOrder
          |   289.0s        0           19.0            9.7      6.6      9.4     10.0     10.0 orderStatus
          |   289.0s        0          186.0           98.2     21.0     30.4     37.7     39.8 payment
          |   289.0s        0           25.0            9.7     24.1     50.3     75.5     75.5 stockLevel
          |   290.0s        0           22.0            9.6     60.8     67.1     75.5     75.5 delivery
          |   290.0s        0          186.0           90.8     32.5     52.4     58.7     60.8 newOrder
          |   290.0s        0           12.0            9.7      6.0      8.9     11.0     11.0 orderStatus
          |   290.0s        0          191.0           98.5     19.9     28.3     39.8     41.9 payment
          |   290.0s        0           26.0            9.7     30.4     48.2     79.7     79.7 stockLevel
          |   291.0s        0           21.0            9.7     62.9     92.3     96.5     96.5 delivery
          |   291.0s        0          214.8           91.2     35.7     50.3     71.3     75.5 newOrder
          |   291.0s        0           22.0            9.7      6.6     10.0     10.5     10.5 orderStatus
          |   291.0s        0          172.8           98.8     22.0     32.5     44.0     60.8 payment
          |   291.0s        0           20.0            9.8     27.3     48.2     52.4     52.4 stockLevel
          |   292.0s        0           16.0            9.7     60.8     88.1     92.3     92.3 delivery
          |   292.0s        0          189.0           91.6     31.5     44.0     58.7     60.8 newOrder
          |   292.0s        0           17.0            9.8      6.8     10.5     16.3     16.3 orderStatus
          |   292.0s        0          158.0           99.0     19.9     35.7     41.9     46.1 payment
          |   292.0s        0           16.0            9.8     27.3     50.3     54.5     54.5 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   293.0s        0           27.0            9.8     60.8     88.1     96.5     96.5 delivery
          |   293.0s        0          193.1           91.9     33.6     50.3     71.3    113.2 newOrder
          |   293.0s        0           23.0            9.8      6.3     10.5     13.6     13.6 orderStatus
          |   293.0s        0          198.1           99.3     22.0     39.8     56.6     65.0 payment
          |   293.0s        0           18.0            9.8     23.1     56.6     65.0     65.0 stockLevel
        Wraps: (8) secondary error attachment
          | UNCLASSIFIED_PROBLEM: context canceled
          | (1) UNCLASSIFIED_PROBLEM
          | Wraps: (2) Node 5. Command with error:
          |   | ``````
          |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          |   | ``````
          | Wraps: (3) context canceled
          | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
        Wraps: (9) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0


nvanbenschoten commented 2 years ago

Artifacts are missing from all but the last failure. In that failure, node 1 exited with status code 1 ("unspecified failure"). I can't find anything else in the logs explaining why the process exited. It doesn't appear to be OOM-related, but maybe I'm missing the signs. The last thing in the log is:

I220825 15:35:46.188812 48480 upgrade/upgradecluster/cluster.go:118 ⋮ [n1,intExec=‹×›,migration-mgr] 826 executing bump-cluster-version=22.1-48 on nodes n{1,2,3,4}

I'll try to reproduce using:

GCE_PROJECT=andrei-jepsen ./pkg/cmd/roachtest/roachstress.sh -c10 -u 'tpcc/mixed-headroom/n5cpu16' -- --cpu-quota=1280

nvanbenschoten commented 2 years ago

5 of those 10 runs failed, so this is reproducible. At least two failed due to an OOM.

nvanbenschoten commented 2 years ago

The OOM occurred during the bank import step of the roachtest. At that time, the node that OOMed was seeing many slow raft ready iterations and appears to have been overloaded.

However, the last heap profile doesn't show anything particularly interesting:

(pprof) top
Showing nodes accounting for 779.35MB, 90.53% of 860.82MB total
Dropped 497 nodes (cum <= 4.30MB)
Showing top 10 nodes out of 140
      flat  flat%   sum%        cum   cum%
  173.50MB 20.16% 20.16%   173.50MB 20.16%  github.com/cockroachdb/cockroach/pkg/col/coldata.(*element).setNonInlined
  142.38MB 16.54% 36.70%   142.38MB 16.54%  go.etcd.io/etcd/raft/v3/raftpb.(*Entry).Unmarshal
  137.63MB 15.99% 52.68%   137.63MB 15.99%  github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb.(*ReplicatedEvalResult_AddSSTable).Unmarshal
  128.14MB 14.89% 67.57%   128.14MB 14.89%  github.com/cockroachdb/cockroach/pkg/kv/bulk.(*kvBuf).fits
   97.50MB 11.33% 78.90%    97.50MB 11.33%  github.com/cockroachdb/cockroach/pkg/roachpb.(*Value).ensureRawBytes
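
As a quick sanity check on the listing above, here is a minimal Python sketch that recomputes the flat and cumulative percentages from the five visible rows and the 860.82MB total. The `rows` list is copied from the pprof output; the variable names and the grouping of the two `Unmarshal` frames are only illustrative, not part of pprof.

```python
# Rows copied from the `top` listing above: (flat MB, flat % as printed, function).
rows = [
    (173.50, 20.16, "github.com/cockroachdb/cockroach/pkg/col/coldata.(*element).setNonInlined"),
    (142.38, 16.54, "go.etcd.io/etcd/raft/v3/raftpb.(*Entry).Unmarshal"),
    (137.63, 15.99, "github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb.(*ReplicatedEvalResult_AddSSTable).Unmarshal"),
    (128.14, 14.89, "github.com/cockroachdb/cockroach/pkg/kv/bulk.(*kvBuf).fits"),
    (97.50, 11.33, "github.com/cockroachdb/cockroach/pkg/roachpb.(*Value).ensureRawBytes"),
]
total_mb = 860.82  # "of 860.82MB total" in the pprof header

running_mb = 0.0
for flat_mb, _printed_pct, name in rows:
    running_mb += flat_mb
    flat_pct = 100 * flat_mb / total_mb
    cum_pct = 100 * running_mb / total_mb
    print(f"{flat_mb:8.2f}MB {flat_pct:6.2f}% {cum_pct:6.2f}%  {name}")

# Share of the heap held by the two Unmarshal frames taken together.
unmarshal_mb = sum(mb for mb, _, name in rows if name.endswith("Unmarshal"))
unmarshal_share = 100 * unmarshal_mb / total_mb
print(f"Unmarshal frames: {unmarshal_mb:.2f}MB ({unmarshal_share:.2f}%)")
```

The five rows add up to roughly 679MB, so the sampled in-use heap stays well under 1GB even with the rows that are not shown here.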

erikgrinaker commented 2 years ago

Could be #73376, which keeps popping up. Unfortunately we may not get around to addressing it for 23.1, but we're considering bumping the priority.

nvanbenschoten commented 2 years ago

I was thinking along the same lines, but I also notice a clear inflection point in the rate of failures here, so something regressed about 17 days ago. I'm going to see if a bisect will lead to greater clarity.

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ e39111b2e714375faa0facc05e51e8f619a55b21:

          |   283.0s        0          186.1           96.4     13.6     25.2     32.5     41.9 payment
          |   283.0s        0           16.0            9.5     18.9     26.2     29.4     29.4 stockLevel
          |   284.0s        0           14.0            9.5     50.3     62.9     65.0     65.0 delivery
          |   284.0s        0          164.8           89.5     24.1     30.4     33.6     39.8 newOrder
          |   284.0s        0           13.0            9.6      7.1      8.9     10.0     10.0 orderStatus
          |   284.0s        0          182.8           96.7     13.1     16.3     22.0     24.1 payment
          |   284.0s        0           14.0            9.5     17.8     23.1     28.3     28.3 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   285.0s        0           22.0            9.6     52.4     67.1     67.1     67.1 delivery
          |   285.0s        0          189.1           89.8     24.1     31.5     41.9     71.3 newOrder
          |   285.0s        0           16.0            9.6      6.3      7.9      8.9      8.9 orderStatus
          |   285.0s        0          193.1           97.0     13.6     21.0     41.9     62.9 payment
          |   285.0s        0           19.0            9.6     17.8     22.0     25.2     25.2 stockLevel
          |   286.0s        0           16.0            9.6     54.5     83.9     88.1     88.1 delivery
          |   286.0s        0          170.1           90.1     25.2     52.4     60.8     67.1 newOrder
          |   286.0s        0           14.0            9.6      6.6      8.1     14.2     14.2 orderStatus
          |   286.0s        0          193.1           97.3     13.6     29.4     44.0     58.7 payment
          |   286.0s        0           18.0            9.6     15.7     23.1     25.2     25.2 stockLevel
          |   287.0s        0           11.0            9.6     54.5     62.9     65.0     65.0 delivery
          |   287.0s        0          192.0           90.5     24.1     28.3     33.6     37.7 newOrder
          |   287.0s        0           19.0            9.7      6.8      8.1      8.4      8.4 orderStatus
          |   287.0s        0          176.0           97.6     13.1     15.7     21.0     31.5 payment
          |   287.0s        0           15.0            9.6     16.8     23.1     26.2     26.2 stockLevel
          |   288.0s        0           20.0            9.7     54.5     67.1     75.5     75.5 delivery
          |   288.0s        0          181.1           90.8     24.1     30.4     33.6     37.7 newOrder
          |   288.0s        0           19.0            9.7      6.8      8.9     11.0     11.0 orderStatus
          |   288.0s        0          176.1           97.9     13.6     17.8     22.0     24.1 payment
          |   288.0s        0           25.0            9.7     18.9     24.1     28.3     28.3 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   289.0s        0           18.0            9.7     56.6     67.1     71.3     71.3 delivery
          |   289.0s        0          173.0           91.1     25.2     37.7     58.7     65.0 newOrder
          |   289.0s        0           17.0            9.7      6.0      8.9     13.1     13.1 orderStatus
          |   289.0s        0          189.0           98.2     14.2     32.5     46.1     54.5 payment
          |   289.0s        0            7.0            9.7     21.0     24.1     24.1     24.1 stockLevel
          |   290.0s        0           21.0            9.7     56.6     71.3     75.5     75.5 delivery
          |   290.0s        0          210.9           91.5     27.3     46.1     52.4     54.5 newOrder
          |   290.0s        0           10.0            9.7      6.6      9.4      9.4      9.4 orderStatus
          |   290.0s        0          207.9           98.6     15.2     35.7     44.0     52.4 payment
          |   290.0s        0           19.0            9.7     19.9     27.3     27.3     27.3 stockLevel
        Wraps: (8) secondary error attachment
          | UNCLASSIFIED_PROBLEM: context canceled
          | (1) UNCLASSIFIED_PROBLEM
          | Wraps: (2) Node 5. Command with error:
          |   | ``````
          |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          |   | ``````
          | Wraps: (3) context canceled
          | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
        Wraps: (9) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0


nvanbenschoten commented 2 years ago

This has not failed again with the original failure mode. However, it failed at the same time as a number of other mixed-version tests 2 days ago. Moving that investigation to Test Eng.

srosenberg commented 2 years ago

The most recent failure seems unrelated to the other mixed-version failures, namely version/mixed/nodes=3 and version/mixed/nodes=5. (Both failed because of the recent change requiring COCKROACH_UPGRADE_TO_DEV_VERSION [1].) This failure also doesn't indicate any issue with the upgrade FSM. It appears to be a transient (network) error that causes the background (tpcc) workload to fail, which in turn fails the test. I am therefore removing the xxx-blocker labels. Full analysis is below.

[1] https://github.com/cockroachdb/cockroach/issues/87687#issuecomment-1243866806

srosenberg commented 2 years ago

From teardown.log, we can see that the background tpcc workload fails after ~5 minutes:

I220908 17:54:41.085738 1 workload/cli/run.go:427  [-] 1  creating load generator...
I220908 17:54:41.282881 1 workload/cli/run.go:458  [-] 2  creating load generator... done (took 197.141856ms)
I220908 17:59:31.796588 23519 workload/pgx_helpers.go:79  [-] 4  pgx logger [error]: Exec logParams=map[args:[] err:read tcp 10.142.0.10:54240 -> 10.142.0.41:26257: read: connection reset by peer pid:3623803 sql:begin time:143.851154ms]
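
The "~5 minutes" figure can be checked directly from the two timestamps above. This is a minimal Python sketch, assuming the usual log prefix of a one-letter severity followed by `yymmdd hh:mm:ss.micros`; the helper name `parse_crdb_timestamp` is made up for illustration.

```python
from datetime import datetime


def parse_crdb_timestamp(line: str) -> datetime:
    """Parse the leading 'I220908 17:54:41.085738' prefix of a log line."""
    date_part, time_part = line.split()[:2]
    # Drop the one-letter severity (I, W, E, F) in front of the yymmdd date.
    return datetime.strptime(f"{date_part[1:]} {time_part}", "%y%m%d %H:%M:%S.%f")


# Prefixes of the first and last log lines quoted above.
start = parse_crdb_timestamp("I220908 17:54:41.085738 1 workload/cli/run.go:427")
first_error = parse_crdb_timestamp("I220908 17:59:31.796588 23519 workload/pgx_helpers.go:79")

elapsed = first_error - start
print(elapsed)  # 0:04:50.710850
```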

Note that 10.142.0.41 maps to n3. Both n1 and n3 appear to experience transient network availability issues:

for i in `seq 1 4`; do echo "n${i}"; grep "failed to connect to n" logs/$i.unredacted/cockroach.log |tail -1;done
n1
I220908 17:54:10.201923 16855 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n1,ctstream=4] 507  side-transport failed to connect to n4: failed to connect to n4 at ‹10.142.0.21:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.142.0.21:26257: connect: connection refused"›
n2
W220908 17:59:33.250317 669937 2@rpc/nodedialer/nodedialer.go:192 ⋮ [n2] 787  unable to connect to n1: failed to connect to n1 at ‹10.142.0.33:26257›: ‹initial connection heartbeat failed›: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.142.0.33:26257: connect: connection refused"›
n3
I220908 17:54:10.196929 13357 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n3,ctstream=4] 577  side-transport failed to connect to n4: unable to dial n4: ‹breaker open›
n4
I220908 17:59:33.521021 16084 kv/kvserver/closedts/sidetransport/sender.go:795 ⋮ [n4,ctstream=1] 199  side-transport failed to connect to n1: unable to dial n1: ‹breaker open›
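
For anyone who prefers a script to the shell loop, the same "last matching line per node" lookup can be sketched in Python. This assumes the `logs/<i>.unredacted/cockroach.log` layout used in the command above; the function names are hypothetical, and the match is a plain substring test rather than a grep pattern.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def last_matching_line(lines: Iterable[str], needle: str) -> Optional[str]:
    """Return the last line containing needle (like `grep needle | tail -1`), or None."""
    last = None
    for line in lines:
        if needle in line:
            last = line.rstrip("\n")
    return last


def last_per_node(log_root: str, needle: str, nodes=range(1, 5)) -> Dict[str, Optional[str]]:
    """Apply last_matching_line to each node's cockroach.log under log_root."""
    result = {}
    for i in nodes:
        path = Path(log_root) / f"{i}.unredacted" / "cockroach.log"
        if path.exists():
            with path.open(encoding="utf-8", errors="replace") as f:
                result[f"n{i}"] = last_matching_line(f, needle)
        else:
            result[f"n{i}"] = None  # log missing, e.g. not fetched during teardown
    return result


if __name__ == "__main__":
    for node, line in last_per_node("logs", "failed to connect to n").items():
        print(node)
        print(line or "")
```

Returning `None` for a missing file keeps a node whose log was never fetched distinguishable from one that simply had no match only if the caller checks for the file separately; here both cases print an empty line.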

At the time of the workload failure (17:59:31), all the nodes are in the mixed-version state, executing migration jobs. (In the test harness, this is essentially the final step, tpccBackgroundStepper.wait [1].) From the node logs, we can see that the active cluster version is 1000022.1-48 on n2 and n4, and 1000022.1-47(fence) on n1 and n3:

for i in `seq 1 4`; do echo "n${i}"; grep "active cluster version setting" logs/$i.unredacted/cockroach.log |tail -1;done
n1
I220908 17:59:30.780410 576281 server/migration.go:149 ⋮ [n1,bump-cluster-version] 1138  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)
n2
I220908 17:59:30.993236 666404 server/migration.go:149 ⋮ [n2,bump-cluster-version] 755  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)
n3
I220908 17:59:30.780334 716309 server/migration.go:149 ⋮ [n3,bump-cluster-version] 732  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)
n4
I220908 17:59:31.189051 430132 server/migration.go:149 ⋮ [n4,bump-cluster-version] 159  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)
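
The version skew is easier to see when the nodes are grouped by their last reported version. Below is a minimal Python sketch over the four lines quoted above; the regular expression is an assumption fitted to these lines, not a documented log format, and the function names are illustrative.

```python
import re
from collections import defaultdict

# Assumed shape: "[n<id>,bump-cluster-version] ... setting is now ‹<version>›".
VERSION_RE = re.compile(
    r"\[n(\d+),bump-cluster-version\].*active cluster version setting is now ‹([^›]+)›"
)

# The four lines quoted above, one per node.
sample_lines = [
    "I220908 17:59:30.780410 576281 server/migration.go:149 ⋮ [n1,bump-cluster-version] 1138  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)",
    "I220908 17:59:30.993236 666404 server/migration.go:149 ⋮ [n2,bump-cluster-version] 755  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)",
    "I220908 17:59:30.780334 716309 server/migration.go:149 ⋮ [n3,bump-cluster-version] 732  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)",
    "I220908 17:59:31.189051 430132 server/migration.go:149 ⋮ [n4,bump-cluster-version] 159  active cluster version setting is now ‹1000022.1-48› (up from ‹1000022.1-47(fence)›)",
]


def latest_version_by_node(lines):
    """Map node name to the last version it logged; later lines overwrite earlier ones."""
    latest = {}
    for line in lines:
        m = VERSION_RE.search(line)
        if m:
            latest[f"n{m.group(1)}"] = m.group(2)
    return latest


def nodes_by_version(latest):
    """Invert the mapping so that lagging nodes stand out."""
    groups = defaultdict(list)
    for node, version in sorted(latest.items()):
        groups[version].append(node)
    return dict(groups)


print(nodes_by_version(latest_version_by_node(sample_lines)))
```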

The workload failure induced the test failure by invoking t.Fatal [2] after the monitor detected an error (via WaitE). Because every roachtest failure triggers collectClusterArtifacts, we attempt to grab the logs from every node. However, as teardown.log shows, some of the logs could not be transferred successfully. On closer examination, it appears that errors are swallowed inside cluster.Get [3] (l.File is non-nil when invoked from roachtest, and one of the lines contains an error message).

teardown: 17:59:35 cluster.go:1118: failed to fetch logs: cluster.Get: get logs failed

Thus, it's technically possible that some of the logs were truncated. However, it's highly unlikely that both n1's and n3's cockroach.log got truncated. According to journalctl, both nodes exited with status 1, at 17:59:31 and 17:59:32:

Sep 08 17:59:31 teamcity-6383257-1662614354-100-n5cpu16-0003 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
Sep 08 17:59:32 teamcity-6383257-1662614354-100-n5cpu16-0001 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
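The journalctl lines above can be checked mechanically for how each process ended. This is a small Python sketch, assuming the trailing `-000N` in the hostname is the node index and relying on systemd reporting `code=exited` for a process that exited on its own (as opposed to `code=killed` for a signal); `EXIT_RE` and `parse_exits` are hypothetical helper names.

```python
import re

# The two journalctl lines quoted above.
JOURNAL = """\
Sep 08 17:59:31 teamcity-6383257-1662614354-100-n5cpu16-0003 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
Sep 08 17:59:32 teamcity-6383257-1662614354-100-n5cpu16-0001 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE
"""

EXIT_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) systemd\[1\]: "
    r"cockroach\.service: Main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)/"
)


def parse_exits(text):
    """Return (node, timestamp, code, status) for each main-process exit."""
    exits = []
    for line in text.splitlines():
        m = EXIT_RE.match(line)
        if m:
            # Assumption: the hostname suffix after the last '-' is the node index.
            node = int(m.group("host").rsplit("-", 1)[1])
            exits.append((node, m.group("ts"), m.group("code"), int(m.group("status"))))
    return exits


for node, ts, code, status in parse_exits(JOURNAL):
    print(f"n{node} at {ts}: code={code} status={status}")
```

Filtering on `code == "exited"` versus `code == "killed"` is one way to separate self-inflicted exits from external kills across many journalctl files.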

Note that neither process was killed, yet there is no trace of a panic in the logs. It appears that both nodes exited with UnspecifiedError. Oddly, the message "Failed running %q\n" [4] is not in any of the logs. These are the last few messages in cockroach.log:

tail -5 logs/1.unredacted/cockroach.log

I220908 17:59:30.787980 49890 upgrade/upgradecluster/cluster.go:118 ⋮ [n1,client=35.196.70.170:33426,user=root,migration-mgr] 1142  executing bump-cluster-version=1000022.1-48 on nodes n{1,2,3,4}
I220908 17:59:30.875387 573820 sql/gcjob/gc_job_utils.go:58 ⋮ [n1,job=794917503914573825] 1143  marked index 3 as GC'd
I220908 17:59:30.881949 573820 sql/gcjob/gc_job_utils.go:289 ⋮ [n1,job=794917503914573825] 1144  updated progress payload: ‹indexes:<index_id:3 status:CLEARED > ranges_unsplit_done:true›
I220908 17:59:30.886290 573820 sql/gcjob/gc_job_utils.go:296 ⋮ [n1,job=794917503914573825] 1145  updated running status: ‹waiting for GC TTL›
I220908 17:59:30.889058 573820 jobs/registry.go:1205 ⋮ [n1] 1146  SCHEMA CHANGE GC job 794917503914573825: stepping through state succeeded with error: <nil>
tail -5 logs/3.unredacted/cockroach.log

I220908 17:59:30.765627 716258 server/migration.go:149 ⋮ [n3,bump-cluster-version] 730  active cluster version setting is now ‹1000022.1-45(fence)› (up from ‹1000022.1-44›)
I220908 17:59:30.770121 716191 server/migration.go:149 ⋮ [n3,bump-cluster-version] 731  active cluster version setting is now ‹1000022.1-46› (up from ‹1000022.1-45(fence)›)
I220908 17:59:30.780334 716309 server/migration.go:149 ⋮ [n3,bump-cluster-version] 732  active cluster version setting is now ‹1000022.1-47(fence)› (up from ‹1000022.1-46›)
I220908 17:59:31.047900 44899 jobs/wait.go:152 ⋮ [n3,intExec=‹set-version›,migration-mgr] 733  waited for 1 [794916516998709249] queued jobs to complete 4m44.045664367s
I220908 17:59:31.049316 44899 upgrade/upgradecluster/cluster.go:118 ⋮ [n3,intExec=‹set-version›,migration-mgr] 734  executing bump-cluster-version=1000022.1-17(fence) on nodes n{1,2,3,4}

[1] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/tests/tpcc.go#L431
[2] https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/tests/mixed_version_jobs.go#L73
[3] https://github.com/cockroachdb/cockroach/blob/master/pkg/roachprod/install/cluster_synced.go#L2007
[4] https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/cli.go#L73

srosenberg commented 2 years ago

Examining both system and application metrics, nothing looks anomalous; all nodes have ample system resources. The graphs below corroborate that both n1 and n3 terminate at around 17:59:31 while the other two nodes continue to execute:

[image: tpcc_mixed_workload_fails_network]

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 726cf22b9f06b766d857b4617dec0df18d1e5cd0:

          |   283.0s        0          203.1           95.9     21.0     29.4     37.7     44.0 payment
          |   283.0s        0           19.0            9.6     29.4     39.8     46.1     46.1 stockLevel
          |   284.0s        0           16.0            9.4     65.0     75.5    100.7    100.7 delivery
          |   284.0s        0          182.0           89.4     32.5     46.1     52.4     62.9 newOrder
          |   284.0s        0           11.0            9.6      6.0      6.8      8.4      8.4 orderStatus
          |   284.0s        0          197.0           96.2     19.9     27.3     32.5     35.7 payment
          |   284.0s        0           19.0            9.6     29.4     50.3     56.6     56.6 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   285.0s        0           15.0            9.4     60.8     71.3     75.5     75.5 delivery
          |   285.0s        0          205.8           89.8     32.5     41.9     44.0     48.2 newOrder
          |   285.0s        0           16.0            9.6      6.3     11.5     11.5     11.5 orderStatus
          |   285.0s        0          197.8           96.6     21.0     31.5     46.1     46.1 payment
          |   285.0s        0           16.0            9.7     23.1     35.7     52.4     52.4 stockLevel
          |   286.0s        0           15.0            9.4     67.1     79.7     83.9     83.9 delivery
          |   286.0s        0          183.1           90.1     31.5     39.8     50.3     54.5 newOrder
          |   286.0s        0           15.0            9.6      7.3      8.9     10.5     10.5 orderStatus
          |   286.0s        0          178.1           96.8     21.0     28.3     35.7     56.6 payment
          |   286.0s        0           16.0            9.7     30.4     41.9     46.1     46.1 stockLevel
          |   287.0s        0           16.0            9.4     56.6     62.9     65.0     65.0 delivery
          |   287.0s        0          192.1           90.5     32.5     41.9     48.2     52.4 newOrder
          |   287.0s        0           18.0            9.6      6.8      8.9     10.5     10.5 orderStatus
          |   287.0s        0          189.1           97.2     21.0     28.3     31.5     32.5 payment
          |   287.0s        0           18.0            9.7     25.2     46.1     50.3     50.3 stockLevel
          |   288.0s        0           18.0            9.5     62.9     71.3     75.5     75.5 delivery
          |   288.0s        0          193.0           90.8     33.6     44.0     56.6     62.9 newOrder
          |   288.0s        0           20.0            9.7      6.6      8.9     10.0     10.0 orderStatus
          |   288.0s        0          186.0           97.5     21.0     29.4     32.5     39.8 payment
          |   288.0s        0           23.0            9.8     23.1     39.8     46.1     46.1 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   289.0s        0           16.0            9.5     65.0     83.9    104.9    104.9 delivery
          |   289.0s        0          174.0           91.1     35.7     60.8     83.9     96.5 newOrder
          |   289.0s        0           15.0            9.7      6.6      9.4     10.5     10.5 orderStatus
          |   289.0s        0          174.0           97.7     21.0     41.9     67.1     71.3 payment
          |   289.0s        0           18.0            9.8     23.1     35.7     52.4     52.4 stockLevel
          |   290.0s        0           11.0            9.5     60.8     67.1     71.3     71.3 delivery
          |   290.0s        0          182.9           91.4     33.6     46.1     50.3     71.3 newOrder
          |   290.0s        0           19.0            9.7      6.8      9.4     10.0     10.0 orderStatus
          |   290.0s        0          195.9           98.1     21.0     29.4     32.5     35.7 payment
          |   290.0s        0           17.0            9.8     26.2     37.7     44.0     44.0 stockLevel
        Wraps: (8) secondary error attachment
          | UNCLASSIFIED_PROBLEM: context canceled
          | (1) UNCLASSIFIED_PROBLEM
          | Wraps: (2) Node 5. Command with error:
          |   | ``````
          |   | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          |   | ``````
          | Wraps: (3) context canceled
          | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
        Wraps: (9) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=16, ROACHTEST_encrypted=true, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

This test on roachdash | Improve this report!

blathers-crl[bot] commented 2 years ago

cc @cockroachdb/test-eng

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ a0bfa6dafcc206301d3a21887c374db63b377075:

          |    65.0s        0           21.0           18.4     52.4     60.8     71.3     71.3 delivery
          |    65.0s        0          196.1          194.2     26.2     32.5     39.8     46.1 newOrder
          |    65.0s        0           11.0           18.1      7.6      8.9      9.4      9.4 orderStatus
          |    65.0s        0          195.1          187.9     14.7     19.9     23.1     31.5 payment
          |    65.0s        0           20.0           18.3     18.9     27.3     39.8     39.8 stockLevel
          |    66.0s        0           19.0           18.5     54.5     88.1     88.1     88.1 delivery
          |    66.0s        0          192.9          194.2     27.3     39.8     48.2     56.6 newOrder
          |    66.0s        0           15.0           18.0      6.3      7.6      8.4      8.4 orderStatus
          |    66.0s        0          197.9          188.0     16.3     23.1     25.2     32.5 payment
          |    66.0s        0           16.0           18.3     18.9     39.8     48.2     48.2 stockLevel
          |    67.0s        0           16.0           18.4     52.4     58.7     60.8     60.8 delivery
          |    67.0s        0          196.6          194.3     27.3     37.7     46.1     52.4 newOrder
          |    67.0s        0           19.0           18.0      5.8      7.3      7.6      7.6 orderStatus
          |    67.0s        0          184.7          188.0     15.7     23.1     25.2     30.4 payment
          |    67.0s        0           22.0           18.3     15.2     29.4     48.2     48.2 stockLevel
          |    68.0s        0           16.0           18.4     54.5     65.0     65.0     65.0 delivery
          |    68.0s        0          188.5          194.2     25.2     35.7     41.9     44.0 newOrder
          |    68.0s        0           20.0           18.1      5.8      8.9      8.9      8.9 orderStatus
          |    68.0s        0          191.5          188.0     14.7     21.0     27.3     31.5 payment
          |    68.0s        0           15.0           18.3     18.9     27.3     41.9     41.9 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |    69.0s        0           14.0           18.3     50.3     58.7     62.9     62.9 delivery
          |    69.0s        0          213.7          194.5     26.2     37.7     52.4     54.5 newOrder
          |    69.0s        0           19.0           18.1      5.5      7.6      7.9      7.9 orderStatus
          |    69.0s        0          163.7          187.7     15.7     26.2     32.5     35.7 payment
          |    69.0s        0           22.0           18.3     24.1     54.5     58.7     58.7 stockLevel
          |    70.0s        0           17.0           18.3     52.4     65.0     65.0     65.0 delivery
          |    70.0s        0          185.0          194.3     27.3     39.8     41.9     46.1 newOrder
          |    70.0s        0           15.0           18.0      6.0      7.1      7.3      7.3 orderStatus
          |    70.0s        0          164.0          187.3     15.7     24.1     29.4     32.5 payment
          |    70.0s        0           18.0           18.3     16.3     31.5     46.1     46.1 stockLevel
          |    71.0s        0           11.0           18.2     56.6     67.1     67.1     67.1 delivery
          |    71.0s        0          218.3          194.7     29.4     56.6     79.7     79.7 newOrder
          |    71.0s        0           22.0           18.1      6.6     12.1     14.2     14.2 orderStatus
          |    71.0s        0          199.2          187.5     17.8     32.5     46.1     65.0 payment
          |    71.0s        0           13.0           18.3     22.0     41.9     48.2     48.2 stockLevel
          |    72.0s        0            7.0           18.0     56.6     67.1     67.1     67.1 delivery
          |    72.0s        0          111.0          193.5     28.3     37.7     41.9     44.0 newOrder
          |    72.0s        0            7.0           17.9      6.6      7.3      7.3      7.3 orderStatus
          |    72.0s        0           93.0          186.2     17.8     23.1     28.3     32.5 payment
          |    72.0s        0            6.0           18.1     19.9     52.4     52.4     52.4 stockLevel
        Wraps: (8) COMMAND_PROBLEM
        Wraps: (9) Node 5. Command with error:
          | ``````
          | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          | ``````
        Wraps: (10) exit status 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

    versionupgrade.go:530,versionupgrade.go:197,tpcc.go:432,test_runner.go:928: context canceled

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=16, ROACHTEST_encrypted=false, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Same failure on other branches

- #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot blocks-22.2.0-beta.2 branch-release-22.2 release-blocker]
- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 84384b50c023dd4c05fff76af85a6975f5d2b0ab:

          |   252.0s        0          162.0           79.1     25.2     35.7     46.1     54.5 newOrder
          |   252.0s        0           19.0            8.8      7.3      8.4      8.9      8.9 orderStatus
          |   252.0s        0          157.0           86.1     13.6     26.2     30.4     32.5 payment
          |   252.0s        0           13.0            8.8     16.3     25.2     31.5     31.5 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   253.0s        0           10.0            8.5     54.5     83.9     83.9     83.9 delivery
          |   253.0s        0          160.1           79.4     23.1     28.3     35.7     41.9 newOrder
          |   253.0s        0           23.0            8.8      5.8      8.9     15.7     15.7 orderStatus
          |   253.0s        0          143.1           86.3     13.1     24.1     37.7     39.8 payment
          |   253.0s        0           12.0            8.8     15.7     31.5     31.5     31.5 stockLevel
          |   254.0s        0           15.0            8.5     54.5     67.1     75.5     75.5 delivery
          |   254.0s        0          139.0           79.6     25.2     35.7     46.1     75.5 newOrder
          |   254.0s        0           10.0            8.9      6.8      8.9      8.9      8.9 orderStatus
          |   254.0s        0          173.0           86.6     13.6     22.0     31.5     48.2 payment
          |   254.0s        0           15.0            8.8     22.0     31.5     50.3     50.3 stockLevel
          |   255.0s        0            7.0            8.5     54.5     56.6     56.6     56.6 delivery
          |   255.0s        0          156.0           79.9     25.2     33.6     39.8     50.3 newOrder
          |   255.0s        0           14.0            8.9      6.3      8.4     10.5     10.5 orderStatus
          |   255.0s        0          181.0           87.0     13.6     25.2     28.3     33.6 payment
          |   255.0s        0           24.0            8.9     13.1     28.3     31.5     31.5 stockLevel
          |   256.0s        0           10.0            8.5     50.3    113.2    113.2    113.2 delivery
          |   256.0s        0          140.0           80.2     25.2     44.0     48.2     54.5 newOrder
          |   256.0s        0           12.0            8.9      6.0      7.9      8.9      8.9 orderStatus
          |   256.0s        0          191.9           87.4     14.2     35.7     46.1     48.2 payment
          |   256.0s        0           13.0            8.9     17.8     27.3     27.3     27.3 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |   257.0s        0           12.0            8.5     56.6     71.3     75.5     75.5 delivery
          |   257.0s        0          184.1           80.6     26.2     32.5     39.8     41.9 newOrder
          |   257.0s        0            9.0            8.9      6.8      8.9      8.9      8.9 orderStatus
          |   257.0s        0          152.1           87.7     14.7     22.0     29.4     37.7 payment
          |   257.0s        0           13.0            8.9     19.9     27.3     39.8     39.8 stockLevel
          |   258.0s        0           24.0            8.6     56.6     71.3     83.9     83.9 delivery
          |   258.0s        0          175.8           81.0     25.2     37.7     46.1     62.9 newOrder
          |   258.0s        0           19.0            8.9      7.9     11.0     11.0     11.0 orderStatus
          |   258.0s        0          165.8           88.0     14.2     23.1     39.8     41.9 payment
          |   258.0s        0           15.0            8.9     18.9     24.1     41.9     41.9 stockLevel
          |   259.0s        0           12.0            8.6     54.5     62.9     79.7     79.7 delivery
          |   259.0s        0          137.0           81.2     25.2     33.6     41.9     46.1 newOrder
          |   259.0s        0           17.0            9.0      7.9     10.5     10.5     10.5 orderStatus
          |   259.0s        0          156.0           88.2     13.6     19.9     24.1     37.7 payment
          |   259.0s        0           17.0            8.9     18.9     28.3     39.8     39.8 stockLevel
        Wraps: (8) COMMAND_PROBLEM
        Wraps: (9) Node 5. Command with error:
          | ``````
          | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4}
          | ``````
        Wraps: (10) exit status 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

    versionupgrade.go:530,versionupgrade.go:197,tpcc.go:432,test_runner.go:928: pq: query execution canceled

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=16, ROACHTEST_encrypted=true, ROACHTEST_fs=zfs, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Same failure on other branches

- #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot branch-release-22.2]
- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

srosenberg commented 2 years ago

The latest failure has the same failure mode:

Oct 03 15:59:13 teamcity-6749797-1664774404-105-n5cpu16-0003 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE

Ongoing internal investigation: https://cockroachlabs.slack.com/archives/C01CDD4HRC5/p1664819770906019?thread_ts=1664295784.890119&cid=C01CDD4HRC5

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ e06d2286b011096526eda7f2d7f7bb7acea0ae84:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/n5cpu16/run_1
(versionupgrade.go:533).setClusterSettingVersionStep: pq: rpc error: code = Unavailable desc = error reading from server: read tcp 10.142.1.113:47054->10.142.1.79:26257: read: connection reset by peer
(monitor.go:127).Wait: monitor failure: monitor task failed: output in run_144818.024420287_n5_cockroach_workload_run_tpcc: ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=0 --pprofport=33333  {pgurl:1-4} returned: context canceled

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=16, ROACHTEST_encrypted=true, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Same failure on other branches

- #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot branch-release-22.2]
- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

srosenberg commented 2 years ago

Yet another example of a node doing exit 1 without any stack trace.

In test.log,

14:48:18 tpcc.go:254: test worker status: running tpcc worker=0 warehouses=909 ramp=5m0s duration=2h0m0s on {pgurl:1-4} (<1m0s)

In journalctl.txt,

2.journalctl.txt:Oct 08 14:52:40 teamcity-6837129-1665206365-100-n5cpu16-0002 systemd[1]: cockroach.service: Main process exited, code=exited, status=1/FAILURE

In cockroach-pebble, the last upgraded format version is 008,

I221008 14:48:10.546483 46770 3@pebble/event.go:645 ⋮ [n2,pebble,s2] 5555  upgraded to format version: ‹008›

cockroach-teamcity commented 2 years ago

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 7be0b20edbc336200c1510a9c6f1d76ae2f92c3a:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/n5cpu16/run_1
(monitor.go:127).Wait: monitor failure: monitor task failed: output in run_142544.008795915_n1_v2216cockroach_workload_fixtures_import_bank: v22.1.6/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank returned: SSH_PROBLEM: exit status 255
(test_runner.go:1062).teardownTest: test timed out (0s)

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=16, ROACHTEST_encrypted=false, ROACHTEST_fs=zfs, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Same failure on other branches

- #89755 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot T-testeng branch-release-22.2.0 release-blocker]
- #88668 roachtest: tpcc/mixed-headroom/n5cpu16 failed [C-test-failure O-roachtest O-robot branch-release-22.2]
- #74892 roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-21.2]

srosenberg commented 2 years ago

The last failure is an entirely different failure mode: the bank import step appears to run for hours until it is killed by the test timeout.

The preceding step to import tpcc completes @14:25,

I221015 14:25:41.948258 1 ccl/workloadccl/fixture.go:326  [-] 11  imported 62 GiB bytes in 9 tables (took 4m59.344260633s, 212.78 MiB/s)

The bank import starts immediately after,

run_142544.008795915_n1_v2216cockroach_workload_fixtures_import_bank: 14:25:44 cluster.go:291: > v22.1.6/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank
I221015 14:25:44.936592 1 ccl/workloadccl/fixture.go:318  [-] 1  starting import of 1 tables
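For a sense of scale, the following Python sketch estimates how long the bank import might be expected to take. It rests on two assumptions that the logs do not confirm: that the data volume is roughly `--rows` times `--payload-bytes` (ignoring keys and encoding overhead), and that the bank import would sustain a rate similar to the 212.78 MiB/s reported for the tpcc import.

```python
# Flags from the bank import command quoted above.
ROWS = 32552083
PAYLOAD_BYTES = 10240

# Rate reported by the preceding tpcc import; assumed (not known) to carry over.
TPCC_RATE_MIB_S = 212.78

# Assumption: payload dominates the row size, so this is a rough lower bound.
payload_total = ROWS * PAYLOAD_BYTES          # bytes
payload_gib = payload_total / 2**30           # GiB
est_seconds = (payload_total / 2**20) / TPCC_RATE_MIB_S

print(f"payload: {payload_gib:.1f} GiB")
print(f"estimated import time: {est_seconds / 60:.0f} min")
```

Under these assumptions the payload is on the order of 310 GiB and the import would take tens of minutes, which is why a run of many hours points to a stuck import rather than a slow one.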

All nodes appear to be live for the remaining ~10 hours,

[images: cpu_ram, network]

srosenberg commented 2 years ago

@stevendanna Would you mind taking a look at the logs to see what could possibly have caused the import to run for ~10 hours? The last warning message concerning the import is @15:35,

logs/3.unredacted/cockroach.log:W221015 15:35:13.557144 86844 kv/bulk/sst_batcher.go:469 ⋮ [n3,f‹d1df2c12›,job=805350752177750017] 25254  ‹bank rows› failed to scatter : existing range size 10496962 exceeds specified limit 4194304

On n2 we see these warnings every minute, starting @15:08, ~6 minutes after the split is initiated,

logs/2.unredacted/cockroach.log:I221015 15:02:47.223326 373620 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n2,s2,r6742/1:‹/Table/181/1/{284963…-325520…}›] 18370  initiating a split of this range at key ‹/Table/181/1/28503022› [r6746] (‹manual›)‹›
logs/2.unredacted/cockroach.log:I221015 15:02:47.346852 373677 kv/kvserver/pkg/kv/kvserver/replica_command.go:2260 ⋮ [n2,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 18375  change replicas (add [(n4,s4):4LEARNER] remove []): existing descriptor r6746:‹/Table/181/1/{28503022-32552000}› [(n2,s2):1, (n1,s1):2, (n3,s3):3, next=4, gen=3665, sticky=1665846767.222826180,0]
logs/2.unredacted/cockroach.log:W221015 15:08:49.363284 680702 kv/kvserver/pkg/kv/kvserver/merge_queue.go:411 ⋮ [n2,merge,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 19925  ‹kv/kvserver/pkg/kv/kvserver/replica_command.go›:810: merge failed: fetching current range descriptor value: context deadline exceeded
logs/2.unredacted/cockroach.log:W221015 15:09:49.364832 741327 kv/kvclient/kvcoord/dist_sender.go:1602 ⋮ [n2,merge,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 20262  slow range RPC: have been waiting 60.00s (1 attempts) for RPC Get [‹/Local/Range/Table/181/1/28503022/RangeDescriptor›,‹/Min›), [txn: c5a14092], [can-forward-ts] to r6746:‹/Table/181/1/{28503022-32552000}› [(n2,s2):1, (n1,s1):2, (n3,s3):3, next=4, gen=3665, sticky=1665846767.222826180,0]; resp: ‹(err: context deadline exceeded: "merge" meta={id=c5a14092 key=/Local/Range/Table/181/1/28503022/RangeDescriptor pri=0.00562966 epo=0 ts=1665846529.364048180,0 min=1665846529.364048180,0 seq=0} lock=true stat=PENDING rts=1665846529.364048180,0 wto=false gul=1665846529.864048180,0)›

and persisting until the timeout @00:19:50,

W221016 00:19:50.341423 741327 kv/kvserver/pkg/kv/kvserver/merge_queue.go:411 ⋮ [n2,merge,s2,r6746/1:‹/Table/181/1/{285030…-325520…}›] 28905  ‹kv/kvserver/pkg/kv/kvserver/replica_command.go›:810: merge failed: fetching current range descriptor value: context deadline exceeded
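The intervals mentioned above can be derived directly from the log prefixes. This is a minimal Python sketch, assuming the prefix layout seen in the quoted lines (severity letter, `yymmdd`, then `HH:MM:SS.ffffff`) and using only the leading fields of each line; `parse_crdb_ts` is a hypothetical helper.

```python
from datetime import datetime


def parse_crdb_ts(line):
    """Parse the timestamp from a log prefix like 'W221015 15:08:49.363284 ...'."""
    date_part, time_part = line.split(" ", 2)[:2]
    # Drop the leading severity letter (I/W/E/F) before parsing the date.
    return datetime.strptime(date_part[1:] + " " + time_part, "%y%m%d %H:%M:%S.%f")


# Prefixes of the n2 log lines quoted above (file-name prefix removed).
split_started = parse_crdb_ts("I221015 15:02:47.223326 373620")
first_merge_failure = parse_crdb_ts("W221015 15:08:49.363284 680702")
last_merge_failure = parse_crdb_ts("W221016 00:19:50.341423 741327")

split_to_first = first_merge_failure - split_started
failure_span = last_merge_failure - first_merge_failure

print("split -> first merge failure:", split_to_first)
print("first -> last merge failure: ", failure_span)
```

With these inputs, the first merge failure follows the split by about 6 minutes, and the failures span a little over 9 hours.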

jbowens commented 2 years ago

Was the panic provided in cockroachdb/pebble#2019 not making it into the logs? This appears to be the source of failures going back to Aug 29. Do we know why the panic didn't make it into the logs?

renatolabs commented 2 years ago

> Was the panic provided in cockroachdb/pebble#2019 not making it into the logs? This appears to be the source of failures going back to Aug 29. Do we know why the panic didn't make it into the logs?

Indeed, it never made it to the logs, which is what made debugging this test failure difficult. We are looking into why the crash was never logged (internal discussion).

renatolabs commented 2 years ago

This (exit 1) should be fixed by #90406. Closing so that we get a new issue for any future failures.