cockroachdb / cockroach

roachtest: cdc/sink-chaos failed #96419

Closed cockroach-teamcity closed 1 year ago

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 22244a780dcfaca48162dde8e0f90b5ba9b6bb9c:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1937).Run: output in run_101232.441206973_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_101233.189661854_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)
See: [How To Investigate (internal)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-24114

Epic CRDB-11732

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 5fbcd8a8deac0205c7df38e340c1eb9692854383:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1937).Run: output in run_102050.580219051_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_102051.370128293_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 8e24570fa366ed038c6ae65f50db5d8e22826db0:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1937).Run: output in run_101856.333108523_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_101857.122870005_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ eb158026c50d8fa856e42f928d844831ea9e6b28:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1937).Run: output in run_102342.926823441_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_102343.724324591_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ e51ffa013c81212870891001f0328912550fa75d:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1937).Run: output in run_103131.063502119_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: context canceled
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 2a7edbeb0737b1309064c25c641a309c2980d9ba:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1937).Run: output in run_100941.831818606_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: context canceled
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 31365e21dc606cdc1e4302c86192ffc5a6cf1255:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1937).Run: output in run_101924.591387929_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: context canceled
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 7e2df35a2f6bf7a859bb0539c8ca43c4e72ed260:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1940).Run: output in run_103323.114592951_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: context canceled
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ c95bef097bd4c213c6b5c0c125a9a846c4479d73:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1940).Run: output in run_103906.927230883_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_103907.684738529_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 3d054f37c7c87f53cb56fac4e5500f0d1130d09a:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1940).Run: output in run_102531.296808624_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_102532.100190027_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ e9c96e7179e19aae2f8d386f67eb950db8c3354b:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1940).Run: output in run_103203.909948525_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_103204.640858670_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

miretskiy commented 1 year ago

@samiskin any updates on this issue?

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 286b3e235171a39b8f9910555affcc7ce310741a:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1956).Run: output in run_102934.007520384_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_102934.755935866_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

jayshrivastava commented 1 year ago

The latest 3 failures show a problem while running TPCC:

I230222 10:40:35.105029 1786 workload/pgx_helpers.go:79  [T1] 4  pgx logger [error]: Query logParams=map[args:[25 1 2113] err:ERROR: rpc error: code = Unavailable desc = error reading from server: read tcp 10.142.1.2:59786->10.142.1.4:26257: use of closed network connection (SQLSTATE XXUUU) pid:2383385 sql:
I230222 10:40:35.105029 1786 workload/pgx_helpers.go:79  [T1] 4 +       SELECT sum(ol_amount) FROM order_line
I230222 10:40:35.105029 1786 workload/pgx_helpers.go:79  [T1] 4 +       WHERE ol_w_id = $1 AND ol_d_id = $2 AND ol_o_id = $3]
Error: error in delivery: ERROR: rpc error: code = Unavailable desc = error reading from server: read tcp 10.142.1.2:59786->10.142.1.4:26257: use of closed network connection (SQLSTATE XXUUU)

This excerpt is from failure_1.log.
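
When triaging a workload log like the excerpt above, it can help to tally which SQLSTATE codes appear before reading individual errors. The following is a minimal sketch, not part of roachtest: the function name `countSQLStates` and the regular expression are assumptions made for illustration, and the sample input is the `Error:` line quoted above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// sqlstateRE matches the "(SQLSTATE XXXXX)" suffix seen in the log excerpt.
// The five-character uppercase/digit form is an assumption based on that excerpt.
var sqlstateRE = regexp.MustCompile(`\(SQLSTATE ([0-9A-Z]{5})\)`)

// countSQLStates (hypothetical helper) scans log text line by line and counts
// how many times each SQLSTATE code occurs.
// Note: bufio.Scanner rejects lines longer than 64 KiB by default.
func countSQLStates(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		for _, m := range sqlstateRE.FindAllStringSubmatch(sc.Text(), -1) {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	const sample = `Error: error in delivery: ERROR: rpc error: code = Unavailable desc = error reading from server: read tcp 10.142.1.2:59786->10.142.1.4:26257: use of closed network connection (SQLSTATE XXUUU)`
	fmt.Println(countSQLStates(sample))
}
```

A tally dominated by a single code alongside `Unavailable` RPC errors, as here, points at a server-side problem rather than at the workload itself.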

miretskiy commented 1 year ago

Perhaps the node crashed? It started happening ~3 weeks ago and has kept happening consistently. I don't think it's a one-off issue, and we have this marked as a release blocker.

jayshrivastava commented 1 year ago

Finally found it. Node 3 panicked: https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_RoachtestNightlyGceBazel/8785686:id/cdc/sink-chaos/run_1/artifacts.zip!/logs/3.unredacted/cockroach-stderr.log

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3b6adc9]

goroutine 242783 [running]:
panic({0x5002fc0, 0x9ce4030})
    GOROOT/src/runtime/panic.go:987 +0x3ba fp=0xc00d13be20 sp=0xc00d13bd60 pc=0x49dd5a
runtime.panicmem(...)
    GOROOT/src/runtime/panic.go:260
runtime.sigpanic()
    GOROOT/src/runtime/signal_unix.go:835 +0x2f6 fp=0xc00d13be70 sp=0xc00d13be20 pc=0x4b4c16
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc009b62de0, 0x1)
    github.com/Shopify/sarama/external/com_github_shopify_sarama/async_producer.go:620 +0x1a9 fp=0xc00d13bed0 sp=0xc00d13be70 pc=0x3b6adc9
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc009b62de0)
    github.com/Shopify/sarama/external/com_github_shopify_sarama/async_producer.go:564 +0x537 fp=0xc00d13bf90 sp=0xc00d13bed0 pc=0x3b6a937
github.com/Shopify/sarama.(*partitionProducer).dispatch-fm()
    <autogenerated>:1 +0x26 fp=0xc00d13bfa8 sp=0xc00d13bf90 pc=0x3bbca26
github.com/Shopify/sarama.withRecover(0x0?)
    github.com/Shopify/sarama/external/com_github_shopify_sarama/utils.go:43 +0x3e fp=0xc00d13bfc8 sp=0xc00d13bfa8 pc=0x3bb6f9e
github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer.func1()
    github.com/Shopify/sarama/external/com_github_shopify_sarama/async_producer.go:515 +0x26 fp=0xc00d13bfe0 sp=0xc00d13bfc8 pc=0x3b6a346
runtime.goexit()
    GOROOT/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00d13bfe8 sp=0xc00d13bfe0 pc=0x4d2a41
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
    github.com/Shopify/sarama/external/com_github_shopify_sarama/async_producer.go:515 +0x1ea
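
The trace above shows a nil pointer dereference inside a background goroutine bringing down the node. In Go, a panic that is not recovered in the goroutine where it occurs terminates the whole process. The sketch below illustrates that general mechanism and one way to contain it. It is not sarama's code: `watermark`, `bump`, and `runRecovered` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// watermark is a hypothetical stand-in for state reached through a pointer
// that may be nil. It is not sarama's actual type.
type watermark struct {
	epoch int
}

// bump dereferences w. Calling it with a nil pointer raises the same class of
// runtime error as in the trace (nil pointer dereference).
func bump(w *watermark) int {
	w.epoch++
	return w.epoch
}

// runRecovered (hypothetical helper) runs fn in a new goroutine and converts a
// panic into an error on the returned channel, so the process keeps running.
func runRecovered(fn func()) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered: %v", r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return done
}

func main() {
	// Nil receiver: the goroutine panics, but the deferred recover reports it.
	fmt.Println(<-runRecovered(func() { bump(nil) }))
	// Valid pointer: no panic, so the channel yields a nil error.
	fmt.Println(<-runRecovered(func() { bump(&watermark{}) }))
}
```

Recovering only contains the crash. The nil pointer itself would still need a fix at its source.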

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ e028ce5b14505dfd17ef8b13001c0ab8ac811e3c:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1956).Run: output in run_101206.687098033_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_101207.492156179_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 0d3393b0623a5c258b25725f64f3689e2f54667b:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1956).Run: output in run_100636.525023948_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_100637.266464476_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 39c06b5a438c01c93ffbfeeefe702d3f9b620eaf:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1956).Run: output in run_100937.610495214_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_100938.380837809_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 13c58f621519794e775b7cfc4d8b557bc99eeca0:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(monitor.go:127).Wait: monitor failure: monitor command failure: unexpected node event: 3: dead (exit status 134)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0
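
A note on `exit status 134` in this report: by common shell convention, a status above 128 means the process was terminated by signal number (status - 128). Here 134 - 128 = 6, which is SIGABRT on Linux. That is consistent with node 3 aborting after the panic found earlier in the thread, although the report itself does not state the cause. A minimal sketch of the arithmetic, with the hypothetical helper name `signalFromExitStatus`:

```go
package main

import "fmt"

// signalFromExitStatus (hypothetical helper) applies the shell convention that
// a status in (128, 256) encodes "terminated by signal status-128".
// It returns ok=false for statuses outside that range.
func signalFromExitStatus(status int) (sig int, ok bool) {
	if status > 128 && status < 256 {
		return status - 128, true
	}
	return 0, false
}

func main() {
	if sig, ok := signalFromExitStatus(134); ok {
		fmt.Printf("exit status 134 -> signal %d\n", sig)
	}
}
```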

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ b0e5507f74c07e13cfda8cda8b9079b457a9f37d:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1956).Run: output in run_101305.020474857_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_101305.764062036_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

cockroach-teamcity commented 1 year ago

roachtest.cdc/sink-chaos failed with artifacts on master @ 21786aa112e6b822858f281c1cc59608987c5c0a:

test artifacts and logs in: /artifacts/cdc/sink-chaos/run_1
(cluster.go:1956).Run: output in run_101708.818500757_n4_workload-run-tpcc-wa: ./workload run tpcc --warehouses=100 --duration=30m  {pgurl:1-3}  returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_101709.557595290_n4_workload-run-tpcc-wa.log: exit status 1
(cdc.go:283).Close: error shutting down prometheus/grafana: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0
