cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: backup-restore/small-ranges failed #115852

Closed. cockroach-teamcity closed this issue 10 months ago.

cockroach-teamcity commented 10 months ago

roachtest.backup-restore/small-ranges failed with artifacts on master @ 37ad01a3972cb4d34bfc6dfb4b9cfcac360b15dd:

(monitor.go:153).Wait: monitor failure: backup 2_round-trip-test-backup_database-tpcc: check_files failed: pq: query execution canceled due to statement timeout
test artifacts and logs in: /artifacts/backup-restore/small-ranges/run_1

Parameters: ROACHTEST_arch=amd64, ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=true, ROACHTEST_metamorphicBuild=false, ROACHTEST_ssd=0

Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

See: [How To Investigate (internal)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

See: [Grafana](https://go.crdb.dev/roachtest-grafana/teamcity-13038112/backup-restore-small-ranges/1702017662329/1702019308209)

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-34252

msbutler commented 10 months ago

For some reason, we hit a statement timeout while running SHOW BACKUP. The test had only been running for 20 minutes, and we fail one minute into running SHOW BACKUP in this function: https://github.com/cockroachdb/cockroach/blob/bd55d2232fd8785aaa2e116be9779fc03ddc7f10/pkg/cmd/roachtest/tests/mixed_version_backup.go#L2227

07:04:56 mixed_version_backup.go:1757: computed contents for 9 tables as part of 2_round-trip-test-backup_database-tpcc
07:04:56 backup_restore_roundtrip.go:183: verifying backup 2
07:05:56 mixed_version_backup.go:1337: context is canceled, finishing
07:05:56 test_impl.go:414: test failure #1: full stack retained in failure_1.log: (monitor.go:153).Wait: monitor failure: backup 2_round-trip-test-backup_database-tpcc: check_files failed: pq: query execution canceled due to statement timeout
07:05:56 test_runner.go:1130: test completed with failure(s)
07:05:56 test_runner.go:1164: skipping post test assertions as test failed

It's also worth noting that the line "07:05:56 mixed_version_backup.go:1337: context is canceled, finishing" appears only because stopWorkload is called on defer: https://github.com/cockroachdb/cockroach/blob/bd55d2232fd8785aaa2e116be9779fc03ddc7f10/pkg/cmd/roachtest/tests/backup_restore_roundtrip.go#L151

All this is to say, the whole roachtest shut down because of an unexpected statement timeout.

msbutler commented 10 months ago

removing the release blocker, as this failure is unrelated to backup-restore

msbutler commented 10 months ago

Ohhh, the fingerprint commands are failing because the new descriptor validation check adds a statement timeout: https://github.com/cockroachdb/cockroach/blob/831e830ea02c6036a64ab5bd758b8fe46282043a/pkg/cmd/roachtest/roachtestutil/validation_check.go#L73

cockroach-teamcity commented 10 months ago

roachtest.backup-restore/small-ranges failed with artifacts on master @ e747f6e6857a19d6048cb184b0f55c52cb8a6390:

(test_runner.go:1134).runTest: test timed out (4h0m0s)
test artifacts and logs in: /artifacts/backup-restore/small-ranges/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!