cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: restore/tpce/8TB/aws/nodes=10/cpus=8 failed #125399

Closed cockroach-teamcity closed 1 month ago

cockroach-teamcity commented 3 months ago

roachtest.restore/tpce/8TB/aws/nodes=10/cpus=8 failed with artifacts on release-23.2 @ f88e669339210a94016ce6555168dfffa5df8159:

(monitor.go:153).Wait: monitor failure: pq: pausing due to error; use RESUME JOB to try to proceed once the issue is resolved, or CANCEL JOB to rollback: RequestError: send request failed
caused by: Get "https://cockroach-fixtures-us-east-2.s3.us-east-2.amazonaws.com/backups/tpc-e/customers%3D500000/v22.2.1/inc-count%3D48/incrementals/2023/01/05-132812.76/20230105/170000.00/data/828596601353830401.sst": local error: tls: bad record MAC
test artifacts and logs in: /artifacts/restore/tpce/8TB/aws/nodes=10/cpus=8/cpu_arch=arm64/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

/cc @cockroachdb/disaster-recovery

Jira issue: CRDB-39434

msbutler commented 3 months ago

While I'm not sure why we hit this exact networking error, it's not entirely unexpected that the job failed on it. This patch should make 24.1 less flaky in the face of these shenanigans: https://github.com/cockroachdb/cockroach/pull/116957

cockroach-teamcity commented 2 months ago

roachtest.restore/tpce/8TB/aws/nodes=10/cpus=8 failed with artifacts on release-23.2 @ 7ca8340d8b1316144a9f0f53e736f168d99a0bab:

(monitor.go:153).Wait: monitor failure: read tcp 172.17.0.3:34146 -> 3.23.102.210:26257: read: connection timed out
test artifacts and logs in: /artifacts/restore/tpce/8TB/aws/nodes=10/cpus=8/cpu_arch=arm64/run_1

stevendanna commented 1 month ago

We weren't able to reproduce the original flake in this ticket, and the follow-up flake has no logs, unfortunately.