We saw backup failures on drt-chaos. All backup jobs are failing with the error message shown below (ExportRequest for span %d timed out after 5m):
failed to run backup: running distributed backup to export 13544 ranges: export request timeout: operation "ExportRequest for span /Table/109/182/16{59/"|\x9a\xe7`\vvGr\xbd/\xb8\rz\x16K\xd3"-61/"\t<\x8d\xc93\x94J\x00\x93\xb9\xac\x83\x11\t_3"}" timed out after 5m0.001s (given timeout 5m0s): aborted in DistSender: result is ambiguous: context deadline exceeded
These are hourly backup jobs.
Also, the oldest protected timestamp is pretty high, around 1d14h.
CPU was around 60-80%, with tpcc and kv workload running on the cluster.
On the hardware dashboard I'm seeing 95%+ CPU utilization, and 0.98s/1s elastic token exhaustion on the overload dashboard; this looks like admission control (AC) performing as expected.
More details in thread
Jira issue: CRDB-42428