Make Restores more performant/resilient with very large operations across ISPs

In attempting to restore an 8TB database from AWS Virginia to Azure Iowa, we encountered repeated "TLS: Bad MAC record" errors that broke the restore. We suspect glitchy intermediate hardware and possibly a bug in the Go crypto library, but were unable to determine the cause.

In attempting to restore an 8TB database from AWS Virginia to GCP South Carolina, we encountered repeated "exhausted retries: importing 14457 ranges: inbox communication error: grpc: context cancelled" messages. These paused the restore job instead of cancelling it outright, which is still unexpected. The error isn't deterministic: resuming the job repeatedly allowed it to make more progress.

The fixture in question is the 8TB TPCE workload fixture.

The common thread could be AWS, but we don't seem to have issues like these on backups. It seems likely there's room to make Restore more resilient to network instability.

Jira issue: CRDB-24900

Epic: CRDB-20915

cockroachdb/cockroach#97818