cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.12k stars 3.81k forks source link

backup: use worker count setting consistent with export request limit #111406

Closed aliher1911 closed 10 months ago

aliher1911 commented 1 year ago

Backup processor determines its worker count based on kv setting kv.bulk_io_write.concurrent_export_requests and available memory where kv setting determines max value to avoid issuing requests that would be throttled by server:

https://github.com/cockroachdb/cockroach/blob/393a3b1c186134be773b27ef2480e0accb7de52d/pkg/ccl/backupccl/backup_processor.go#L660-L682

At the same time kv uses kv.bulk_io_write.concurrent_export_requests differently and caps it by number of available cpu cores that could be used for export:

https://github.com/cockroachdb/cockroach/blob/ab00ae96196b3b2ca8b77d1ec3bce35050316f1d/pkg/kv/kvserver/store.go#L1554-L1564

This is not a problem with servers with high number of cores where default limit is set below core count, but could become an issue on servers with low core count. In that case, changing the setting will not increase kv limit but will increase number of inflight export requests which would wait on limiter and potentially time out.

Jira issue: CRDB-31887

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/disaster-recovery

aliher1911 commented 1 year ago

There's an issue with naive fix to align limits as far as I understand as backup processors are per node and request limits are per store. Without the fix, you can bump limit to above core count and still be able to run many requests that would be handled by different stores. At the same time this is probably not good as we may be CPU limited and not store limited.

This changes won't be visible with default setting of 3 on most configs, but if clients changed requests limit to higher number it will change the behaviour for them.

dt commented 10 months ago

https://github.com/cockroachdb/cockroach/pull/115158