Closed: aliher1911 closed this issue 10 months ago
cc @cockroachdb/disaster-recovery
As far as I understand, there's an issue with the naive fix of aligning the two limits: backup processors are per node, while request limits are per store. Without the fix, you can bump the limit above the core count and still run many requests, because they are handled by different stores. At the same time, this is probably not good either, since we may be CPU limited rather than store limited.

This change won't be visible with the default setting of 3 on most configurations, but for clients who raised the request limit it will change the behaviour.
The backup processor determines its worker count based on the kv setting `kv.bulk_io_write.concurrent_export_requests` and available memory, where the kv setting sets the maximum, so the processor avoids issuing requests that would be throttled by the server: https://github.com/cockroachdb/cockroach/blob/393a3b1c186134be773b27ef2480e0accb7de52d/pkg/ccl/backupccl/backup_processor.go#L660-L682
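A minimal Go sketch of that shape, assuming made-up names (`backupWorkerCount`, `memBudget`, `perWorkerMem` are illustrative, not the identifiers in the linked `backup_processor.go`): the memory budget can shrink the worker pool, but the setting acts as a hard ceiling.

```go
package main

import "fmt"

// backupWorkerCount is a hypothetical helper; the real logic lives in
// backup_processor.go (linked above). The memory budget can lower the
// worker count, but the kv setting is treated as a hard maximum so the
// processor never issues more requests than the server should admit.
func backupWorkerCount(settingMax, memBudget, perWorkerMem int64) int64 {
	byMem := memBudget / perWorkerMem // workers the memory budget can sustain
	if byMem < 1 {
		byMem = 1
	}
	if byMem > settingMax {
		return settingMax // capped by kv.bulk_io_write.concurrent_export_requests
	}
	return byMem
}

func main() {
	// With the default setting of 3, even a large memory budget yields 3 workers.
	fmt.Println(backupWorkerCount(3, 1<<30, 128<<20)) // 3
	// A small budget can lower the count below the setting.
	fmt.Println(backupWorkerCount(3, 256<<20, 128<<20)) // 2
}
```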
At the same time, kv uses `kv.bulk_io_write.concurrent_export_requests` differently and caps it at the number of CPU cores available for export: https://github.com/cockroachdb/cockroach/blob/ab00ae96196b3b2ca8b77d1ec3bce35050316f1d/pkg/kv/kvserver/store.go#L1554-L1564
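Sketched in the same hedged style (`exportLimiterSize` is a stand-in for the limiter sizing in the linked `store.go`, not the actual function): the setting is clamped to the usable core count, so values above it are silently ineffective on the kv side.

```go
package main

import (
	"fmt"
	"runtime"
)

// exportLimiterSize is a hypothetical stand-in for the export-request
// limiter sizing in store.go (linked above): the setting is capped at
// the core count available to the process.
func exportLimiterSize(setting int) int {
	if cores := runtime.GOMAXPROCS(0); setting > cores {
		return cores // raising the setting past the core count has no effect here
	}
	return setting
}

func main() {
	fmt.Println(exportLimiterSize(16)) // on a 4-core box: 4
}
```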
This is not a problem on servers with a high core count, where the default limit sits below the number of cores, but it can become an issue on servers with a low core count. In that case, raising the setting will not raise the kv limit; it will only increase the number of in-flight export requests, which then wait on the limiter and can potentially time out.
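Putting the two sides together as a worked example (numbers are illustrative): on a 4-core node with the setting raised to 16, the backup side sends up to 16 concurrent requests while the store admits only 4, leaving 12 queued on the limiter.

```go
package main

import "fmt"

func main() {
	setting := 16 // operator-raised kv.bulk_io_write.concurrent_export_requests
	cores := 4    // low-core server

	senders := setting // backup side: the setting alone caps the workers
	admitted := cores  // kv side: min(setting, cores) == cores here
	fmt.Printf("senders=%d admitted=%d queued on limiter=%d\n",
		senders, admitted, senders-admitted)
	// Output: senders=16 admitted=4 queued on limiter=12
}
```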
Jira issue: CRDB-31887