Open msbutler opened 2 years ago
cc @cockroachdb/bulk-io
(Out of scope but I’d love to see a Docs page for every setting -- why you would use it, how it interacts with other settings, risks & trade-offs. cc @kathancox)
FWIW, I just ran our restore tpccInc roachtest on 23.1, i.e.:
"RESTORE DATABASE tpcc FROM '/2022/09/07-000000.00' IN 'gs://cockroach-fixtures/tpcc-incrementals-22.2?AUTH=implicit' AS OF SYSTEM TIME '2022-09-07 12:15:00' WITH detached"
On a cluster with the following topology:
roachprod create $CLUSTER -n 4 --gce-machine-type="n1-standard-8" --gce-pd-volume-size=1000 --local-ssd=false
Increasing both kv.bulk_io_write.concurrent_addsstable_requests and kv.bulk_io_write.restore_node_concurrency from 1 to 5 had no measurable effect on throughput.
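For anyone repeating this experiment, the setting changes could be sketched roughly as follows. This assumes both settings exist under these names in the version being tested (the names are taken from the comment above) and that the session is allowed to modify cluster settings. The value 5 only mirrors the test and is not a recommendation.

```sql
-- Inspect the current values before changing anything.
SHOW CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests;
SHOW CLUSTER SETTING kv.bulk_io_write.restore_node_concurrency;

-- Raise both settings to the value used in the test (assumed to be valid
-- for these settings in the version under test).
SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 5;
SET CLUSTER SETTING kv.bulk_io_write.restore_node_concurrency = 5;

-- Return both settings to their defaults once the run is finished.
RESET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests;
RESET CLUSTER SETTING kv.bulk_io_write.restore_node_concurrency;
```

Checking the values with SHOW before and after the run makes it easier to confirm that a throughput comparison was actually made against different settings.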
Bulk jobs interact with many tunable cluster settings. Some of these have public and/or internal advice on how to tune them. This documentation may be outdated and should be audited and updated. Further, some cluster settings may need to be made private or removed altogether. Below is an attempt to list all tunable cluster settings the DR team should consider auditing (at least for 22.2):
Notes from Matt:
Cluster settings abound!
I think bool settings are a good place to start. Check these searches:
Rough criteria are:
We do want to keep settings around for new functionality that needs maturing (i.e. feature flags), or known cases of a client needing to do things differently than default.
Let’s call out / debate individual settings as comments.
Jira issue: CRDB-19292