Hello. Clickhouse-backup has options download_concurrency / upload_concurrency in the general section and concurrency in the s3 section. What is the difference? Should they match? If I set upload_concurrency to 10 and s3 concurrency to 1, how many uploads will be in parallel, 1 or 10? Thanks
upload_concurrency / download_concurrency define how many parallel upload / download goroutines will start, independent of the remote storage type. In 1.3.0 this means how many data parts will be uploaded in parallel.

concurrency in the s3 section means how many concurrent upload streams will run during a multipart upload inside each upload goroutine.

A high S3_CONCURRENCY + a high S3_PART_SIZE will allocate a lot of memory for buffers inside the AWS Go SDK.
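For example (a minimal sketch with illustrative values, only the relevant keys shown):

general:
  upload_concurrency: 10   # 10 parallel upload goroutines, one per data part
  download_concurrency: 10 # same idea for downloads
s3:
  concurrency: 1           # one multipart upload stream inside each goroutine
  part_size: 104857600     # 100 MiB buffer per stream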
If I set upload_concurrency to 10 and s3 concurrency to 1, how many uploads will be in parallel, 1 or 10?

10 parallel uploads, and each upload will be restricted on S3 to only one multipart upload stream.
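As a rough estimate (part buffers only; the AWS Go SDK and the compression pipeline add overhead on top):

buffer_memory ≈ upload_concurrency × s3.concurrency × part_size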
Moreover, I recommend using compression_format: tar to avoid high CPU usage.
Thanks. So what is the correct way to increase upload speed without using too much memory? In 1.3.0, with upload_concurrency 10 and s3 concurrency 1, even 64 GB of RAM on the server is not enough. With upload_concurrency 5 and s3 concurrency 1, clickhouse-backup uses about 40 GB of RAM. We use part_size 1 GB and max_file_size 1 GB too. So maybe it's better to decrease part_size to 100 MB and set s3 concurrency to 10?
Did you define s3 -> part_size, or general -> max_file_size? Could you run

clickhouse-backup print-config

and share your current config without sensitive credentials?
Yes, I defined part_size, because our storage (Swift with an S3 middleware) doesn't support more than 1000 parts per object:
general:
remote_storage: s3
max_file_size: 1073741824
disable_progress_bar: false
backups_to_keep_local: 1
backups_to_keep_remote: 30
log_level: debug
allow_empty_backups: false
download_concurrency: 10
upload_concurrency: 5
restore_schema_on_cluster: ""
upload_by_part: true
download_by_part: true
clickhouse:
username: default
password: ""
host: 127.0.0.1
port: 9000
disk_mapping: {}
skip_tables:
- system.*
- default.*
- INFORMATION_SCHEMA.*
- information_schema.*
timeout: 5m
freeze_by_part: false
secure: false
skip_verify: false
sync_replicated_tables: false
log_sql_queries: false
config_dir: /etc/clickhouse-server/
restart_command: systemctl restart clickhouse-server
ignore_not_exists_error_during_freeze: true
debug: false
s3:
....
region: us-east-1
acl: private
assume_role_arn: ""
force_path_style: true
path: ""
disable_ssl: false
compression_level: 1
compression_format: tar
sse: ""
disable_cert_verification: false
storage_class: STANDARD
concurrency: 1
part_size: 1073741824
debug: false
gcs:
credentials_file: ""
credentials_json: ""
bucket: ""
path: ""
compression_level: 1
compression_format: tar
debug: false
endpoint: ""
cos:
url: ""
timeout: 2m
secret_id: ""
secret_key: ""
path: ""
compression_format: tar
compression_level: 1
debug: false
api:
listen: 0.0.0.0:7171
enable_metrics: true
enable_pprof: false
username: ""
password: ""
secure: false
certificate_file: ""
private_key_file: ""
create_integration_tables: false
allow_parallel: false
ftp:
address: ""
timeout: 2m
username: ""
password: ""
tls: false
path: ""
compression_format: tar
compression_level: 1
concurrency: 28
debug: false
sftp:
address: ""
port: 22
username: ""
password: ""
key: ""
path: ""
compression_format: tar
compression_level: 1
concurrency: 1
debug: false
azblob:
endpoint_suffix: core.windows.net
account_name: ""
account_key: ""
sas: ""
use_managed_identity: false
container: ""
path: ""
compression_level: 1
compression_format: tar
sse_key: ""
buffer_size: 0
buffer_count: 3
Try decreasing s3 -> part_size to 52428800 (50 MiB) instead of 1 GiB?
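With upload_concurrency: 5 and s3 -> concurrency: 1, the rough estimate above gives about 5 × 1 × 50 MiB ≈ 250 MiB of part buffers, instead of roughly 5 GiB with 1 GiB parts (an approximation; SDK and compression overhead come on top).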
Thanks. Decreasing part_size decreases memory consumption. I will try 100 MB part_size with higher concurrency.
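A quick check against the Swift limit (assuming uploaded objects stay near max_file_size, i.e. about 1 GiB): 100 MB parts mean about 11 parts per object, far below the 1000-parts-per-object cap.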
Could you add it to the documentation?
You are welcome to make a pull request.