Closed fzyzcjy closed 2 years ago
Could you try 1.4.0 and share LOG_LEVEL=debug clickhouse-backup download 2022-05-16T19-00-08
results?
Sure. I will do that and report it.
2022/05/17 09:13:58.048872 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/keycloak_log/default_202109_7_7_1014.tar.gz
2022/05/17 09:13:58.083370 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/keycloak_log/default_202109_7_7_1014.tar.gz not equal with gzip
2022/05/17 09:13:59.106082 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/keycloak_log/default_202109_7_7_1014.tar.gz
2022/05/17 09:13:59.106156 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/keycloak_log/default_202202_17_17_61.tar.gz
2022/05/17 09:13:59.135966 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/keycloak_log/default_202202_17_17_61.tar.gz not equal with gzip
2022/05/17 09:13:59.144870 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/keycloak_log/default_202202_17_17_61.tar.gz
2022/05/17 09:14:02.417926 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:02.417990 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:02.417996 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:02.418002 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:02.418012 error can't acquire semaphore during downloadTableData: context canceled
about 900000 lines of this ===> "error can't acquire semaphore during downloadTableData: context canceled"
...
2022/05/17 09:14:07.734238 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:07.734243 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:07.734247 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:07.734252 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:07.734259 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:07.734263 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:14:07.734316 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/request_info/default_202106_2_2_1.tar.gz
2022/05/17 09:14:07.764562 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/request_info/default_202106_2_2_1.tar.gz not equal with gzip
2022/05/17 09:14:09.481551 error can't acquire semaphore during Download: context canceled backup=2022-05-16T19-00-08 operation=download
2022/05/17 09:14:09.481617 debug start downloadTableData yplusplus_prod_in_main.request_info with concurrency=2 len(table.Files[...])=56
2022/05/17 09:14:09.481649 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202108_21_21_1.tar.gz
2022/05/17 09:14:09.482097 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202105_11_11_5906.tar.gz
2022/05/17 09:14:09.519474 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202108_21_21_1.tar.gz not equal with gzip
...
2022/05/17 09:14:23.690612 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202204_62265_267158_23809.tar.gz
2022/05/17 09:14:23.780185 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202204_62265_267158_23809.tar.gz not equal with gzip
2022/05/17 09:14:29.943696 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202112_32_32_35752.tar.gz
2022/05/17 09:14:29.943773 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202205_455081_472207_7107.tar.gz
2022/05/17 09:14:29.998111 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202205_455081_472207_7107.tar.gz not equal with gzip
2022/05/17 09:14:32.323227 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_op_main/keycloak_log/default_202111_12_12_7041.tar.gz
2022/05/17 09:14:32.323276 debug start operation=downloadDiffParts table=yplusplus_prod_op_main.keycloak_log
2022/05/17 09:14:32.323358 info done diff_parts=0 duration=0s operation=downloadDiffParts
2022/05/17 09:14:32.323379 info done backup=2022-05-16T19-00-08 duration=38.429s operation=download_data size=23.23MiB table=yplusplus_prod_op_main.keycloak_log
2022/05/17 09:14:32.798363 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202205_455081_472207_7107.tar.gz
2022/05/17 09:14:32.798436 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202103_8_8_19779.tar.gz
2022/05/17 09:14:32.853390 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202103_8_8_19779.tar.gz not equal with gzip
2022/05/17 09:15:05.018054 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202103_8_8_19779.tar.gz
2022/05/17 09:15:05.018118 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202107_17_17_1.tar.gz
...
2022/05/17 09:21:32.807784 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202203_42_42_8242.tar.gz not equal with gzip
2022/05/17 09:21:52.956816 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202109_22_22_47339.tar.gz
2022/05/17 09:21:52.956906 debug start download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202202_38_38_10076.tar.gz
2022/05/17 09:21:53.018275 warn remote file backup extension 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202202_38_38_10076.tar.gz not equal with gzip
2022/05/17 09:22:21.367415 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202203_42_42_8242.tar.gz
2022/05/17 09:22:27.134402 debug finish download from 2022-05-16T19-00-08/shadow/yplusplus_prod_in_main/request_info/default_202202_38_38_10076.tar.gz
2022/05/17 09:22:27.134445 debug start operation=downloadDiffParts table=yplusplus_prod_in_main.request_info
2022/05/17 09:22:27.134674 info done diff_parts=0 duration=0s operation=downloadDiffParts
2022/05/17 09:22:27.134696 info done backup=2022-05-16T19-00-08 duration=8m17.653s operation=download_data size=1.77GiB table=yplusplus_prod_in_main.request_info
2022/05/17 09:22:27.134731 error one of Download go-routine return error: one of downloadTableData go-routine return error: handling file: /202110_11_11_3243/event_time.bin: unexpected EOF
Do not know what is happening... Notice that, the whole backup is <2GB, so it should be quite small.
But indeed it is taking minutes to download (and finally failed):
bash-5.1# LOG_LEVEL=debug clickhouse-backup download 2022-05-16T19-00-08 > /tmp/a.log 2>&1
bash-5.1# echo $?
1
tried another file, same problem
bash-5.1# LOG_LEVEL=debug clickhouse-backup download 2022-05-15T19-00-07 > /tmp/a.log 2>&1
bash-5.1# echo $?
1
---
the log:
2022/05/17 09:27:33.680814 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202203_25_25_20053.tar.gz
2022/05/17 09:27:33.952826 warn remote file backup extension 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202203_25_25_20053.tar.gz not equal with gzip
2022/05/17 09:27:41.034210 debug finish download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202203_25_25_20053.tar.gz
2022/05/17 09:27:41.034291 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202204_97491_157166_21905.tar.gz
2022/05/17 09:27:41.102117 warn remote file backup extension 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202204_97491_157166_21905.tar.gz not equal with gzip
2022/05/17 09:27:41.285526 debug finish download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202204_97491_157166_21905.tar.gz
2022/05/17 09:27:41.285578 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202108_8_8_0.tar.gz
2022/05/17 09:27:41.317110 warn remote file backup extension 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202108_8_8_0.tar.gz not equal with gzip
2022/05/17 09:27:41.350590 debug finish download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202108_8_8_0.tar.gz
2022/05/17 09:27:41.350640 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202112_17_17_28524.tar.gz
2022/05/17 09:27:41.421570 warn remote file backup extension 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202112_17_17_28524.tar.gz not equal with gzip
2022/05/17 09:27:45.976110 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:27:45.976133 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:27:45.976139 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:27:45.976144 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:27:45.976149 error can't acquire semaphore during downloadTableData: context canceled
...
2022/05/17 09:27:48.719482 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:27:48.719487 error can't acquire semaphore during downloadTableData: context canceled
2022/05/17 09:27:48.719528 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/request_info/default_202205_472829_472829_0.tar.gz
2022/05/17 09:27:48.750560 warn remote file backup extension 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/request_info/default_202205_472829_472829_0.tar.gz not equal with gzip
2022/05/17 09:27:50.115112 error can't acquire semaphore during Download: context canceled backup=2022-05-15T19-00-07 operation=download
2022/05/17 09:27:50.115195 debug start downloadTableData yplusplus_prod_op_main.keycloak_log with concurrency=2 len(table.Files[...])=20
2022/05/17 09:27:50.115277 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/keycloak_log/default_202108_5_5_3066.tar.gz
2022/05/17 09:27:50.115333 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/keycloak_log/default_202106_2_2_126.tar.gz
2022/05/17 09:27:50.161254 warn remote file backup extension 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/keycloak_log/default_202106_2_2_126.tar.gz not equal with gzip
...
2022/05/17 09:28:07.867117 debug start download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202204_28_97490_21423.tar.gz
2022/05/17 09:28:07.969864 warn remote file backup extension 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202204_28_97490_21423.tar.gz not equal with gzip
2022/05/17 09:28:13.469481 debug finish download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202204_28_97490_21423.tar.gz
2022/05/17 09:28:18.403449 debug finish download from 2022-05-15T19-00-07/shadow/yplusplus_prod_op_main/static_linker_log/default_202112_17_17_28524.tar.gz
2022/05/17 09:28:18.403498 debug start operation=downloadDiffParts table=yplusplus_prod_op_main.static_linker_log
2022/05/17 09:28:18.403610 info done diff_parts=0 duration=0s operation=downloadDiffParts
2022/05/17 09:28:18.403629 info done backup=2022-05-15T19-00-07 duration=1m44.843s operation=download_data size=266.13MiB table=yplusplus_prod_op_main.static_linker_log
2022/05/17 09:28:18.403667 error one of Download go-routine return error: one of downloadTableData go-routine return error: handling file: /202110_11_11_3243/trace_id.bin: unexpected EOF
Btw the error can't acquire semaphore during downloadTableData: context canceled
is repeated 500000 times in this experiment...
@fzyzcjy could you try with https://github.com/AlexAkulov/clickhouse-backup/releases/tag/v1.3.2 ? it shall download backup which created with 1.4.0
I just upgraded from it... seems also fails
with the same error? do you have enough free space on destination disk?
could you share
df -h
on destination server?
I will check it and reply :)
a lot of free space indeed.
bash-5.1# df -h
Filesystem Size Used Available Use% Mounted on
overlay 97.9G 41.2G 51.7G 44% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
shm 64.0M 0 64.0M 0% /dev/shm
/dev/vda1 39.1G 13.6G 23.5G 37% /etc/hosts
/dev/vda1 39.1G 13.6G 23.5G 37% /dev/termination-log
/dev/vdb1 97.9G 41.2G 51.7G 44% /etc/hostname
/dev/vdb1 97.9G 41.2G 51.7G 44% /etc/resolv.conf
/dev/longhorn/pvc-44371947-dbdb-4bf7-9379-49addf9aff9d
14.7G 2.8G 11.9G 19% /var/lib/clickhouse
tmpfs 7.8G 12.0K 7.8G 0% /run/secrets/kubernetes.io/serviceaccount
tmpfs 7.8G 0 7.8G 0% /proc/acpi
tmpfs 64.0M 0 64.0M 0% /proc/kcore
tmpfs 64.0M 0 64.0M 0% /proc/keys
tmpfs 64.0M 0 64.0M 0% /proc/timer_list
tmpfs 64.0M 0 64.0M 0% /proc/sched_debug
tmpfs 7.8G 0 7.8G 0% /proc/scsi
tmpfs 7.8G 0 7.8G 0% /sys/firmware
Observation: error every time is on different files. e.g.
the experiment before:
2022/05/17 09:22:27.134731 error one of Download go-routine return error: one of downloadTableData go-routine return error: handling file: /202110_11_11_3243/event_time.bin: unexpected EOF
try again
2022/05/19 09:47:43.831474 error one of Download go-routine return error: one of downloadTableData go-routine return error: handling file: /202202_20_20_19935/trace_id.bin: unexpected EOF
try again
2022/05/19 09:53:44.154727 error one of Download go-routine return error: one of downloadTableData go-routine return error: handling file: /202110_12_12_3242/remote_addr.bin: unexpected EOF
By the way, can we know more from the error logs? I come from Java/Kotlin/Dart/C++/Rust... world where an error usually follows with a stacktrace which is quite useful.
Could you please provide some hints on what to do? This makes the clickhouse-backup completely non-useable since it cannot restore data :(
Seem to see the cause: The downloading bandwidth was very low before (~1MB/s), so maybe something timeout (not sure). Switching to ~40MB/s bandwidth, it works.
So, the root cause slow bandwidth?
Which S3 backup do you use ? AWS S3 ? or some custom backend like minio, ceph, swift?
could you also switch
clickhouse-backup print-config
again
I need ensure we have timeout on S3 but all timeout shall pass to high level and after any error download shall stop
So, the root cause slow bandwidth?
I guess so (not sure)
Which S3 backup do you use ? AWS S3 ? or some custom backend like minio, ceph, swift?
aws-s3-compatible storage, a provider on a small cloud.
could you also switch clickhouse-backup print-config again
bash-5.1# clickhouse-backup print-config
general:
remote_storage: s3
max_file_size: 0
disable_progress_bar: true
backups_to_keep_local: 0
backups_to_keep_remote: 0
log_level: info
allow_empty_backups: false
download_concurrency: 2
upload_concurrency: 2
restore_schema_on_cluster: ""
upload_by_part: true
download_by_part: true
clickhouse:
username: default
password: ..
host: clickhouse
port: 9000
disk_mapping: {}
skip_tables:
- system.*
- tutorial.*
timeout: 5m
freeze_by_part: false
secure: false
skip_verify: false
sync_replicated_tables: false
log_sql_queries: true
config_dir: /etc/clickhouse-server/
restart_command: systemctl restart clickhouse-server
ignore_not_exists_error_during_freeze: true
tls_key: ""
tls_cert: ""
tls_ca: ""
debug: false
s3:
access_key: ..
secret_key: ..
bucket: ..
endpoint: ..
region: ..
acl: private
assume_role_arn: ""
force_path_style: false
path: clickhouse
disable_ssl: false
compression_level: 1
compression_format: gzip
sse: ""
disable_cert_verification: false
storage_class: STANDARD
concurrency: 1
part_size: 0
max_parts_count: 10000
allow_multipart_download: false
debug: false
gcs:
credentials_file: ""
credentials_json: ""
bucket: ""
path: ""
compression_level: 1
compression_format: tar
debug: false
endpoint: ""
cos:
url: ""
timeout: 2m
secret_id: ""
secret_key: ""
path: ""
compression_format: tar
compression_level: 1
debug: false
api:
listen: 0.0.0.0:7171
enable_metrics: true
enable_pprof: false
username: ""
password: ""
secure: false
certificate_file: ""
private_key_file: ""
create_integration_tables: false
allow_parallel: false
ftp:
address: ""
timeout: 2m
username: ""
password: ""
tls: false
path: ""
compression_format: tar
compression_level: 1
concurrency: 2
debug: false
sftp:
address: ""
port: 22
username: ""
password: ""
key: ""
path: ""
compression_format: tar
compression_level: 1
concurrency: 1
debug: false
azblob:
endpoint_suffix: core.windows.net
account_name: ""
account_key: ""
sas: ""
use_managed_identity: false
container: ""
path: ""
compression_level: 1
compression_format: tar
sse_key: ""
buffer_size: 0
buffer_count: 3
max_parts_count: 0
issue still relevant for you?
could you try wih
s3:
max_parts_count: 5000
in your backup config?
It works well now after the bandwidth is enlarged. Thanks for your replies :)
An error still exists, reproduced on multiple S3-like storages (including classic AWS) with a large table (~1000 partitions).
Steps to reproduce:
--diff-from-remote
I didn't see sources, but I guess there is an issue with S3 files getting (not implemented pagination?)
Logs: