Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse® clusters running on Kubernetes
https://altinity.com
Apache License 2.0

Pod is recreating again when S3 <send_metadata> is set to true #985

Closed HeenaBansal2009 closed 2 years ago

HeenaBansal2009 commented 2 years ago

Hi, I have set the CLICKHOUSE_INIT_TIMEOUT value to 400 as suggested in https://kb.altinity.com/altinity-kb-kubernetes/altinity-kb-possible-issues-with-running-clickhouse-in-k8s/#clickhouse-init-process-failed, but I still see that a SIGTERM signal is received by the application.

I am wondering whether any changes (timeouts) are also required on the clickhouse-operator side to support ClickHouse installation when an S3 disk is configured to restore metadata and send_metadata is set to true.

PS: When I set send_metadata to false, my pods run fine.
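For context, the relevant part of the storage configuration looks roughly like this (a minimal sketch — the endpoint and credential handling here are placeholders, not the actual values; the disk name, bucket, and policies match what appears in the logs below):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <!-- placeholder endpoint; real bucket is datalake04 with prefix root-path/ -->
                <endpoint>https://s3.example.com/datalake04/root-path/</endpoint>
                <use_environment_credentials>true</use_environment_credentials>
                <!-- send_metadata=true makes DiskS3 write operation/revision
                     metadata to S3 so local metadata can be restored; on startup
                     the disk scans object keys to find the last revision number -->
                <send_metadata>true</send_metadata>
            </s3>
        </disks>
        <policies>
            <s3only>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3only>
        </policies>
    </storage_configuration>
</clickhouse>
```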

UnamedRus commented 2 years ago

Did you see the exact message: ClickHouse init process failed during startup?

HeenaBansal2009 commented 2 years ago

@UnamedRus, I saw it before with the default CLICKHOUSE_INIT_TIMEOUT. I increased the CLICKHOUSE_INIT_TIMEOUT value to 400 and I no longer see the ClickHouse init failed error; however, I still see the application receiving a SIGTERM signal. I am not sure whether clickhouse-operator also has some timeout variables that need to be set in this case.
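For reference, this is how the timeout was raised — via an environment variable on the clickhouse container in the CHI pod template (a sketch; the installation, template, and image names are illustrative):

```yaml
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "my-cluster"                      # illustrative name
spec:
  templates:
    podTemplates:
      - name: clickhouse-pod-template    # illustrative name
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:22.5
              env:
                # the docker entrypoint waits up to this many seconds for
                # the server to answer before failing with
                # "ClickHouse init process failed"
                - name: CLICKHOUSE_INIT_TIMEOUT
                  value: "400"
```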

Attaching the ClickHouse logs:

```
2022.08.04 14:09:06.699757 [ 1 ] {} <Information> Context: Initialized background executor for fetches with num_threads=8, num_tasks=8
2022.08.04 14:09:06.700180 [ 1 ] {} <Information> Context: Initialized background executor for common operations (e.g. clearing old parts) with num_threads=8, num_tasks=8
2022.08.04 14:09:06.701177 [ 1 ] {} <Information> Application: Loading user defined objects from /var/lib/clickhouse/
2022.08.04 14:09:06.701197 [ 1 ] {} <Debug> UserDefinedSQLObjectsLoader: loading user defined objects
2022.08.04 14:09:06.701239 [ 1 ] {} <Debug> Application: Loaded user defined objects
2022.08.04 14:09:06.701254 [ 1 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
2022.08.04 14:09:06.705916 [ 1 ] {} <Information> DatabaseAtomic (system): Metadata processed, database system has 7 tables and 0 dictionaries in total.
2022.08.04 14:09:06.705959 [ 1 ] {} <Information> TablesLoader: Parsed metadata of 7 tables in 1 databases in 0.004426418 sec
2022.08.04 14:09:06.705977 [ 1 ] {} <Information> TablesLoader: Loading 7 tables with 0 dependency level
2022.08.04 14:09:06.707121 [ 402 ] {} <Information> AWSClient: Aws::Config::AWSConfigFileProfileConfigLoader: Initializing config loader against fileName /.aws/credentials and using profilePrefix = 0
2022.08.04 14:09:06.707159 [ 402 ] {} <Information> AWSClient: ProfileConfigFileAWSCredentialsProvider: Setting provider to read credentials from /.aws/credentials for credentials file and /.aws/config for the config file , for use with profile default
2022.08.04 14:09:06.707254 [ 402 ] {} <Trace> DiskS3: Write to file by path:/var/lib/clickhouse/disks/s3/test_acl. S3 path: root-path/r0000000000000000000000000000000000000000000000000000000000000001-file-nznfwlsppmuabmuaiwtphczdyugxspfp
2022.08.04 14:09:06.707292 [ 402 ] {} <Trace> WriteBufferFromS3: Making single part upload. Bucket: datalake04, Key: root-path/r0000000000000000000000000000000000000000000000000000000000000001-file-nznfwlsppmuabmuaiwtphczdyugxspfp, Size: 4, WithPool: true
2022.08.04 14:09:07.072349 [ 403 ] {} <Trace> WriteBufferFromS3: Single part upload has completed. Bucket: datalake04, Key: root-path/r0000000000000000000000000000000000000000000000000000000000000001-file-nznfwlsppmuabmuaiwtphczdyugxspfp, Object size: 4, WithPool: true
2022.08.04 14:09:07.415984 [ 402 ] {} <Trace> DiskS3: Remove file by path:/var/lib/clickhouse/disks/s3/test_acl
2022.08.04 14:09:07.416138 [ 402 ] {} <Trace> DiskS3: Remove AWS keys root-path/r0000000000000000000000000000000000000000000000000000000000000001-file-nznfwlsppmuabmuaiwtphczdyugxspfp
2022.08.04 14:09:07.780289 [ 402 ] {} <Information> DiskS3: Starting up disk s3
2022.08.04 14:09:08.512297 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 1
2022.08.04 14:09:09.218789 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 01
2022.08.04 14:09:09.929662 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 001
2022.08.04 14:09:10.645359 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0001
2022.08.04 14:09:11.390834 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00001
2022.08.04 14:09:12.104965 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000001
2022.08.04 14:09:12.834746 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000001
2022.08.04 14:09:13.561863 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000001
2022.08.04 14:09:14.276694 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000001
2022.08.04 14:09:15.023568 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000001
2022.08.04 14:09:15.739865 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000001
2022.08.04 14:09:16.464369 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000001
2022.08.04 14:09:17.176718 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000001
2022.08.04 14:09:17.899056 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000001
2022.08.04 14:09:18.636737 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000001
2022.08.04 14:09:19.334149 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000001
2022.08.04 14:09:20.062556 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000001
2022.08.04 14:09:20.773479 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000001
2022.08.04 14:09:21.488898 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000001
2022.08.04 14:09:22.215736 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000001
2022.08.04 14:09:22.926561 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000001
2022.08.04 14:09:23.661639 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000001
2022.08.04 14:09:24.385823 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000001
2022.08.04 14:09:25.109344 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000001
2022.08.04 14:09:25.804244 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000001
2022.08.04 14:09:26.541922 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000001
2022.08.04 14:09:27.244366 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000001
2022.08.04 14:09:27.992477 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000001
2022.08.04 14:09:28.717954 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000001
2022.08.04 14:09:29.425186 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000001
2022.08.04 14:09:30.167526 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000001
2022.08.04 14:09:30.871998 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000001
2022.08.04 14:09:31.594572 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000001
2022.08.04 14:09:32.298440 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000001
2022.08.04 14:09:33.006527 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000001
2022.08.04 14:09:33.735917 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000001
2022.08.04 14:09:34.460518 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000001
2022.08.04 14:09:35.190876 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000001
2022.08.04 14:09:35.897703 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000001
2022.08.04 14:09:36.629395 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000001
2022.08.04 14:09:37.359316 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000001
2022.08.04 14:09:38.078758 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000001
2022.08.04 14:09:38.791716 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000001
2022.08.04 14:09:39.505928 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000000001
2022.08.04 14:09:40.213722 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000000001
2022.08.04 14:09:40.940762 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000000001
2022.08.04 14:09:41.660842 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000000000001
2022.08.04 14:09:42.119634 [ 352 ] {} <Trace> BaseDaemon: Received signal 15
2022.08.04 14:09:42.119712 [ 352 ] {} <Information> Application: Received termination signal (Terminated)
2022.08.04 14:09:42.388399 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000000000001
2022.08.04 14:09:43.099656 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000000000001
2022.08.04 14:09:43.815059 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000000000000001
2022.08.04 14:09:44.537983 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000000000000001
2022.08.04 14:09:45.248441 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000000000000001
2022.08.04 14:09:45.975781 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000000000000000001
2022.08.04 14:09:46.703637 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:47.435822 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:48.151426 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:48.881589 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:49.603249 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:50.315725 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:51.033805 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:51.743791 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:52.482366 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 00000000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:53.193671 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 000000000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:53.915076 [ 402 ] {} <Trace> DiskS3: Check object exists with revision prefix 0000000000000000000000000000000000000000000000000000000000000001
2022.08.04 14:09:54.634229 [ 402 ] {} <Information> DiskS3: Found last revision number 0 for disk s3
2022.08.04 14:09:54.634292 [ 402 ] {} <Information> DiskS3: Disk s3 started up
2022.08.04 14:09:54.634537 [ 402 ] {} <Information> StoragePolicySelector: Storage policy `tiered` loaded
2022.08.04 14:09:54.634593 [ 402 ] {} <Information> StoragePolicySelector: Storage policy `s3only` loaded
2022.08.04 14:09:54.634651 [ 402 ] {} <Information> TablesLoader: 14.285714285714286%
2022.08.04 14:09:54.634736 [ 401 ] {} <Debug> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8): Loading data parts
2022.08.04 14:09:54.636458 [ 402 ] {} <Debug> system.query_log (a148a7f5-0783-42df-93ab-c510d7338b3e): Loading data parts
2022.08.04 14:09:54.637674 [ 401 ] {} <Debug> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8): Loaded data parts (8 items)
2022.08.04 14:09:54.642218 [ 401 ] {} <Debug> system.metric_log (16e6171d-0415-4419-b9c2-876ef5f32847): Loading data parts
2022.08.04 14:09:54.644567 [ 402 ] {} <Debug> system.query_log (a148a7f5-0783-42df-93ab-c510d7338b3e): Loaded data parts (9 items)
2022.08.04 14:09:54.645350 [ 402 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Loading data parts
2022.08.04 14:09:54.647167 [ 402 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Loaded data parts (6 items)
2022.08.04 14:09:54.647821 [ 402 ] {} <Debug> system.trace_log (4d8dbc48-74e9-4bb2-adb5-8d6c4988dab2): Loading data parts
2022.08.04 14:09:54.650720 [ 402 ] {} <Debug> system.trace_log (4d8dbc48-74e9-4bb2-adb5-8d6c4988dab2): Loaded data parts (8 items)
2022.08.04 14:09:54.656173 [ 401 ] {} <Debug> system.metric_log (16e6171d-0415-4419-b9c2-876ef5f32847): Loaded data parts (5 items)
2022.08.04 14:09:54.658374 [ 1 ] {} <Debug> SystemLog: Not creating system.query_thread_log since corresponding section 'query_thread_log' is missing from config
2022.08.04 14:09:54.660177 [ 1 ] {} <Debug> SystemLog: Not creating system.filesystem_cache_log since corresponding section 'filesystem_cache_log' is missing from config
2022.08.04 14:09:54.660928 [ 1 ] {} <Debug> SystemLog: Not creating system.zookeeper_log since corresponding section 'zookeeper_log' is missing from config
2022.08.04 14:09:54.660957 [ 1 ] {} <Debug> SystemLog: Not creating system.session_log since corresponding section 'session_log' is missing from config
2022.08.04 14:09:54.660971 [ 1 ] {} <Debug> SystemLog: Not creating system.transactions_info_log since corresponding section 'transactions_info_log' is missing from config
2022.08.04 14:09:54.670140 [ 1 ] {} <Information> DatabaseCatalog: Found 0 partially dropped tables. Will load them and retry removal.
2022.08.04 14:09:54.670318 [ 1 ] {} <Information> DatabaseOrdinary (default): Metadata processed, database default has 0 tables and 0 dictionaries in total.
2022.08.04 14:09:54.670338 [ 1 ] {} <Information> TablesLoader: Parsed metadata of 0 tables in 1 databases in 4.737e-05 sec
2022.08.04 14:09:54.670347 [ 1 ] {} <Information> TablesLoader: Loading 0 tables with 0 dependency level
2022.08.04 14:09:54.670371 [ 1 ] {} <Information> DatabaseOrdinary (default): Starting up tables.
2022.08.04 14:09:54.670383 [ 1 ] {} <Information> DatabaseAtomic (system): Starting up tables.
2022.08.04 14:09:54.670779 [ 414 ] {} <Information> BackgroundSchedulePool/BgSchPool: Create BackgroundSchedulePool with 128 threads
2022.08.04 14:09:54.677600 [ 423 ] {} <Debug> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8) (MergerMutator): Selected 2 parts from 202208_10501_10947_201 to 202208_10948_10948_0
2022.08.04 14:09:54.677679 [ 423 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 19.41 GiB.
2022.08.04 14:09:54.679750 [ 374 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Debug> MergeTask::PrepareStage: Merging 2 parts: from 202208_10501_10947_201 to 202208_10948_10948_0 into Wide
2022.08.04 14:09:54.680642 [ 414 ] {} <Debug> StorageDistributed (parts_dist): Auto-increment is 0
2022.08.04 14:09:54.680677 [ 374 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Debug> MergeTask::PrepareStage: Selected MergeAlgorithm: Horizontal
2022.08.04 14:09:54.680902 [ 374 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Debug> MergeTreeSequentialSource: Reading 156 marks from part 202208_10501_10947_201, total 1264842 rows starting from the beginning of the part
2022.08.04 14:09:54.682231 [ 426 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9) (MergerMutator): Selected 6 parts from 20220804_12_56_11 to 20220804_61_61_0
2022.08.04 14:09:54.682294 [ 426 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 19.41 GiB.
2022.08.04 14:09:54.683165 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTask::PrepareStage: Merging 6 parts: from 20220804_12_56_11 to 20220804_61_61_0 into Compact
2022.08.04 14:09:54.683271 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTask::PrepareStage: Selected MergeAlgorithm: Horizontal
2022.08.04 14:09:54.683320 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 20220804_12_56_11, total 3046 rows starting from the beginning of the part
2022.08.04 14:09:54.683380 [ 374 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 202208_10948_10948_0, total 2266 rows starting from the beginning of the part
2022.08.04 14:09:54.683481 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 20220804_57_57_0, total 171 rows starting from the beginning of the part
2022.08.04 14:09:54.683538 [ 415 ] {} <Debug> StorageDistributed (query_log_dist): Auto-increment is 0
2022.08.04 14:09:54.683619 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 20220804_58_58_0, total 43 rows starting from the beginning of the part
2022.08.04 14:09:54.683716 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 20220804_59_59_0, total 172 rows starting from the beginning of the part
2022.08.04 14:09:54.683821 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 20220804_60_60_0, total 43 rows starting from the beginning of the part
2022.08.04 14:09:54.683938 [ 360 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 20220804_61_61_0, total 141 rows starting from the beginning of the part
2022.08.04 14:09:54.684711 [ 1 ] {} <Debug> Application: Loaded metadata.
2022.08.04 14:09:54.684772 [ 1 ] {} <Trace> Pipe: Pipe capacity is 1.00 MiB
2022.08.04 14:09:54.684821 [ 1 ] {} <Debug> Application: DNS caching disabled
2022.08.04 14:09:54.684832 [ 1 ] {} <Information> Application: Tasks stats provider: procfs
2022.08.04 14:09:54.684859 [ 1 ] {} <Information> Application: It looks like the process has no CAP_SYS_NICE capability, the setting 'os_thread_priority' will have no effect. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_sys_nice=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.
2022.08.04 14:09:54.688574 [ 368 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Debug> MergeTask::MergeProjectionsStage: Merge sorted 3616 rows, containing 13 columns (13 merged, 0 gathered) in 0.005744397 sec., 629482.9553040989 rows/sec., 89.30 MiB/sec.
2022.08.04 14:09:54.689750 [ 368 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Trace> MergedBlockOutputStream: filled checksums 20220804_12_61_12 (state Temporary)
2022.08.04 14:09:54.690106 [ 368 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Trace> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Renaming temporary part tmp_merge_20220804_12_61_12 to 20220804_12_61_12.
2022.08.04 14:09:54.690213 [ 368 ] {f24baab0-0eea-4ebc-bce9-ab001282e9f9::20220804_12_61_12} <Trace> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9) (MergerMutator): Merged 6 parts: from 20220804_12_56_11 to 20220804_61_61_0
2022.08.04 14:09:54.696687 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:8123 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:8123 (version 22.5.1.2079). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.08.04 14:09:54.696889 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:8443 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:8443 (version 22.5.1.2079). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.08.04 14:09:54.697038 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:9440 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:9440 (version 22.5.1.2079). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.08.04 14:09:54.697184 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:9010 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:9010 (version 22.5.1.2079). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.08.04 14:09:54.697299 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:9004 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:9004 (version 22.5.1.2079). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.08.04 14:09:54.697410 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:9005 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:9005 (version 22.5.1.2079). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.08.04 14:09:54.697512 [ 1 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 31.04 MiB, peak 31.04 MiB, will set to 538.48 MiB (RSS), difference: 507.44 MiB
2022.08.04 14:09:54.698880 [ 1 ] {} <Information> Application: Available RAM: 18.00 GiB; physical cores: 2; logical cores: 16.
2022.08.04 14:09:54.699800 [ 1 ] {} <Information> Application: Listening for http://[::]:8123
2022.08.04 14:09:54.699808 [ 368 ] {} <Debug> MemoryTracker: Peak memory usage Mutate/Merge: 4.00 MiB.
2022.08.04 14:09:54.699856 [ 547 ] {} <Debug> DDLWorker: Starting DDLWorker thread
2022.08.04 14:09:54.699992 [ 547 ] {} <Debug> DDLWorker: Initializing DDLWorker thread
2022.08.04 14:09:54.699983 [ 548 ] {} <Debug> DDLWorker: Started DDLWorker cleanup thread
2022.08.04 14:09:54.700023 [ 1 ] {} <Information> Application: Listening for https://[::]:8443
2022.08.04 14:09:54.700113 [ 1 ] {} <Information> Application: Listening for secure native protocol (tcp_secure): [::]:9440
2022.08.04 14:09:54.700186 [ 1 ] {} <Information> Application: Listening for secure replica communication (interserver): https://[::]:9010
2022.08.04 14:09:54.700261 [ 1 ] {} <Information> Application: Listening for MySQL compatibility protocol: [::]:9004
2022.08.04 14:09:54.700364 [ 1 ] {} <Information> Application: Listening for PostgreSQL compatibility protocol: [::]:9005
2022.08.04 14:09:54.700380 [ 1 ] {} <Information> Application: Ready for connections.
2022.08.04 14:09:54.700463 [ 1 ] {} <Debug> Application: Received termination signal.
2022.08.04 14:09:54.700477 [ 1 ] {} <Debug> Application: Waiting for current connections to close.
2022.08.04 14:09:54.704193 [ 547 ] {} <Trace> ZooKeeper: Initialized, hosts: zookeeper:2181
2022.08.04 14:09:54.709482 [ 547 ] {} <Debug> DDLWorker: Initialized DDLWorker thread
2022.08.04 14:09:54.709531 [ 547 ] {} <Debug> DDLWorker: Scheduling tasks
2022.08.04 14:09:54.709549 [ 548 ] {} <Debug> DDLWorker: Cleaning queue
2022.08.04 14:09:54.709553 [ 547 ] {} <Trace> DDLWorker: Don't have unfinished tasks after restarting
2022.08.04 14:09:54.710101 [ 547 ] {} <Trace> DDLWorker: scheduleTasks: initialized=true, size_before_filtering=0, queue_size=0, entries=none..none, first_failed_task_name=none, current_tasks_size=0, last_current_task=none, last_skipped_entry_name=none
2022.08.04 14:09:54.710124 [ 547 ] {} <Debug> DDLWorker: No tasks to schedule
2022.08.04 14:09:54.710132 [ 547 ] {} <Debug> DDLWorker: Waiting for queue updates
2022.08.04 14:09:54.864151 [ 360 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Debug> MergeTask::MergeProjectionsStage: Merge sorted 1267108 rows, containing 4 columns (4 merged, 0 gathered) in 0.185977498 sec., 6813232.856805075 rows/sec., 110.19 MiB/sec.
2022.08.04 14:09:54.865528 [ 360 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Trace> MergedBlockOutputStream: filled checksums 202208_10501_10948_202 (state Temporary)
2022.08.04 14:09:54.865913 [ 360 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Trace> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8): Renaming temporary part tmp_merge_202208_10501_10948_202 to 202208_10501_10948_202.
2022.08.04 14:09:54.866022 [ 360 ] {940f45d6-8d7d-421c-8f7c-a57c91c9cde8::202208_10501_10948_202} <Trace> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8) (MergerMutator): Merged 2 parts: from 202208_10501_10947_201 to 202208_10948_10948_0
2022.08.04 14:09:54.866080 [ 360 ] {} <Debug> MemoryTracker: Peak memory usage Mutate/Merge: 12.99 MiB.
2022.08.04 14:09:55.000136 [ 525 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 525.55 MiB, peak 538.49 MiB, will set to 548.29 MiB (RSS), difference: 22.75 MiB
2022.08.04 14:09:55.451349 [ 1 ] {} <Information> Application: Closed all listening sockets.
2022.08.04 14:09:55.451434 [ 1 ] {} <Information> Application: Closed connections.
2022.08.04 14:09:55.452068 [ 1 ] {} <Information> Application: Shutting down storages.
2022.08.04 14:09:55.452097 [ 1 ] {} <Information> Context: Shutdown disk backups
2022.08.04 14:09:55.452106 [ 1 ] {} <Information> Context: Shutdown disk default
2022.08.04 14:09:55.452116 [ 1 ] {} <Information> Context: Shutdown disk s3
2022.08.04 14:09:55.452272 [ 407 ] {} <Trace> SystemLog (system.query_log): Terminating
2022.08.04 14:09:55.452358 [ 406 ] {} <Trace> SystemLog (system.part_log): Terminating
2022.08.04 14:09:55.452419 [ 401 ] {} <Trace> SystemLog (system.trace_log): Flushing system log, 13 entries to flush up to offset 13
2022.08.04 14:09:55.452790 [ 401 ] {} <Debug> SystemLog (system.trace_log): Will use existing table system.trace_log for TraceLog
2022.08.04 14:09:55.453370 [ 401 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 19.41 GiB.
2022.08.04 14:09:55.454556 [ 401 ] {} <Trace> MergedBlockOutputStream: filled checksums 202208_1_1_0 (state Temporary)
2022.08.04 14:09:55.455359 [ 401 ] {} <Trace> system.trace_log (4d8dbc48-74e9-4bb2-adb5-8d6c4988dab2): Renaming temporary part tmp_insert_202208_1_1_0 to 202208_10137_10137_0.
2022.08.04 14:09:55.455545 [ 401 ] {} <Trace> SystemLog (system.trace_log): Flushed system log up to offset 13
2022.08.04 14:09:55.455564 [ 401 ] {} <Trace> SystemLog (system.trace_log): Terminating
2022.08.04 14:09:55.455620 [ 408 ] {} <Trace> SystemLog (system.crash_log): Terminating
2022.08.04 14:09:55.456625 [ 402 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 19.41 GiB.
2022.08.04 14:09:55.457079 [ 402 ] {} <Trace> MergedBlockOutputStream: filled checksums 20220804_1_1_0 (state Temporary)
2022.08.04 14:09:55.457717 [ 402 ] {} <Trace> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Renaming temporary part tmp_insert_20220804_1_1_0 to 20220804_62_62_0.
2022.08.04 14:09:55.457913 [ 1 ] {} <Trace> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Found 6 old parts to remove.
2022.08.04 14:09:55.457945 [ 1 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Removing part from filesystem 20220804_12_56_11
2022.08.04 14:09:55.458232 [ 1 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Removing part from filesystem 20220804_57_57_0
2022.08.04 14:09:55.458445 [ 1 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Removing part from filesystem 20220804_58_58_0
2022.08.04 14:09:55.458724 [ 1 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Removing part from filesystem 20220804_59_59_0
2022.08.04 14:09:55.458975 [ 1 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Removing part from filesystem 20220804_60_60_0
2022.08.04 14:09:55.459213 [ 1 ] {} <Debug> system.text_log (f24baab0-0eea-4ebc-bce9-ab001282e9f9): Removing part from filesystem 20220804_61_61_0
2022.08.04 14:09:55.661830 [ 411 ] {} <Trace> SystemLog (system.metric_log): Flushing system log, 1 entries to flush up to offset 1
2022.08.04 14:09:55.664120 [ 411 ] {} <Debug> SystemLog (system.metric_log): Will use existing table system.metric_log for MetricLog
2022.08.04 14:09:55.672254 [ 411 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 19.41 GiB.
2022.08.04 14:09:55.674453 [ 411 ] {} <Trace> MergedBlockOutputStream: filled checksums 202208_1_1_0 (state Temporary)
2022.08.04 14:09:55.675010 [ 411 ] {} <Trace> system.metric_log (16e6171d-0415-4419-b9c2-876ef5f32847): Renaming temporary part tmp_insert_202208_1_1_0 to 202208_10221_10221_0.
2022.08.04 14:09:55.676074 [ 411 ] {} <Trace> SystemLog (system.metric_log): Flushed system log up to offset 1
2022.08.04 14:09:55.676103 [ 411 ] {} <Trace> SystemLog (system.metric_log): Terminating
2022.08.04 14:09:55.676176 [ 412 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushing system log, 538 entries to flush up to offset 538
2022.08.04 14:09:55.676594 [ 412 ] {} <Debug> SystemLog (system.asynchronous_metric_log): Will use existing table system.asynchronous_metric_log for AsynchronousMetricLog
2022.08.04 14:09:55.677248 [ 412 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 19.41 GiB.
2022.08.04 14:09:55.677830 [ 412 ] {} <Trace> MergedBlockOutputStream: filled checksums 202208_1_1_0 (state Temporary)
2022.08.04 14:09:55.678457 [ 412 ] {} <Trace> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8): Renaming temporary part tmp_insert_202208_1_1_0 to 202208_10953_10953_0.
2022.08.04 14:09:55.678629 [ 412 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushed system log up to offset 538
2022.08.04 14:09:55.678648 [ 412 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Terminating
2022.08.04 14:09:55.678667 [ 455 ] {} <Debug> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8) (MergerMutator): Selected 2 parts from 202208_10501_10948_202 to 202208_10949_10949_0
2022.08.04 14:09:55.678719 [ 455 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 19.41 GiB.
2022.08.04 14:09:55.678992 [ 367 ] {} <Information> MergeTreeBackgroundExecutor: Code: 236. DB::Exception: Cancelled merging parts. (ABORTED) (version 22.5.1.2079)
2022.08.04 14:09:55.679060 [ 1 ] {} <Trace> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8): Found 2 old parts to remove.
2022.08.04 14:09:55.679091 [ 1 ] {} <Debug> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8): Removing part from filesystem 202208_10501_10947_201
2022.08.04 14:09:55.679655 [ 1 ] {} <Debug> system.asynchronous_metric_log (940f45d6-8d7d-421c-8f7c-a57c91c9cde8): Removing part from filesystem 202208_10948_10948_0
2022.08.04 14:09:55.679950 [ 409 ] {} <Trace> SystemLog (system.opentelemetry_span_log): Terminating
2022.08.04 14:09:55.680035 [ 405 ] {} <Trace> SystemLog (system.query_views_log): Terminating
2022.08.04 14:09:55.680119 [ 413 ] {} <Trace> SystemLog (system.processors_profile_log): Terminating
2022.08.04 14:09:55.680903 [ 1 ] {} <Debug> StorageDistributed (parts_dist): Joining background threads for async INSERT
2022.08.04 14:09:55.680932 [ 1 ] {} <Debug> StorageDistributed (parts_dist): Background threads for async INSERT joined
2022.08.04 14:09:55.680942 [ 1 ] {} <Debug> StorageDistributed (query_log_dist): Joining background threads for async INSERT
2022.08.04 14:09:55.680950 [ 1 ] {} <Debug> StorageDistributed (query_log_dist): Background threads for async INSERT joined
2022.08.04 14:09:55.684372 [ 1 ] {} <Trace> BackgroundSchedulePool/BgSchPool: Waiting for threads to finish.
2022.08.04 14:09:55.685362 [ 1 ] {} <Debug> MemoryTracker: Peak memory usage (for query): 75.98 KiB.
2022.08.04 14:09:55.685449 [ 1 ] {} <Debug> Application: Shut down storages.
2022.08.04 14:09:55.721700 [ 1 ] {} <Debug> Application: Destroyed global context.
2022.08.04 14:09:55.722625 [ 1 ] {} <Information> Application: shutting down
2022.08.04 14:09:55.722651 [ 1 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2022.08.04 14:09:55.722684 [ 352 ] {} <Trace> BaseDaemon: Received signal -2
2022.08.04 14:09:55.722723 [ 352 ] {} <Information> BaseDaemon: Stop SignalListener thread
```

HeenaBansal2009 commented 2 years ago

After investigation, I found the root cause: the liveness probe's `initialDelaySeconds` value was too low, which caused the probe to fail. Setting it equal to CLICKHOUSE_INIT_TIMEOUT worked.

I am just wondering if this value could be set dynamically according to CLICKHOUSE_INIT_TIMEOUT, because CLICKHOUSE_INIT_TIMEOUT loses its expected behaviour when the container has a liveness probe enabled?
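For context, the restart budget the kubelet gives a slow-starting container can be written down explicitly: roughly `initialDelaySeconds + failureThreshold * periodSeconds` before the first liveness-triggered restart. A minimal sketch with the values from this thread (the numbers are illustrative, not prescriptive):

```yaml
# Approximate time before the kubelet restarts the container:
#   initialDelaySeconds + failureThreshold * periodSeconds
# With the original settings: 10 + 3 * 1 = 13 seconds,
# while startup with S3 metadata restore takes ~2 minutes here.
livenessProbe:
  initialDelaySeconds: 120   # keep this >= CLICKHOUSE_INIT_TIMEOUT
  periodSeconds: 1
  failureThreshold: 3
```

On newer Kubernetes versions a `startupProbe` is the more idiomatic way to cover a long first start without loosening the steady-state liveness check.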

UnamedRus commented 2 years ago

I wouldn't suggest using CLICKHOUSE_INIT_TIMEOUT (or putting anything in /init.db) at all, tbh.

What do you put here?

HeenaBansal2009 commented 2 years ago

No, I didn't put anything in /init.db. I changed the ClickHouseInstallation YAML for clickhouse-operator like this:

```yaml
name: clickhouse
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /ping
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 120   # was 10 before
  periodSeconds: 1
  successThreshold: 1
  timeoutSeconds: 5
env:
  - name: DATALAKE_CLUSTER
    value: datalake
  - name: CLICKHOUSE_INIT_TIMEOUT
    value: '120'
```

With `initialDelaySeconds: 10`, the operator waited only 10 seconds and then started probing the application at 1-second intervals, and once the failureThreshold was reached it restarted the pod. Setting CLICKHOUSE_INIT_TIMEOUT = 120 alone, with `initialDelaySeconds` unchanged, didn't work when send_metadata is set to true because clickhouse-server takes approximately 2 minutes to come up.
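For completeness, the setting that makes startup slow here is `send_metadata` on the S3 disk, which makes ClickHouse keep a copy of the disk metadata in S3 so it can be restored. A rough sketch of what such a disk definition looks like (the endpoint and disk name are placeholders of my own, not taken from this thread):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.example.com/bucket/root-path/</endpoint>
                <!-- with send_metadata enabled, a startup that restores
                     metadata from S3 can take minutes instead of seconds,
                     which is what tripped the liveness probe above -->
                <send_metadata>true</send_metadata>
            </s3>
        </disks>
    </storage_configuration>
</clickhouse>
```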