Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse® clusters running on Kubernetes
https://altinity.com
Apache License 2.0

clickhouse keeper volumeClaimTemplates settings problem result in DB::Exception: Invalid changelog file lost+found/. (LOGICAL_ERROR) #1329

Closed. hueiyuan closed this issue 8 months ago.

hueiyuan commented 8 months ago

Bug Description

Based on this 3-node keeper manifest example: it lacks the templates key (this key), so it cannot be applied as-is.

hueiyuan commented 8 months ago

The settings example at this line is also missing its enclosing configuration key.

hueiyuan commented 8 months ago

@Slach @antip00 But it seems Keeper cannot take volumeClaimTemplates: whenever we specify them, we always get an error. Our manifest:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: clickhouse-keeper
spec:
  replicas: 1
  configuration:
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/storage_path: /var/lib/clickhouse-keeper
      keeper_server/tcp_port: "2181"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      keeper_server/raft_configuration/server/port: "9444"
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"
  templates:
    podTemplates:
      - name: clickhouse-keeper-pod
        metadata:
          labels:
            app: clickhouse-keeper
        spec:
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: "clickhouse/clickhouse-keeper:23.8.9-alpine"
              resources:
                requests:
                  memory: "256M"
                  cpu: "1"
                limits:
                  memory: "2Gi"
                  cpu: "2"
              volumeMounts:
                - name: both-paths
                  mountPath: /var/lib/clickhouse-keeper
    volumeClaimTemplates:
      - name: t1
        metadata:
          name: both-paths
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 25Gi

hueiyuan commented 8 months ago

@Slach So, does the current operator version (0.23.0) support volumeClaimTemplates for Keeper?

zeev1079 commented 8 months ago

the following revision should work:

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: clickhouse-keeper
spec:
  configuration:
    clusters:
      - name: "keeper-3"
        layout:
          replicasCount: 3
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/storage_path: /var/lib/clickhouse-keeper
      keeper_server/tcp_port: "2181"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      keeper_server/raft_configuration/server/port: "9444"
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"
  templates:
    podTemplates:
      - name: clickhouse-keeper-pod
        metadata:
          annotations:
            prometheus.io/scrape: "true"
          labels:
            app: clickhouse-keeper
            what: node
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 50
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                        - key: "app"
                          operator: In
                          values:
                            - clickhouse-keeper
                    topologyKey: "kubernetes.io/hostname"
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: clickhouse/clickhouse-keeper:head-alpine
              resources:
                requests:
                  memory: "256M"
                  cpu: "1"
                limits:
                  memory: "4Gi"
                  cpu: "2"
    volumeClaimTemplates:
      - name: log-storage-path
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      - name: snapshot-storage-path
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi

hueiyuan commented 8 months ago

@zeev1079 @Slach @alex-zaitsev Could you help check this? I always get the error below, so the pod goes into CrashLoopBackOff. The full error log:

2024.02.13 01:50:15.244153 [ 1 ] {} <Error> DB::Changelog::~Changelog(): Code: 49. DB::Exception: Changelog must be initialized before flushing records. (LOGICAL_ERROR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000000e30fdb in /usr/bin/clickhouse-keeper
1. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x00000000007886cd in /usr/bin/clickhouse-keeper
2. DB::Changelog::flushAsync() @ 0x000000000078d94d in /usr/bin/clickhouse-keeper
3. DB::Changelog::flush() @ 0x000000000078d356 in /usr/bin/clickhouse-keeper
4. DB::Changelog::~Changelog() @ 0x000000000078e195 in /usr/bin/clickhouse-keeper
5. DB::KeeperLogStore::~KeeperLogStore() @ 0x00000000007d7c74 in /usr/bin/clickhouse-keeper
6. DB::KeeperStateManager::~KeeperStateManager() @ 0x00000000008486af in /usr/bin/clickhouse-keeper
7. DB::KeeperServer::~KeeperServer() @ 0x00000000007c802f in /usr/bin/clickhouse-keeper
8. DB::KeeperDispatcher::~KeeperDispatcher() @ 0x00000000007c33c2 in /usr/bin/clickhouse-keeper
9. DB::ContextSharedPart::~ContextSharedPart() @ 0x0000000000a4c761 in /usr/bin/clickhouse-keeper
10. DB::SharedContextHolder::~SharedContextHolder() @ 0x0000000000a44738 in /usr/bin/clickhouse-keeper
11. DB::Keeper::main(std::vector<String, std::allocator<String>> const&) @ 0x0000000000b7307c in /usr/bin/clickhouse-keeper
12. Poco::Util::Application::run() @ 0x0000000000feadc6 in /usr/bin/clickhouse-keeper
13. DB::Keeper::run() @ 0x0000000000b6ca5d in /usr/bin/clickhouse-keeper
14. Poco::Util::ServerApplication::run(int, char**) @ 0x0000000000ff3f99 in /usr/bin/clickhouse-keeper
15. mainEntryClickHouseKeeper(int, char**) @ 0x0000000000b6b9d8 in /usr/bin/clickhouse-keeper
16. main @ 0x0000000000b7c1bd in /usr/bin/clickhouse-keeper
 (version 23.12.4.15 (official build))
2024.02.13 01:50:15.244479 [ 1 ] {} <Information> Application: Waiting for background threads
2024.02.13 01:50:15.244678 [ 1 ] {} <Information> Application: Background threads finished in 0 ms
2024.02.13 01:50:15.244718 [ 1 ] {} <Error> Application: Code: 76. DB::ErrnoException: Cannot open file /var/lib/clickhouse-keeper/coordination/logs/changelog_1_100000.bin: , errno: 13, strerror: 0. (CANNOT_OPEN_FILE), Stack trace (when copying this message, always include the lines below):

Versions in use:

clickhouse-operator: 0.23.0
clickhouse-keeper: 23.8.9-alpine

The related manifest YAML is the same as the example in the previous comment.

hueiyuan commented 8 months ago

@zeev1079 @Slach After some experimenting, I found that it works if we specify keeper_server/storage_path as /var/lib/clickhouse instead of /var/lib/clickhouse-keeper.

Only these settings work:

keeper_server/log_storage_path: /var/lib/clickhouse/coordination/log
keeper_server/snapshot_storage_path: /var/lib/clickhouse/coordination/snapshots
keeper_server/storage_path: /var/lib/clickhouse_keeper

If I change log_storage_path and snapshot_storage_path to /var/lib/clickhouse_keeper, or leave them unspecified, it does not work: the pod keeps crash looping with the previous error.

So I am starting to suspect how these settings are handled in the source code...
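
(For reference, one way to keep the configured paths and the mounted volume consistent is to mount a single claim at the storage path and point the log/snapshot paths underneath it. This is only a sketch along the lines of the manifests above, not the operator's defaults; the claim name both-paths is illustrative:)

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: clickhouse-keeper
spec:
  configuration:
    settings:
      keeper_server/storage_path: /var/lib/clickhouse-keeper
      keeper_server/log_storage_path: /var/lib/clickhouse-keeper/coordination/logs
      keeper_server/snapshot_storage_path: /var/lib/clickhouse-keeper/coordination/snapshots
  templates:
    podTemplates:
      - name: clickhouse-keeper-pod
        spec:
          containers:
            - name: clickhouse-keeper
              image: clickhouse/clickhouse-keeper:23.12.4.15-alpine
              volumeMounts:
                # mount the claim at the configured storage path
                - name: both-paths
                  mountPath: /var/lib/clickhouse-keeper
    volumeClaimTemplates:
      - name: both-paths
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi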

rauanmayemir commented 8 months ago

This will also fail if you're trying to run the workload in non-privileged mode (read-only root fs, non-privileged user, disabled privilege escalation, etc). I'm stuck with:

<Error> CertificateReloader: Poco::Exception. Code: 1000, e.code() = 0, SSL context exception: Error loading private key from file /etc/clickhouse-keeper/server.key: error:02000002:system library:OPENSSL_internal:No such file or directory (version 23.10.1.1284 (official build))
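
(For context, "non-privileged mode" above means a pod/container securityContext roughly like the sketch below; UID/GID 101 is an assumption matching the default clickhouse user in the official images:)

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: clickhouse-keeper
spec:
  templates:
    podTemplates:
      - name: clickhouse-keeper-pod
        spec:
          securityContext:
            runAsNonRoot: true
            runAsUser: 101    # assumed UID of the clickhouse user in the image
            runAsGroup: 101
            fsGroup: 101
          containers:
            - name: clickhouse-keeper
              securityContext:
                allowPrivilegeEscalation: false
                readOnlyRootFilesystem: true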

Slach commented 8 months ago

@hueiyuan could you share your CHIK manifest as is?

kubectl get chik -n <namespace> <chik-name> -o yaml
Slach commented 8 months ago

@rauanmayemir could you share your CHIK manifest as is?

kubectl get chik -n <namespace> <chik-name> -o yaml
hueiyuan commented 8 months ago

> @hueiyuan could you share your CHIK manifest as is?
>
> kubectl get chik -n <namespace> <chik-name> -o yaml

@Slach This command does not seem to work; the server reports that it does not have this resource type:

error: the server doesn't have a resource type "chik"
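
(For reference, the resource kinds and short names registered under the keeper API group can be listed with kubectl, assuming the CRDs are installed:)

kubectl api-resources --api-group=clickhouse-keeper.altinity.com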

Slach commented 8 months ago

@hueiyuan sorry,

kubectl get chk -n <namespace> <chk-name> -o yaml
hueiyuan commented 8 months ago

@Slach The YAML output follows (this manifest exhibits the previously mentioned error):

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"clickhouse-keeper.altinity.com/v1","kind":"ClickHouseKeeperInstallation","metadata":{"annotations":{},"name":"clickhouse-keeper","namespace":"ck-job"},"spec":{"configuration":{"clusters":[{"layout":{"replicasCount":1},"name":"keeper-3"}],"settings":{"keeper_server/coordination_settings/raft_logs_level":"information","keeper_server/four_letter_word_white_list":"*","keeper_server/raft_configuration/server/port":"9444","keeper_server/storage_path":"/var/lib/clickhouse-keeper","keeper_server/tcp_port":"2181","listen_host":"0.0.0.0","logger/console":"true","logger/level":"trace","prometheus/asynchronous_metrics":"true","prometheus/endpoint":"/metrics","prometheus/events":"true","prometheus/metrics":"true","prometheus/port":"7000","prometheus/status_info":"false"}},"templates":{"podTemplates":[{"metadata":{"annotations":{"prometheus.io/scrape":"true"},"labels":{"app":"clickhouse-keeper","what":"node"}},"name":"clickhouse-keeper-pod","spec":{"affinity":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["clickhouse-keeper"]}]},"topologyKey":"kubernetes.io/hostname"},"weight":50}]}},"containers":[{"image":"clickhouse/clickhouse-keeper:23.12.4.15-alpine","imagePullPolicy":"IfNotPresent","name":"clickhouse-keeper","resources":{"limits":{"cpu":"2","memory":"2Gi"},"requests":{"cpu":"1","memory":"1Gi"}}}]}}],"volumeClaimTemplates":[{"name":"log-storage-path","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}}}},{"name":"snapshot-storage-path","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}}}}]}}}
  creationTimestamp: "2024-02-13T11:43:54Z"
  generation: 1
  name: clickhouse-keeper
  namespace: ck-job
  resourceVersion: "27253048"
  uid: d4d12adf-e6f1-496e-a202-b81efd779d5e
spec:
  configuration:
    clusters:
    - layout:
        replicasCount: 1
      name: keeper-3
    settings:
      keeper_server/coordination_settings/raft_logs_level: information
      keeper_server/four_letter_word_white_list: '*'
      keeper_server/raft_configuration/server/port: "9444"
      keeper_server/storage_path: /var/lib/clickhouse-keeper
      keeper_server/tcp_port: "2181"
      listen_host: 0.0.0.0
      logger/console: "true"
      logger/level: trace
      prometheus/asynchronous_metrics: "true"
      prometheus/endpoint: /metrics
      prometheus/events: "true"
      prometheus/metrics: "true"
      prometheus/port: "7000"
      prometheus/status_info: "false"
  templates:
    podTemplates:
    - metadata:
        annotations:
          prometheus.io/scrape: "true"
        labels:
          app: clickhouse-keeper
          what: node
      name: clickhouse-keeper-pod
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - clickhouse-keeper
                topologyKey: kubernetes.io/hostname
              weight: 50
        containers:
        - image: clickhouse/clickhouse-keeper:23.12.4.15-alpine
          imagePullPolicy: IfNotPresent
          name: clickhouse-keeper
          resources:
            limits:
              cpu: "2"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 1Gi
    volumeClaimTemplates:
    - name: log-storage-path
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
    - name: snapshot-storage-path
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
status:
  normalizedCompleted:
    apiVersion: clickhouse-keeper.altinity.com/v1
    kind: ClickHouseKeeperInstallation
    metadata:
      creationTimestamp: "2024-02-13T11:43:54Z"
      generation: 1
      name: clickhouse-keeper
      namespace: ck-job
      resourceVersion: "27253047"
      uid: d4d12adf-e6f1-496e-a202-b81efd779d5e
    spec:
      configuration:
        clusters:
        - layout:
            replicasCount: 1
          name: keeper-3
        settings:
          keeper_server/coordination_settings/min_session_timeout_ms: "10000"
          keeper_server/coordination_settings/operation_timeout_ms: "10000"
          keeper_server/coordination_settings/raft_logs_level: information
          keeper_server/coordination_settings/session_timeout_ms: "100000"
          keeper_server/four_letter_word_white_list: '*'
          keeper_server/hostname_checks_enabled: "true"
          keeper_server/log_storage_path: /var/lib/clickhouse-keeper/coordination/logs
          keeper_server/raft_configuration/server/port: "9444"
          keeper_server/snapshot_storage_path: /var/lib/clickhouse-keeper/coordination/snapshots
          keeper_server/storage_path: /var/lib/clickhouse-keeper
          keeper_server/tcp_port: "2181"
          listen_host: 0.0.0.0
          logger/console: "true"
          logger/level: trace
          max_connections: "4096"
          openSSL/server/cacheSessions: "true"
          openSSL/server/certificateFile: /etc/clickhouse-keeper/server.crt
          openSSL/server/dhParamsFile: /etc/clickhouse-keeper/dhparam.pem
          openSSL/server/disableProtocols: sslv2,sslv3
          openSSL/server/loadDefaultCAFile: "true"
          openSSL/server/preferServerCiphers: "true"
          openSSL/server/privateKeyFile: /etc/clickhouse-keeper/server.key
          openSSL/server/verificationMode: none
          prometheus/asynchronous_metrics: "true"
          prometheus/endpoint: /metrics
          prometheus/events: "true"
          prometheus/metrics: "true"
          prometheus/port: "7000"
          prometheus/status_info: "false"
      templates:
        PodTemplatesIndex: {}
        VolumeClaimTemplatesIndex: {}
        podTemplates:
        - metadata:
            annotations:
              prometheus.io/scrape: "true"
            creationTimestamp: null
            labels:
              app: clickhouse-keeper
              what: node
          name: clickhouse-keeper-pod
          spec:
            affinity:
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - clickhouse-keeper
                    topologyKey: kubernetes.io/hostname
                  weight: 50
            containers:
            - image: clickhouse/clickhouse-keeper:23.12.4.15-alpine
              imagePullPolicy: IfNotPresent
              name: clickhouse-keeper
              resources:
                limits:
                  cpu: "2"
                  memory: 2Gi
                requests:
                  cpu: "1"
                  memory: 1Gi
          zone: {}
        volumeClaimTemplates:
        - metadata:
            creationTimestamp: null
          name: log-storage-path
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
        - metadata:
            creationTimestamp: null
          name: snapshot-storage-path
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
  replicas: 1
  status: In progress

Slach commented 8 months ago

I see status: In progress

Could you share?

kubectl describe chk -n ck-job clickhouse-keeper

I am interested in the Events section.

hueiyuan commented 8 months ago

@Slach Sure, related information:

Name:         clickhouse-keeper
Namespace:    ck-job
Labels:       <none>
Annotations:  <none>
API Version:  clickhouse-keeper.altinity.com/v1
Kind:         ClickHouseKeeperInstallation
Metadata:
  Creation Timestamp:  2024-02-13T11:43:54Z
  Generation:          1
  Managed Fields:
    API Version:  clickhouse-keeper.altinity.com/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:configuration:
          .:
          f:clusters:
          f:settings:
            .:
            f:keeper_server/coordination_settings/raft_logs_level:
            f:keeper_server/four_letter_word_white_list:
            f:keeper_server/raft_configuration/server/port:
            f:keeper_server/storage_path:
            f:keeper_server/tcp_port:
            f:listen_host:
            f:logger/console:
            f:logger/level:
            f:prometheus/asynchronous_metrics:
            f:prometheus/endpoint:
            f:prometheus/events:
            f:prometheus/metrics:
            f:prometheus/port:
            f:prometheus/status_info:
        f:templates:
          .:
          f:podTemplates:
          f:volumeClaimTemplates:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2024-02-13T11:43:54Z
    API Version:  clickhouse-keeper.altinity.com/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:normalizedCompleted:
          .:
          f:apiVersion:
          f:kind:
          f:metadata:
            .:
            f:creationTimestamp:
            f:generation:
            f:name:
            f:namespace:
            f:resourceVersion:
            f:uid:
          f:spec:
            .:
            f:configuration:
              .:
              f:clusters:
              f:settings:
                .:
                f:keeper_server/coordination_settings/min_session_timeout_ms:
                f:keeper_server/coordination_settings/operation_timeout_ms:
                f:keeper_server/coordination_settings/raft_logs_level:
                f:keeper_server/coordination_settings/session_timeout_ms:
                f:keeper_server/four_letter_word_white_list:
                f:keeper_server/hostname_checks_enabled:
                f:keeper_server/log_storage_path:
                f:keeper_server/raft_configuration/server/port:
                f:keeper_server/snapshot_storage_path:
                f:keeper_server/storage_path:
                f:keeper_server/tcp_port:
                f:listen_host:
                f:logger/console:
                f:logger/level:
                f:max_connections:
                f:openSSL/server/cacheSessions:
                f:openSSL/server/certificateFile:
                f:openSSL/server/dhParamsFile:
                f:openSSL/server/disableProtocols:
                f:openSSL/server/loadDefaultCAFile:
                f:openSSL/server/preferServerCiphers:
                f:openSSL/server/privateKeyFile:
                f:openSSL/server/verificationMode:
                f:prometheus/asynchronous_metrics:
                f:prometheus/endpoint:
                f:prometheus/events:
                f:prometheus/metrics:
                f:prometheus/port:
                f:prometheus/status_info:
            f:templates:
              .:
              f:PodTemplatesIndex:
              f:VolumeClaimTemplatesIndex:
              f:podTemplates:
              f:volumeClaimTemplates:
        f:replicas:
        f:status:
    Manager:         clickhouse-operator
    Operation:       Update
    Subresource:     status
    Time:            2024-02-13T13:51:55Z
  Resource Version:  27454457
  UID:               d4d12adf-e6f1-496e-a202-b81efd779d5e
Spec:
  Configuration:
    Clusters:
      Layout:
        Replicas Count:  1
      Name:              keeper-3
    Settings:
      keeper_server/coordination_settings/raft_logs_level:  information
      keeper_server/four_letter_word_white_list:            *
      keeper_server/raft_configuration/server/port:         9444
      keeper_server/storage_path:                           /var/lib/clickhouse-keeper
      keeper_server/tcp_port:                               2181
      listen_host:                                          0.0.0.0
      logger/console:                                       true
      logger/level:                                         trace
      prometheus/asynchronous_metrics:                      true
      prometheus/endpoint:                                  /metrics
      prometheus/events:                                    true
      prometheus/metrics:                                   true
      prometheus/port:                                      7000
      prometheus/status_info:                               false
  Templates:
    Pod Templates:
      Metadata:
        Annotations:
          prometheus.io/scrape:  true
        Labels:
          App:   clickhouse-keeper
          What:  node
      Name:      clickhouse-keeper-pod
      Spec:
        Affinity:
          Pod Anti Affinity:
            Preferred During Scheduling Ignored During Execution:
              Pod Affinity Term:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      clickhouse-keeper
                Topology Key:  kubernetes.io/hostname
              Weight:          50
        Containers:
          Image:              clickhouse/clickhouse-keeper:23.12.4.15-alpine
          Image Pull Policy:  IfNotPresent
          Name:               clickhouse-keeper
          Resources:
            Limits:
              Cpu:     2
              Memory:  2Gi
            Requests:
              Cpu:     1
              Memory:  1Gi
    Volume Claim Templates:
      Name:  log-storage-path
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  10Gi
      Name:           snapshot-storage-path
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  10Gi
Status:
  Normalized Completed:
    API Version:  clickhouse-keeper.altinity.com/v1
    Kind:         ClickHouseKeeperInstallation
    Metadata:
      Creation Timestamp:  2024-02-13T11:43:54Z
      Generation:          1
      Name:                clickhouse-keeper
      Namespace:           ck-job
      Resource Version:    27454455
      UID:                 d4d12adf-e6f1-496e-a202-b81efd779d5e
    Spec:
      Configuration:
        Clusters:
          Layout:
            Replicas Count:  1
          Name:              keeper-3
        Settings:
          keeper_server/coordination_settings/min_session_timeout_ms:  10000
          keeper_server/coordination_settings/operation_timeout_ms:    10000
          keeper_server/coordination_settings/raft_logs_level:         information
          keeper_server/coordination_settings/session_timeout_ms:      100000
          keeper_server/four_letter_word_white_list:                   *
          keeper_server/hostname_checks_enabled:                       true
          keeper_server/log_storage_path:                              /var/lib/clickhouse-keeper/coordination/logs
          keeper_server/raft_configuration/server/port:                9444
          keeper_server/snapshot_storage_path:                         /var/lib/clickhouse-keeper/coordination/snapshots
          keeper_server/storage_path:                                  /var/lib/clickhouse-keeper
          keeper_server/tcp_port:                                      2181
          listen_host:                                                 0.0.0.0
          logger/console:                                              true
          logger/level:                                                trace
          max_connections:                                             4096
          openSSL/server/cacheSessions:                                true
          openSSL/server/certificateFile:                              /etc/clickhouse-keeper/server.crt
          openSSL/server/dhParamsFile:                                 /etc/clickhouse-keeper/dhparam.pem
          openSSL/server/disableProtocols:                             sslv2,sslv3
          openSSL/server/loadDefaultCAFile:                            true
          openSSL/server/preferServerCiphers:                          true
          openSSL/server/privateKeyFile:                               /etc/clickhouse-keeper/server.key
          openSSL/server/verificationMode:                             none
          prometheus/asynchronous_metrics:                             true
          prometheus/endpoint:                                         /metrics
          prometheus/events:                                           true
          prometheus/metrics:                                          true
          prometheus/port:                                             7000
          prometheus/status_info:                                      false
      Templates:
        Pod Templates Index:
        Volume Claim Templates Index:
        Pod Templates:
          Metadata:
            Annotations:
              prometheus.io/scrape:  true
            Creation Timestamp:      <nil>
            Labels:
              App:   clickhouse-keeper
              What:  node
          Name:      clickhouse-keeper-pod
          Spec:
            Affinity:
              Pod Anti Affinity:
                Preferred During Scheduling Ignored During Execution:
                  Pod Affinity Term:
                    Label Selector:
                      Match Expressions:
                        Key:       app
                        Operator:  In
                        Values:
                          clickhouse-keeper
                    Topology Key:  kubernetes.io/hostname
                  Weight:          50
            Containers:
              Image:              clickhouse/clickhouse-keeper:23.12.4.15-alpine
              Image Pull Policy:  IfNotPresent
              Name:               clickhouse-keeper
              Resources:
                Limits:
                  Cpu:     2
                  Memory:  2Gi
                Requests:
                  Cpu:     1
                  Memory:  1Gi
          Zone:
        Volume Claim Templates:
          Metadata:
            Creation Timestamp:  <nil>
          Name:                  log-storage-path
          Spec:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  10Gi
          Metadata:
            Creation Timestamp:  <nil>
          Name:                  snapshot-storage-path
          Spec:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  10Gi
  Replicas:               1
  Status:                 In progress
Events:                   <none>

As for why the status is In progress, I guess it is because the pod is always in CrashLoopBackOff and restarting.
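
(For reference, the crash loop details can be inspected with the commands below; the pod name clickhouse-keeper-0 is assumed. Note that errno 13 in the log above is EACCES, i.e. a permission problem on the mounted volume:)

kubectl get pods -n ck-job
kubectl logs -n ck-job clickhouse-keeper-0 --previous
kubectl describe pod -n ck-job clickhouse-keeper-0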

[Screenshot, 2024-02-13 9:53 PM: pod list showing the clickhouse-keeper pod in CrashLoopBackOff]

hueiyuan commented 8 months ago

@Slach Do you have any idea about this?

Slach commented 8 months ago

Unfortunately not.

I used a simplified manifest:

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: clickhouse-keeper
spec:
  configuration:
    clusters:
    - layout:
        replicasCount: 1
      name: keeper-3
    settings:
      keeper_server/coordination_settings/raft_logs_level: information
      keeper_server/four_letter_word_white_list: '*'
      keeper_server/raft_configuration/server/port: "9444"
      keeper_server/storage_path: /var/lib/clickhouse-keeper
      keeper_server/tcp_port: "2181"
      listen_host: 0.0.0.0
      logger/console: "true"
      logger/level: trace
      prometheus/asynchronous_metrics: "true"
      prometheus/endpoint: /metrics
      prometheus/events: "true"
      prometheus/metrics: "true"
      prometheus/port: "7000"
      prometheus/status_info: "false"
  templates:
    podTemplates:
    - metadata:
        annotations:
          prometheus.io/scrape: "true"
        labels:
          app: clickhouse-keeper
          what: node
      name: clickhouse-keeper-pod
      spec:
        containers:
        - image: clickhouse/clickhouse-keeper:23.12.4.15-alpine
          imagePullPolicy: IfNotPresent
          name: clickhouse-keeper
    volumeClaimTemplates:
    - name: log-storage-path
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
    - name: snapshot-storage-path
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

and I can't reproduce it on my side, but I did find that the podTemplate is not applied.
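
(One way to verify is to dump the StatefulSet the operator generated and check whether the pod template fields were merged in; name placeholders as in the earlier commands:)

kubectl get statefulset -n <namespace> <sts-name> -o yaml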

echozio commented 8 months ago

In my case this happened because the volume was owned by root:root. Setting the fsGroup to 101 solved it for me:

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: keeper
spec:
  configuration:
    clusters:
      - name: cluster
        layout:
          replicasCount: 3
  templates:
    podTemplates:
      - spec:
          securityContext:
            fsGroup: 101
    volumeClaimTemplates:
      - name: both-paths
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
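
fsGroup works here because the kubelet changes the group ownership of the mounted volume to that GID, and 101 matches the clickhouse user/group in the official clickhouse-keeper images (worth verifying if you use a different image). Ownership can be checked with something like:

kubectl exec -n <namespace> <keeper-pod> -- ls -ld /var/lib/clickhouse-keeper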

hueiyuan commented 8 months ago

@echozio @Slach Thanks for your replies and assistance. Indeed, setting fsGroup to 101 solves it. Perhaps a related description could be added to the README.md.