Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse® clusters running on Kubernetes
https://altinity.com
Apache License 2.0

ClickHouse Keeper in RO mode due to incorrect permissions on snapshots directory #1524

Open linux-wizard opened 1 month ago

linux-wizard commented 1 month ago

I deployed ClickHouse Keeper using clickhouse-operator 0.24.0 with 3 nodes and a PVC. Unfortunately, ClickHouse Keeper is in read-only mode because it cannot write to the coordination directory /var/lib/clickhouse-keeper/coordination/logs/, which has incorrect permissions.

The error message is below:

2024.10.07 16:52:07.388939 [ 1 ] {} <Error> void DB::Changelog::readChangelogAndInitWriter(uint64_t, uint64_t): Code: 76. DB::ErrnoException: Cannot open file /var/lib/clickhouse-keeper/coordination/logs/changelog_1_100000.bin: , errno: 13, strerror: Permission denied. (CANNOT_OPEN_FILE), Stack trace (when copying this message, always include the lines below):

Without a PVC, I can deploy a working ClickHouse Keeper using clickhouse-operator 0.23.7:

# ---
# # Fake Service to drop-in replacement Zookeeper with CHK
# apiVersion: v1
# kind: Service
# metadata:
#   # DNS would be like zookeeper.namespace.svc
#   name: zookeeper
#   labels:
#     app: zookeeper
# spec:
#   ports:
#     - port: 2181
#       name: client
#     - port: 7000
#       name: prometheus
#   selector:
#     app: clickhouse-keeper
#     what: node
---
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: xxxxxxx
  labels:
    app: clickhouse-keeper
spec:
  configuration:
    clusters:
      - name: "chk-3"
        layout:
          replicasCount: 3
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/storage_path: /var/lib/clickhouse-keeper
      keeper_server/tcp_port: "2181"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      keeper_server/raft_configuration/server/port: "9444"
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"

  defaults:
    templates:
      # Templates are specified as default for all clusters
      podTemplate: default

  templates:
      podTemplates:
        - name: default
          spec:
            # affinity removed to allow use in single node test environment
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                        - key: "app"
                          operator: In
                          values:
                            - clickhouse-keeper
                    topologyKey: "kubernetes.io/hostname"
            containers:
              - name: clickhouse-keeper
                imagePullPolicy: IfNotPresent
                image: "clickhouse/clickhouse-keeper:24-alpine"
                resources:
                  requests:
                    memory: "256M"
                    cpu: "1"
                  limits:
                    memory: "4Gi"
                    cpu: "2"
      # volumeClaimTemplates:
      #   - name: both-paths
      #     spec:
      #       storageClassName: gp3-retain
      #       accessModes:
      #         - ReadWriteOnce
      #       resources:
      #         requests:
      #           storage: 10Gi

It seems that by default /var/lib/clickhouse-keeper/coordination/{logs,snapshots} are owned by root, but we need to ensure that everyone has write access. Below are the permissions when not using a PVC:

chk-edp-global-finance-1:/# ls -ltrh /var/lib/clickhouse-keeper/
total 8K     
drwxr-xr-x    4 root     root          35 Oct  9 11:32 coordination
-rw-r-----    1 clickhou clickhou      36 Oct  9 11:32 uuid
drwxr-x---    2 clickhou clickhou       6 Oct  9 11:32 rocksdb
-rw-r-----    1 clickhou clickhou      23 Oct  9 11:32 state
drwxr-x---    2 clickhou clickhou      31 Oct  9 11:32 preprocessed_configs
chk-edp-global-finance-1:/# ls -ltrh /var/lib/clickhouse-keeper/coordination/
total 0      
drwxrwxrwx    2 root     root          38 Oct  9 11:32 snapshots
drwxrwxrwx    2 root     root          41 Oct  9 11:32 logs

However, I believe it would be better to have these directories owned by root:clickhouse with rwxrwx--- (770) permissions.

alex-zaitsev commented 1 month ago

Would it help if you add securityContext as described here? https://github.com/Altinity/clickhouse-operator/issues/1370

Note that CHK is not compatible between 0.23.7 and 0.24.0; see the migration guide: https://github.com/Altinity/clickhouse-operator/blob/0.24.0/docs/keeper_migration_from_23_to_24.md

chengjoey commented 1 month ago

Would it help if you add securityContext as described here? #1370

+1, this should be helpful

spec:
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsGroup: 101
    runAsUser: 101
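
For reference, a minimal sketch of where that securityContext would sit in the ClickHouseKeeperInstallation from the original report. It assumes the operator passes the pod-level securityContext of a podTemplate through to the Keeper pods unchanged, and that UID/GID 101 (the values quoted above) match the clickhouse user in the image you run; check with `id` inside the container if unsure.

```yaml
# Sketch only: fragment of the ClickHouseKeeperInstallation spec from the report,
# trimmed to the pod template. Affinity, resources and settings are omitted.
spec:
  templates:
    podTemplates:
      - name: default
        spec:
          # Pod-level security context (a sibling of "containers", not nested in it).
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            # Volumes that support ownership management become group-owned by GID 101.
            fsGroup: 101
            # Skip the recursive ownership change when the volume root already matches.
            fsGroupChangePolicy: OnRootMismatch
          containers:
            - name: clickhouse-keeper
              image: "clickhouse/clickhouse-keeper:24-alpine"
```

With fsGroup set, the kubelet handles group ownership of the mounted volume, so the coordination directories should not need world-writable permissions.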

alex-zaitsev commented 1 month ago

@chengjoey, we are hesitant to include it in the code by default, but maybe it is a good thing to do.

jaitaiwan commented 2 weeks ago

IMO it should really be added by default if those are the permissions the container requires to run. I can't think of any reason this would be disadvantageous.