Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

Unknown table expression identifier ‘system.backup_list’ #956

Closed sanjeev3d closed 3 months ago

sanjeev3d commented 3 months ago

I have been using ClickHouse for many days without any issues. Recently, I added an additional container to the pod specifically for the backup utility. Now I am getting the following error:

Code: 60. DB::Exception: Received from chi-cliff-cliffcluster-replica0-shard0.click.svc.cluster.local:9000. DB::Exception: Unknown table expression identifier 'system.backup_list' in scope SELECT name FROM system.backup_list WHERE (location = 'remote') AND (name LIKE '%chi-cliff-cliffcluster-replica0-shard0.click.svc.cluster.local%') AND (name LIKE '%full%') AND (desc NOT LIKE 'broken%') ORDER BY created DESC LIMIT 1.

Steps to Reproduce:

1.  Deployed ClickHouse and used it without the backup utility for many days.
2.  Added a new container to the pod specifically for the backup utility.
3.  Configured the backup as per the [Altinity ClickHouse Backup Examples](https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md).
4.  Attempted to run the backup utility.

Expected Behavior:

The backup utility should identify and use the system.backup_list table as expected without throwing an unknown table expression identifier error.

Observed Behavior:

An error indicating an unknown table expression identifier for system.backup_list is encountered.

References:

[Altinity ClickHouse Backup Examples](https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md)

sanjeev3d commented 3 months ago

I have also set these environment variables, which relate to system.backup_list:

apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhouse-backup-config
data:
  LOG_LEVEL: "debug"
  ALLOW_EMPTY_BACKUPS: "true"
  API_LISTEN: "0.0.0.0:7171"
  API_CREATE_INTEGRATION_TABLES: "true"
  BACKUPS_TO_KEEP_REMOTE: "3"
  REMOTE_STORAGE: "s3"
  S3_ACL: "private"
  S3_ENDPOINT: "http://xxxxxxx:xx"
  S3_BUCKET: "clickhouse"
  S3_PATH: "backup/shard-{shard}"
  S3_ACCESS_KEY: "minioadmin"
  S3_SECRET_KEY: "minioadmin"
  S3_FORCE_PATH_STYLE: "true"
  S3_DISABLE_SSL: "true"
  S3_DEBUG: "true"

Slach commented 3 months ago

Could you share the output of the following command?

kubectl get chi --all-namespaces

sanjeev3d commented 3 months ago

@Slach I'm using the click namespace.

kubectl get chi --all-namespaces

NAMESPACE   NAME             CLUSTERS   HOSTS   STATUS
click-zoo   clickhouse-poc   1          1       Completed
click-zoo   cliff            1          9       Completed
click       cliff            1          9       Completed
zoo         cliff
zoo         zoo

Slach commented 3 months ago

Could you share the output of kubectl get chi -n click cliff -o yaml, with sensitive credentials removed?

Slach commented 3 months ago

Also, could you share the output of the following command:

kubectl get pods --all-namespaces -l app=clickhouse-operator -o jsonpath="{.items[*].spec.containers[*].image}"

sanjeev3d commented 3 months ago

Sharing the output of kubectl get chi -n click cliff -o yaml:

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  annotations:
  finalizers:
  - finalizer.clickhouseinstallation.altinity.com
  generation: 3
    manager: clickhouse-operator
    operation: Update
    time: "2024-07-22T08:06:16Z"
  name: cliff
  namespace: click
spec:
  configuration:
    clusters:
    - layout:
        shards:
        - name: shard0
          replicas:
          - name: replica0-shard0
          - name: replica1-shard0
          - name: replica2-shard0
            templates:
              podTemplate: pod-template-with-volumes-replica
          replicasCount: 3
          templates:
            podTemplate: pod-template-with-volumes-shard
        - name: shard1
          replicas:
          - name: replica0-shard1
          - name: replica1-shard1
          - name: replica2-shard1
            templates:
              podTemplate: pod-template-with-volumes-replica
          replicasCount: 3
          templates:
            podTemplate: pod-template-with-volumes-shard
        - name: shard2
          replicas:
          - name: replica0-shard2
          - name: replica1-shard2
          - name: replica2-shard2
            templates:
              podTemplate: pod-template-with-volumes-replica
          replicasCount: 3
          templates:
            podTemplate: pod-template-with-volumes-shard
      name: cliffcluster
    settings:
      disable_internal_dns_cache: 1
      remote_servers/all-replicated/secret: default
      remote_servers/all-sharded/secret: default
      remote_servers/cliffcluster/secret: default
    users:
      admin/access_management: 1
      admin/networks/ip:
      - 0.0.0.0/0
      - ::/0
      admin/password: xxxxxx
      default/networks/ip:
      - 0.0.0.0/0
      - ::/0
    zookeeper:
      nodes:
      - host: zookeeper-0.zookeepers.click
        port: 2181
      - host: zookeeper-1.zookeepers.click
        port: 2181
      - host: zookeeper-2.zookeepers.click
        port: 2181
  defaults:
    templates:
      podTemplate: pod-template-with-volumes-shard
      serviceTemplate: chi-service-template
  templates:
    podTemplates:
    - name: pod-template-with-volumes-shard
      spec:
        containers:
        - image: clickhouse-server:24.4.2-alpine
          name: clickhouse
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-storage-template-1
        - command:
          - bash
          - -xc
          - /bin/clickhouse-backup server
          envFrom:
          - configMapRef:
              name: clickhouse-backup-config
          image: clickhouse-backup:master
          imagePullPolicy: Always
          name: clickhouse-backup
          ports:
          - containerPort: 7171
            name: backup-rest
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: "2"
              memory: 4Gi
    - name: pod-template-with-volumes-replica
      spec:
        containers:
        - image: clickhouse-server:24.4.2-alpine
          name: clickhouse
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-storage-template
    serviceTemplates:
    - generateName: clickhouse-{chi}
      name: chi-service-template
      spec:
        ports:
        - name: http
          port: 8123
          targetPort: 8123
        - name: tcp
          port: 9000
          targetPort: 9000
        - name: interserver
          port: 9009
          targetPort: 9009
        type: NodePort
    volumeClaimTemplates:
    - name: clickhouse-storage-template
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: robin-encrypt
    - name: clickhouse-storage-template-1
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: robin-encrypt
... skip ...        

  updated: 9
  version: 0.18.0

sanjeev3d commented 3 months ago

kubectl get pods --all-namespaces -l app=clickhouse-operator -o jsonpath="{.items[*].spec.containers[*].image}"

altinity/clickhouse-operator:0.18.0 altinity/metrics-exporter:0.18.0 altinity/clickhouse-operator:0.18.0 altinity/metrics-exporter:0.18.0

Slach commented 3 months ago

OK, I see the root cause.

You defined 3 replicas in each shard and, separately, overrode the pod template for the 3rd replica in each shard:

          - name: replica2-shardX
            templates:
              podTemplate: pod-template-with-volumes-replica

But pod-template-with-volumes-replica doesn't contain a clickhouse-backup container in spec.containers[].

Remove

            templates:
              podTemplate: pod-template-with-volumes-replica

from all 3 shards

and remove

    - name: pod-template-with-volumes-replica
      spec:
        containers:
        - image: clickhouse-server:24.4.2-alpine
          name: clickhouse
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-storage-template

and remove

    - name: clickhouse-storage-template-1
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: robin-encrypt

and replace clickhouse-storage-template-1 with clickhouse-storage-template.

After that, all 3 replicas in each shard should have the system.backup_list and system.backup_actions tables.
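
The check described above can be done mechanically. The following is a minimal sketch, not part of clickhouse-backup or the operator: it assumes the installation was exported with kubectl get chi -n click cliff -o json and loaded into a dict, that the sidecar is named clickhouse-backup as in this thread, and that templates resolve in the simplified order replica, then shard, then spec.defaults. The function names and the SAMPLE data are hypothetical.

```python
BACKUP_CONTAINER = "clickhouse-backup"  # sidecar name used in this thread


def templates_without_backup(chi: dict) -> set:
    """Return names of pod templates that have no backup sidecar container."""
    pod_templates = chi["spec"].get("templates", {}).get("podTemplates", [])
    missing = set()
    for tpl in pod_templates:
        names = {c.get("name") for c in tpl.get("spec", {}).get("containers", [])}
        if BACKUP_CONTAINER not in names:
            missing.add(tpl["name"])
    return missing


def replicas_without_backup(chi: dict) -> list:
    """List replicas whose effective pod template lacks the backup sidecar.

    Assumed (simplified) resolution order: replica, then shard, then defaults.
    """
    missing = templates_without_backup(chi)
    default_tpl = chi["spec"].get("defaults", {}).get("templates", {}).get("podTemplate")
    result = []
    for cluster in chi["spec"]["configuration"]["clusters"]:
        for shard in cluster.get("layout", {}).get("shards", []):
            shard_tpl = shard.get("templates", {}).get("podTemplate", default_tpl)
            for replica in shard.get("replicas", []):
                tpl = replica.get("templates", {}).get("podTemplate", shard_tpl)
                if tpl in missing:
                    result.append(replica["name"])
    return result


# Cut-down version of the first manifest posted in this thread (one shard only).
SAMPLE = {
    "spec": {
        "defaults": {"templates": {"podTemplate": "pod-template-with-volumes-shard"}},
        "configuration": {"clusters": [{
            "name": "cliffcluster",
            "layout": {"shards": [{
                "name": "shard0",
                "templates": {"podTemplate": "pod-template-with-volumes-shard"},
                "replicas": [
                    {"name": "replica0-shard0"},
                    {"name": "replica1-shard0"},
                    {"name": "replica2-shard0",
                     "templates": {"podTemplate": "pod-template-with-volumes-replica"}},
                ],
            }]},
        }]},
        "templates": {"podTemplates": [
            {"name": "pod-template-with-volumes-shard",
             "spec": {"containers": [{"name": "clickhouse"},
                                     {"name": "clickhouse-backup"}]}},
            {"name": "pod-template-with-volumes-replica",
             "spec": {"containers": [{"name": "clickhouse"}]}},
        ]},
    }
}

if __name__ == "__main__":
    print(replicas_without_backup(SAMPLE))  # ['replica2-shard0']
```

For a real cluster, replace SAMPLE with the result of json.load on the exported file. An empty list means every replica would get the backup sidecar under the assumptions above.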

sanjeev3d commented 3 months ago

@Slach I'm still facing the same issue, even after removing the extra pod and volume templates as you suggested.

Here is the cluster definition again, after applying the changes:

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  annotations:
  creationTimestamp: "2024-04-29T11:47:47Z"
  finalizers:
  - finalizer.clickhouseinstallation.altinity.com
  generation: 4
  name: cliff
  namespace: click
  resourceVersion: "830274872"
  uid: b41d47c9-5427-4d60-9ce9-88faebfcb184
spec:
  configuration:
    clusters:
    - layout:
        shards:
        - name: shard0
          replicas:
          - name: replica0-shard0
          - name: replica1-shard0
          - name: replica2-shard0
          replicasCount: 3
          templates:
            podTemplate: pod-template-with-volumes-shard
        - name: shard1
          replicas:
          - name: replica0-shard1
          - name: replica1-shard1
          - name: replica2-shard1
          replicasCount: 3
          templates:
            podTemplate: pod-template-with-volumes-shard
        - name: shard2
          replicas:
          - name: replica0-shard2
          - name: replica1-shard2
          - name: replica2-shard2
          replicasCount: 3
          templates:
            podTemplate: pod-template-with-volumes-shard
      name: cliffcluster
    settings:
      disable_internal_dns_cache: 1
      remote_servers/all-replicated/secret: default
      remote_servers/all-sharded/secret: default
      remote_servers/cliffcluster/secret: default
    users:
      admin/access_management: 1
      admin/networks/ip:
      - 0.0.0.0/0
      - ::/0
      admin/password: xxxxxx
      default/networks/ip:
      - 0.0.0.0/0
      - ::/0
    zookeeper:
      nodes:
      - host: zookeeper-0.zookeepers.click
        port: 2181
      - host: zookeeper-1.zookeepers.click
        port: 2181
      - host: zookeeper-2.zookeepers.click
        port: 2181
  defaults:
    templates:
      podTemplate: pod-template-with-volumes-shard
      serviceTemplate: chi-service-template
  templates:
    podTemplates:
    - name: pod-template-with-volumes-shard
      spec:
        containers:
        - image: clickhouse-server:24.4.2-alpine
          name: clickhouse
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-storage-template
        - command:
          - bash
          - -xc
          - /bin/clickhouse-backup server
          envFrom:
          - configMapRef:
              name: clickhouse-backup-config
          image: clickhouse-backup:master
          imagePullPolicy: Always
          name: clickhouse-backup
          ports:
          - containerPort: 7171
            name: backup-rest
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: "2"
              memory: 4Gi
    serviceTemplates:
    - generateName: clickhouse-{chi}
      name: chi-service-template
      spec:
        ports:
        - name: http
          port: 8123
          targetPort: 8123
        - name: tcp
          port: 9000
          targetPort: 9000
        - name: interserver
          port: 9009
          targetPort: 9009
        type: NodePort
    volumeClaimTemplates:
    - name: clickhouse-storage-template
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: robin-encrypt

Slach commented 3 months ago

Please share the logs:

kubectl logs -n click chi-cliff-cliffcluster-replica0-shard0 --container=clickhouse-backup --since=48h

sanjeev3d commented 3 months ago

kubectl logs chi-cliff-cliffcluster-replica0-shard0-0 --container=clickhouse-backup --since=48h -n click

+ /bin/clickhouse-backup server
2024/07/22 13:58:51.873895  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/07/22 13:58:51.875379  warn clickhouse connection ping: tcp://localhost:9000 return error: dial tcp [::1]:9000: connect: connection refused, will wait 5 second to reconnect logger=clickhouse
2024/07/22 13:58:56.879248  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/07/22 13:58:56.883755  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/07/22 13:58:56.883899  info Create integration tables logger=server
2024/07/22 13:58:56.883954  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/07/22 13:58:56.885679  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/07/22 13:58:56.885769  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/07/22 13:58:56.891874  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/07/22 13:58:56.915501  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/07/22 13:58:56.931456  info SELECT engine FROM system.databases WHERE name = 'system' logger=clickhouse
2024/07/22 13:58:56.935621  info clickhouse connection closed logger=clickhouse
2024/07/22 13:58:56.935677 error open /var/lib/clickhouse/flags/force_drop_table: no such file or directory logger=server.Run
2024/07/22 13:58:56.936335  info Starting API server 9121cd4192cfa2e8c84a1fc21822ab8c8e660f8a on 0.0.0.0:7171 logger=server.Run
2024/07/22 13:58:56.939474  info Update backup metrics start (onlyLocal=false) logger=server
2024/07/22 13:58:56.939535  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/07/22 13:58:56.939602  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/07/22 13:58:56.941886  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/07/22 13:58:56.941961  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/07/22 13:58:56.942362  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/07/22 13:58:56.942410  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/07/22 13:58:56.946445  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/07/22 13:58:56.948150  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/07/22 13:58:56.966350  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/07/22 13:58:56.969567  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/07/22 13:58:56.981439 error ResumeOperationsAfterRestart return error: open /var/lib/clickhouse/backup: no such file or directory logger=server.Run
2024/07/22 13:58:56.981911  info clickhouse connection closed logger=clickhouse
2024/07/22 13:58:56.981997  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/07/22 13:58:56.984384  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/07/22 13:58:56.984453  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/07/22 13:58:56.994293  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/07/22 13:58:56.997715  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/07/22 13:58:57.008841  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/07/22 13:58:57.017919  info [s3:DEBUG] Request
GET /clickhouse?versioning= HTTP/1.1
Host: [CLUSTER_VIP]:32621
User-Agent: aws-sdk-go-v2/1.26.1 os/linux lang/go#1.22.3 md/GOOS#linux md/GOARCH#amd64 api/s3#1.53.1
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 0434c027-694d-41a0-b177-ee6b055a89f7
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=minioadmin/20240722/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=1d2c05e97229eb08bcb0c68e97b03397c754072fe41e6b58913e8b7594cfca35
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240722T135857Z

2024/07/22 13:58:57.021584  info [s3:DEBUG] Response
HTTP/1.1 200 OK
Content-Length: 99
Accept-Ranges: bytes
Content-Type: application/xml
Date: Mon, 22 Jul 2024 13:58:57 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
X-Amz-Request-Id: 17E48DAE3B4328B9
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block

2024/07/22 13:58:57.021947 debug /tmp/.clickhouse-backup-metadata.cache.S3 not found, load 0 elements logger=s3
2024/07/22 13:58:57.023029  info [s3:DEBUG] Request
GET /clickhouse?delimiter=%2F&list-type=2&max-keys=1000&prefix=backup%2Fshard-shard0%2F HTTP/1.1
Host: [CLUSTER-VIP]:32621
User-Agent: aws-sdk-go-v2/1.26.1 os/linux lang/go#1.22.3 md/GOOS#linux md/GOARCH#amd64 api/s3#1.53.1
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: b21de8d0-8a70-4c38-9472-c82854220b2e
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=minioadmin/20240722/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=7424dcb80f140232bfc6ac3b2beaa756d96b538add420b80c15f47ef82607eac
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240722T135857Z

2024/07/22 13:58:57.025337  info [s3:DEBUG] Response
HTTP/1.1 200 OK
Content-Length: 285
Accept-Ranges: bytes
Content-Type: application/xml
Date: Mon, 22 Jul 2024 13:58:57 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
X-Amz-Request-Id: 17E48DAE3B744C98
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block

2024/07/22 13:58:57.026245 debug /tmp/.clickhouse-backup-metadata.cache.S3 save 0 elements logger=s3
2024/07/22 13:58:57.026420  info clickhouse connection closed logger=clickhouse
2024/07/22 13:58:57.026467  info Update backup metrics finish LastBackupCreateLocal=<nil> LastBackupCreateRemote=<nil> LastBackupSizeLocal=0 LastBackupSizeRemote=0 LastBackupUpload=<nil> NumberBackupsLocal=0 NumberBackupsRemote=0 duration=87ms logger=server
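
In a debug log this long, the two error lines are easy to miss among the info lines and HTTP dumps. The sketch below pulls out only the warn and error entries. It is a hypothetical helper: the regular expression assumes the timestamp, level, message layout seen in the lines above, and SAMPLE_LOG is a short excerpt of that log.

```python
import re

# Timestamp, level, message, as seen in the log lines posted above.
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(debug|info|warn|error)\s+(.*)$"
)


def problem_lines(log_text, levels=("warn", "error")):
    """Return (level, message) pairs for log lines at the given levels.

    Lines that do not start with a timestamp (such as the raw HTTP
    request and response dumps in the log above) are skipped.
    """
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) in levels:
            found.append((match.group(2), match.group(3)))
    return found


SAMPLE_LOG = """\
2024/07/22 13:58:56.935621  info clickhouse connection closed logger=clickhouse
2024/07/22 13:58:56.935677 error open /var/lib/clickhouse/flags/force_drop_table: no such file or directory logger=server.Run
GET /clickhouse?versioning= HTTP/1.1
2024/07/22 13:58:56.981439 error ResumeOperationsAfterRestart return error: open /var/lib/clickhouse/backup: no such file or directory logger=server.Run
"""

if __name__ == "__main__":
    for level, message in problem_lines(SAMPLE_LOG, levels=("error",)):
        print(level, message)
```

Both error entries in this log refer to paths under /var/lib/clickhouse that the clickhouse-backup container cannot open.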
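
Slach commented 3 months ago

The volume mount

          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-storage-template

is missing from the clickhouse-backup container; it is mounted only in the clickhouse container. This matches the "open /var/lib/clickhouse/backup: no such file or directory" error in the log.

Change the spec.containers section:

  templates: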
    podTemplates:
    - name: pod-template-with-volumes-shard
      spec:
        containers:
        - image: clickhouse-server:24.4.2-alpine
          name: clickhouse
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-storage-template
        - name: clickhouse-backup
          command:
          - bash
          - -xc
          - /bin/clickhouse-backup server
          envFrom:
          - configMapRef:
              name: clickhouse-backup-config
          image: clickhouse-backup:stable
          imagePullPolicy: Always
          volumeMounts:
          - mountPath: /var/lib/clickhouse
            name: clickhouse-storage-template
          ports:
          - containerPort: 7171
            name: backup-rest
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: "2"
              memory: 4Gi
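
The requirement that both containers see the same data directory can be verified before applying the manifest. This is a minimal sketch with hypothetical helper names: it assumes the pod template's spec has been loaded into a dict, and that the containers are named clickhouse and clickhouse-backup as in this thread.

```python
DATA_PATH = "/var/lib/clickhouse"


def mounts_at(container: dict, path: str) -> set:
    """Names of the volumes that the container mounts at exactly `path`."""
    return {
        m["name"]
        for m in container.get("volumeMounts", [])
        if m.get("mountPath") == path
    }


def backup_shares_data_volume(pod_spec: dict, server: str = "clickhouse",
                              backup: str = "clickhouse-backup",
                              path: str = DATA_PATH) -> bool:
    """True if both containers mount the same volume at the data path."""
    containers = {c["name"]: c for c in pod_spec.get("containers", [])}
    if server not in containers or backup not in containers:
        return False
    shared = mounts_at(containers[server], path) & mounts_at(containers[backup], path)
    return bool(shared)


DATA_MOUNT = {"mountPath": DATA_PATH, "name": "clickhouse-storage-template"}

# Pod template as posted after the first fix: the sidecar has no volumeMounts.
BEFORE = {"containers": [
    {"name": "clickhouse", "volumeMounts": [DATA_MOUNT]},
    {"name": "clickhouse-backup"},
]}

# Pod template with the mount added to the sidecar as well.
AFTER = {"containers": [
    {"name": "clickhouse", "volumeMounts": [DATA_MOUNT]},
    {"name": "clickhouse-backup", "volumeMounts": [DATA_MOUNT]},
]}

if __name__ == "__main__":
    print(backup_shares_data_volume(BEFORE))  # False
    print(backup_shares_data_volume(AFTER))   # True
```

The comparison is on the volume name rather than only the path, because two containers mounting different volumes at /var/lib/clickhouse would still not share data.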

Slach commented 3 months ago

Also replace the altinity/clickhouse-backup:master image with altinity/clickhouse-backup:stable.