Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

use_environment_credentials is not working when using IRSA #1025

Closed jasondavindev closed 1 month ago

jasondavindev commented 1 month ago

ClickHouse server version: 24.1.2.5
ClickHouse backup version: 2.6.2

In my ClickHouse setup I set use_environment_credentials to true for the s3 disk, but when creating a remote backup it cannot use the service account credentials.

<storage_configuration>
    <disks>
        <s3_backup>
            <type>s3</type>
            <endpoint>https://xxxxxxxxxxx.s3.amazonaws.com/</endpoint>
            <use_environment_credentials>true</use_environment_credentials>
        </s3_backup>

        <!--
          default disk is special, it always exists even if not explicitly configured here,
          but you can't change its path here (you should use <path> on top level config instead)
        -->
        <default>
            <!--
              You can reserve some amount of free space on any disk (including default) by adding
              keep_free_space_bytes tag.
            -->
            <keep_free_space_bytes>10485760</keep_free_space_bytes>
        </default>

        <s3>
            <type>s3</type>
            <endpoint>https://xxxxxxxxxxxx.s3.amazonaws.com/data2/</endpoint>
            <use_environment_credentials>true</use_environment_credentials>
        </s3>
    </disks>
</storage_configuration>

The following warning is shown:

2024-10-10 21:40:38.196 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3_backup doesn't contains <access_key_id> and <secret_access_key> environment variables will use
2024-10-10 21:40:38.200 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3 doesn't contains <access_key_id> and <secret_access_key> environment variables will use

And the following error is shown:

2024-10-10 21:40:55.236 FTL cmd/clickhouse-backup/main.go:658 > error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-10-full/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/vkw/nyfgwkxlxfhshaxogyccexradjzrf -> xxxxxxxxxxx/s3/2024-10-10-full/s3/vkw/nyfgwkxlxfhshaxogyccexradjzrf return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: DQCDY8K4EBJDPEN6, HostID: t/U9Ut73DraD/sbHxG6xLKitulhU867kZV8TQOxJ4tvWhI7CmlUv62nzRdKVfi9vafyt9p+v4Rs=, api error AccessDenied: Access Denied"

My config:

general:
  remote_storage: s3
  max_file_size: 1073741824
  backups_to_keep_local: -1

  backups_to_keep_remote: 0

  log_level: info
  allow_empty_backups: false

  download_concurrency: 8
  upload_concurrency: 8

  download_max_bytes_per_second: 0
  upload_max_bytes_per_second: 0

  object_disk_server_side_copy_concurrency: 32

  allow_object_disk_streaming: false

  restore_schema_on_cluster: "cluster"
  upload_by_part: true
  download_by_part: true
  use_resumable_state: true

  restore_database_mapping: {}

  restore_table_mapping: {}

  retries_on_failure: 3
  retries_pause: 5s

  watch_interval: 1h
  full_interval: 24h
  watch_backup_name_template: "shard{shard}-{type}-{time:20060102150405}"

  sharded_operation_mode: none

  cpu_nice_priority: 15
  io_nice_priority: "idle"

  rbac_backup_always: true
  rbac_resolve_conflicts: "recreate"
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - default.*

  timeout: 6h
  freeze_by_part: false
  freeze_by_part_where: ""
  secure: false
  skip_verify: true
  sync_replicated_tables: true
  log_sql_queries: false
  debug: false
  config_dir: "/etc/clickhouse-server"
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: true
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  backup_mutations: true
  restore_as_attach: true
  check_parts_columns: true
  max_connections: 0
s3:
  bucket: "xxxxxxxxxxxxx"
  endpoint: ""
  region: us-east-1

  acl: private
  assume_role_arn: ""
  force_path_style: false
  path: ""
  object_disk_path: "backups/"
  disable_ssl: false
  compression_level: 1
  compression_format: tar

  disable_cert_verification: true
  use_custom_storage_class: false
  storage_class: STANDARD
  concurrency: 1
  part_size: 0
  max_parts_count: 10000
  allow_multipart_download: false
  checksum_algorithm: ""

For test purposes I selected just one table for backup and it worked.

But when I selected a set of tables, the AccessDenied error is shown.

Output of a successful backup (written to S3):

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml list
backup5      64.95GiB   10/10/2024 18:01:35   remote      tar, regular
2024-10-10   10.00GiB   10/10/2024 21:15:30   remote      tar, regular

The following are the selected tables for which the backup does not work:

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml tables
signoz_logs.logs                                              173.72GiB   default,s3  full
signoz_traces.signoz_index_v2                                 160.41GiB   default,s3  full
signoz_logs.logs_v2                                           65.64GiB    default     full
signoz_traces.durationSort                                    52.24GiB    default,s3  full
signoz_traces.signoz_spans                                    21.50GiB    default,s3  full
signoz_metrics.samples_v2                                     11.96GiB    default     full
signoz_metrics.samples_v4                                     10.01GiB    default,s3  full
signoz_metrics.samples_v4_agg_5m                              4.71GiB     default     full
signoz_metrics.samples_v4_agg_30m                             1.34GiB     default     full
signoz_metrics.time_series_v4                                 1.22GiB     default,s3  full
signoz_metrics.time_series_v4_6hrs                            1017.92MiB  default,s3  full
signoz_metrics.time_series_v4_1day                            889.61MiB   s3,default  full
signoz_metrics.time_series_v2                                 873.61MiB   default     full
signoz_metrics.time_series_v4_1week                           832.37MiB   default     full
signoz_traces.span_attributes                                 775.90MiB   default     full
signoz_logs.tag_attributes                                    692.19MiB   default     full
signoz_traces.dependency_graph_minutes_v2                     225.38MiB   s3,default  full
signoz_traces.dependency_graph_minutes                        140.90MiB   default     full
signoz_traces.signoz_error_index_v2                           113.24MiB   default,s3  full
signoz_logs.logs_v2_resource                                  8.15MiB     default     full
signoz_logs.distributed_logs                                  1.18MiB     default     full
signoz_logs.distributed_logs_v2                               691.83KiB   default     full
signoz_metrics.distributed_samples_v4                         526.60KiB   default     full
signoz_logs.distributed_tag_attributes                        264.70KiB   default     full
signoz_metrics.distributed_samples_v2                         257.77KiB   default     full
signoz_analytics.rule_state_history                           56.81KiB    default     full
signoz_traces.usage_explorer                                  55.57KiB    default,s3  full
signoz_logs.distributed_logs_v2_resource                      42.04KiB    default     full
signoz_metrics.distributed_time_series_v4                     17.99KiB    default     full
signoz_metrics.usage                                          12.06KiB    default     full
signoz_logs.usage                                             10.00KiB    default     full
signoz_traces.usage                                           9.23KiB     default     full
signoz_traces.top_level_operations                            7.23KiB     default     full
signoz_metrics.distributed_time_series_v2                     5.51KiB     default     full
signoz_traces.span_attributes_keys                            5.37KiB     default     full
signoz_logs.logs_resource_keys                                1.08KiB     default     full
signoz_traces.schema_migrations                               1.00KiB     default     full
signoz_logs.schema_migrations                                 719B        default     full
signoz_logs.logs_attribute_keys                               708B        default     full
signoz_metrics.schema_migrations                              598B        default     full
signoz_logs.resource_keys_string_final_mv                     0B          default     full
signoz_metrics.distributed_samples_v4_agg_30m                 0B          default     full
signoz_metrics.distributed_samples_v4_agg_5m                  0B          default     full
signoz_logs.distributed_usage                                 0B          default     full
signoz_metrics.distributed_time_series_v3                     0B          default     full
signoz_logs.distributed_logs_resource_keys                    0B          default     full
signoz_metrics.distributed_time_series_v4_1day                0B          default     full
signoz_metrics.distributed_time_series_v4_1week               0B          default     full
signoz_metrics.distributed_time_series_v4_6hrs                0B          default     full
signoz_metrics.distributed_usage                              0B          default     full
signoz_metrics.exp_hist                                       0B          default     full
signoz_logs.distributed_logs_attribute_keys                   0B          default     full
signoz_logs.attribute_keys_string_final_mv                    0B          default     full
signoz_logs.attribute_keys_float64_final_mv                   0B          default     full
signoz_metrics.samples_v4_agg_30m_mv                          0B          default     full
signoz_logs.attribute_keys_bool_final_mv                      0B          default     full
signoz_metrics.samples_v4_agg_5m_mv                           0B          default     full
signoz_analytics.distributed_rule_state_history               0B          default     full
signoz_metrics.time_series_v3                                 0B          default     full
signoz_metrics.time_series_v4_1day_mv                         0B          s3,default  full
signoz_metrics.time_series_v4_1week_mv                        0B          default     full
signoz_metrics.time_series_v4_6hrs_mv                         0B          s3,default  full
signoz_traces.dependency_graph_minutes_db_calls_mv            0B          default     full
signoz_traces.dependency_graph_minutes_db_calls_mv_v2         0B          default,s3  full
signoz_traces.dependency_graph_minutes_messaging_calls_mv     0B          default     full
signoz_traces.dependency_graph_minutes_messaging_calls_mv_v2  0B          default,s3  full
signoz_traces.dependency_graph_minutes_service_calls_mv       0B          default     full
signoz_traces.dependency_graph_minutes_service_calls_mv_v2    0B          s3,default  full
signoz_traces.distributed_dependency_graph_minutes            0B          default     full
signoz_traces.distributed_dependency_graph_minutes_v2         0B          default     full
signoz_traces.distributed_durationSort                        0B          default     full
signoz_traces.distributed_signoz_error_index_v2               0B          default     full
signoz_traces.distributed_signoz_index_v2                     0B          default     full
signoz_traces.distributed_signoz_spans                        0B          default     full
signoz_traces.distributed_span_attributes                     0B          default     full
signoz_traces.distributed_span_attributes_keys                0B          default     full
signoz_traces.distributed_top_level_operations                0B          default     full
signoz_traces.distributed_usage                               0B          default     full
signoz_traces.distributed_usage_explorer                      0B          default     full
signoz_traces.durationSortMV                                  0B          default,s3  full
signoz_traces.root_operations                                 0B          default     full
signoz_traces.signoz_error_index                              0B          default     full
signoz_traces.signoz_index                                    0B          default     full
signoz_traces.sub_root_operations                             0B          default     full
signoz_traces.usage_explorer_mv                               0B          default,s3  full
signoz_metrics.distributed_exp_hist                           0B          default     full

When I selected just signoz_metrics.samples_v4, which contains data on both the local disk and remote (s3), the backup was successful.

Note: my IAM role has full access to S3.

jasondavindev commented 1 month ago

Another test

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml tables
signoz_metrics.samples_v2  11.97GiB  default     full
signoz_metrics.samples_v4  10.01GiB  default,s3  full
chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml create_remote partial
2024-10-10 21:55:48.653 INF pkg/backup/create.go:170 > done createBackupRBAC size=0B
2024-10-10 21:55:48.925 WRN pkg/backup/backuper.go:118 > MAX_FILE_SIZE=1073741824 is less than actual 17035327904, please remove general->max_file_size section from your config
2024-10-10 21:55:49.845 INF pkg/backup/create.go:324 > done progress=9/215 table=signoz_metrics.samples_v2
2024-10-10 21:55:50.179 INF pkg/backup/create.go:324 > done progress=10/215 table=signoz_metrics.samples_v4
2024-10-10 21:55:50.197 INF pkg/backup/create.go:336 > done duration=2.128s operation=createBackupLocal version=2.6.2
2024-10-10 21:57:27.083 INF pkg/backup/upload.go:171 > done duration=1m36.326s operation=upload_data progress=2/2 size=10.01GiB table=signoz_metrics.samples_v4 version=2.6.2
2024-10-10 21:57:36.590 INF pkg/backup/upload.go:171 > done duration=1m45.832s operation=upload_data progress=1/2 size=11.97GiB table=signoz_metrics.samples_v2 version=2.6.2
2024-10-10 21:57:36.632 INF pkg/backup/upload.go:240 > done backup=partial duration=1m46.434s object_disk_size=0B operation=upload upload_size=21.98GiB version=2.6.2
2024-10-10 21:57:37.056 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3_backup/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:166 > done backup=partial duration=496ms location=local operation=delete

The previous warning is not shown (the following log is from my earlier post):

2024-10-10 21:40:38.196 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3_backup doesn't contains <access_key_id> and <secret_access_key> environment variables will use
2024-10-10 21:40:38.200 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3 doesn't contains <access_key_id> and <secret_access_key> environment variables will use
Slach commented 1 month ago

Thanks for the detailed report

Did you set up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY inside the clickhouse-backup container?

Try --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY, or --env AWS_ROLE_ARN.
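
If the binary runs inside an existing container instead, the same variables can be exported before invoking it; a minimal sketch, where the role ARN, token path, and region are placeholders:

# Variables normally injected by IRSA into the pod; the AWS SDK used by
# clickhouse-backup picks them up automatically when they are set.
export AWS_ROLE_ARN=arn:aws:iam::XXXXXXXXXXXX:role/ClickhouseEKSRole
export AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
export AWS_REGION=us-east-1
./clickhouse-backup -c config.yml create_remote full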

Could you share your current pod manifest with sensitive credentials replaced by XXX? kubectl -n <your-namespace> get pod chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 -o yaml

When you use IRSA, which serviceAccount do you use? In that case, the serviceAccount token is mounted into the pod and some environment variables are injected into the env section.

Slach commented 1 month ago

path: "" object_disk_path: "backups/"

better to replace it

    path: "backups"
    object_disk_path: "object_disks_backups"
Slach commented 1 month ago

The warning and the error will show up only if you have data parts on the s3 disk.
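
(To check which tables actually hold parts on the s3 disk, a query along these lines can help; a sketch, assuming the disk name matches the one defined in storage_configuration:)

SELECT database, table, count() AS parts,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND disk_name = 's3'
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC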

Slach commented 1 month ago

Related code fragment: https://github.com/Altinity/clickhouse-backup/blob/master/pkg/storage/object_disk/object_disk.go#L354-L367

jasondavindev commented 1 month ago

Did you set up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY inside the clickhouse-backup container?

I am running the clickhouse-backup binary inside the clickhouse-server container. The service account in use works for normal clickhouse-server workloads (s3 disk as cold storage) with S3 full access.

Try --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY, or --env AWS_ROLE_ARN.

I tried, but it didn't work.

path: "" object_disk_path: "backups/" better to replace it

I changed the path, but there were no changes in the S3 structure, as if the config was ignored.

I am using the SigNoz Helm chart with the ClickHouse dependency: 3 shards and 1 replica per shard.

ClickHouse pod generated manifest:


apiVersion: v1
kind: Pod
metadata:
  annotations:
    signoz.io/path: /metrics
    signoz.io/port: "9363"
    signoz.io/scrape: "true"
  labels:
    app.kubernetes.io/component: clickhouse
    app.kubernetes.io/instance: signoz-tools-cluster
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: clickhouse
    app.kubernetes.io/version: 24.1.2
    apps.kubernetes.io/pod-index: "0"
    argocd.argoproj.io/instance: signoz-tools-cluster
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: signoz-tools-cluster-clickhouse
    clickhouse.altinity.com/cluster: cluster
    clickhouse.altinity.com/namespace: signoz
    clickhouse.altinity.com/ready: "yes"
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
    helm.sh/chart: clickhouse-24.1.6
    statefulset.kubernetes.io/pod-name: chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  name: chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  namespace: signoz
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: chi-signoz-tools-cluster-clickhouse-cluster-0-0
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/component
            operator: In
            values:
            - zookeeper
            - clickhouse
        topologyKey: kubernetes.io/hostname
  containers:
  - command:
    - /bin/bash
    - -c
    - /usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::xxxxxxxxxxx:role/ClickhouseEKSRole
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: xxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/docker-hub/clickhouse/clickhouse-server:24.1.2-alpine
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: clickhouse
    ports:
    - containerPort: 8123
      name: http
      protocol: TCP
    - containerPort: 9000
      name: client
      protocol: TCP
    - containerPort: 9009
      name: interserver
      protocol: TCP
    - containerPort: 9000
      name: tcp
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "4"
        memory: 12Gi
      requests:
        cpu: "3"
        memory: 8Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/clickhouse
      name: data-volumeclaim-template
    - mountPath: /var/lib/clickhouse/user_scripts
      name: shared-binary-volume
    - mountPath: /etc/clickhouse-server/functions
      name: custom-functions-volume
    - mountPath: /etc/clickhouse-server/config.d/
      name: chi-signoz-tools-cluster-clickhouse-common-configd
    - mountPath: /etc/clickhouse-server/users.d/
      name: chi-signoz-tools-cluster-clickhouse-common-usersd
    - mountPath: /etc/clickhouse-server/conf.d/
      name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hn6tq
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  initContainers:
  - command:
    - sh
    - -c
    - |
      set -x
      wget -O /tmp/histogramQuantile https://github.com/SigNoz/signoz/raw/develop/deploy/docker/clickhouse-setup/user_scripts/histogramQuantile
      mv /tmp/histogramQuantile  /var/lib/clickhouse/user_scripts/histogramQuantile
      chmod +x /var/lib/clickhouse/user_scripts/histogramQuantile
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::xxxxxxxxxxx:role/ClickhouseEKSRole
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: docker.io/alpine:3.18.2
    imagePullPolicy: IfNotPresent
    name: signoz-tools-cluster-clickhouse-udf-init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/clickhouse/user_scripts
      name: shared-binary-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hn6tq
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  nodeSelector:
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/provisioner-name: observability-stack-provisioner
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsGroup: 101
    runAsUser: 101
  serviceAccount: signoz-tools-cluster-clickhouse
  serviceAccountName: signoz-tools-cluster-clickhouse
  subdomain: chi-signoz-tools-cluster-clickhouse-cluster-0-0
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: ObservabilityStackOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  - name: data-volumeclaim-template
    persistentVolumeClaim:
      claimName: data-volumeclaim-template-chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  - emptyDir: {}
    name: shared-binary-volume
  - configMap:
      defaultMode: 420
      name: signoz-tools-cluster-clickhouse-custom-functions
    name: custom-functions-volume
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-common-configd
    name: chi-signoz-tools-cluster-clickhouse-common-configd
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-common-usersd
    name: chi-signoz-tools-cluster-clickhouse-common-usersd
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
    name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
  - name: kube-api-access-hn6tq
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace 
jasondavindev commented 1 month ago

I changed the /var/lib/clickhouse/preprocessed_configs/config.xml file, adding AWS credentials, and the warning is no longer shown, but the access denied error remains:

2024-10-11 14:40:56.684 INF pkg/backup/create.go:170 > done createBackupRBAC size=0B
2024-10-11 14:40:56.735 WRN pkg/backup/backuper.go:118 > MAX_FILE_SIZE=1073741824 is less than actual 17035327904, please remove general->max_file_size section from your config
2024-10-11 14:41:14.253 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied table=signoz_logs.logs
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_index_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.durationSort
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_spans
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4_agg_5m
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4_agg_30m
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_6hrs
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_1day
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.tag_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_1week
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.span_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.dependency_graph_minutes_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.dependency_graph_minutes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_error_index_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_v2_resource
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_v2
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v2
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_tag_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_tag_attributes
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_v2_resource
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_v2_resource
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_span_attributes
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_span_attributes
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_analytics.rule_state_history
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.usage_explorer
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v2
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v2
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.top_level_operations
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.span_attributes_keys
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_attribute_keys
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_resource_keys
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.schema_migrations
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.schema_migrations
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.schema_migrations
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4_agg_30m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4_agg_30m
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4_agg_5m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4_agg_5m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.resource_keys_string_final_mv
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v3
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v3
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_usage
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_1day
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_1day
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_1week
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_1week
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_6hrs
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_6hrs
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.exp_hist
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_resource_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_resource_keys
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_attribute_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_attribute_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_string_final_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.samples_v4_agg_30m_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_float64_final_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.samples_v4_agg_5m_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_bool_final_mv
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_analytics.distributed_rule_state_history
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_analytics.distributed_rule_state_history
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v3
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_1day_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_1week_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_6hrs_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_db_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_db_calls_mv_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_messaging_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_messaging_calls_mv_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_service_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_service_calls_mv_v2
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_dependency_graph_minutes
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_dependency_graph_minutes
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_dependency_graph_minutes_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_dependency_graph_minutes_v2
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_durationSort
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_durationSort
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_error_index_v2
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_error_index_v2
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_index_v2
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_index_v2
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_spans
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_spans
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_span_attributes_keys
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_span_attributes_keys
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_top_level_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_top_level_operations
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_usage
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_usage
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_usage_explorer
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_usage_explorer
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.durationSortMV
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.root_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_error_index
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_index
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.sub_root_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.usage_explorer_mv
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_exp_hist
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_exp_hist
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:139 > backup failed error: one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied
2024-10-11 14:41:14.525 INF pkg/backup/delete.go:185 > cleanBackupObjectDisks deleted 0 keys backup=2024-10-11-remote2 duration=35ms
2024-10-11 14:41:14.525 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/backup/2024-10-11-remote2'
2024-10-11 14:41:14.613 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3_backup/backup/2024-10-11-remote2'
2024-10-11 14:41:14.613 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2'
2024-10-11 14:41:14.618 INF pkg/backup/delete.go:166 > done backup=2024-10-11-remote2 duration=359ms location=local operation=delete
2024-10-11 14:41:14.733 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/shadow
2024-10-11 14:41:14.733 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/disks/s3_backup/shadow
2024-10-11 14:41:14.741 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/disks/s3/shadow
2024-10-11 14:41:14.741 FTL cmd/clickhouse-backup/main.go:658 > error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied"

Note: the IAM role has S3 full access and the AWS credentials are for my AWS user (admin access).

jasondavindev commented 1 month ago

See the code fragment linked above.

Is the srcBucket variable empty?

Compare the output log:

error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/rky/guvjhazneieklouevfhijqiaduqlk -> my-bucket/backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: AS1YYQCZF4KJ8QZY, HostID: /8ntBN2alKtBkXTy9YcODvCAnEb/bDf8KbJH1mOL0OlTJwChCkH3bysFHih4k9x+cVqKOST3Pd0=, api error AccessDenied: Access Denied"
S3->CopyObject data2/rky/guvjhazneieklouevfhijqiaduqlk -> my-bucket/backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk
               /\
               ||

The log shows only the key but not the source bucket.

jasondavindev commented 1 month ago

The S3 logs:

2024-10-11 15:16:27.034 INF pkg/storage/s3.go:49 > [s3:DEBUG] Request
GET /?versioning= HTTP/1.1
Host: data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com
User-Agent: m/F aws-sdk-go-v2/1.30.5 os/linux lang/go#1.22.7 md/GOOS#linux md/GOARCH#arm64 api/s3#1.61.2
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 338f4e3a-00cd-4f38-a096-ea36114e0b97
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=**********/20241011/xxxxxxxxxxxxx-cold-storage-tools/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=xxxxxxxxxxx
X-Amz-Content-Sha256: xxxxxxxxxxx
X-Amz-Date: 20241011T151627Z

2024-10-11 15:16:27.051 INF pkg/storage/s3.go:49 > [s3:DEBUG] request failed with unretryable error https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com/?versioning=": dial tcp: lookup data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com on 10.205.0.10:53: no such host
2024-10-11 15:16:27.071 INF pkg/storage/s3.go:49 > [s3:DEBUG] Request
PUT /backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk?x-id=CopyObject HTTP/1.1
Host: xxxxxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com
User-Agent: m/F aws-sdk-go-v2/1.30.5 os/linux lang/go#1.22.7 md/GOOS#linux md/GOARCH#arm64 api/s3#1.61.2
Content-Length: 0
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 3736572a-701b-4537-978c-7d8d3b1d54e5
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=**********/20241011/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-security-token;x-amz-storage-class, Signature=xxxxxxxxxxx
X-Amz-Content-Sha256: xxxxxxxxxxx
X-Amz-Copy-Source: data2/rky/guvjhazneieklouevfhijqiaduqlk
X-Amz-Date: 20241011T151627Z
X-Amz-Security-Token: xxxxxxxxxxx
X-Amz-Storage-Class: STANDARD

The bucket key path (data2/) ended up inside the S3 host: Host: data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com. Is that correct?

xxxxxxxxxxxxx-cold-storage-tools is the bucket of the s3 disk configured in config.xml.

jasondavindev commented 1 month ago

The s3 endpoint string format was wrong.

I changed https://xxxxxx-storage-test-tools.s3.amazonaws.com/data/ to https://xxxxxx-storage-test-tools.s3.us-east-1.amazonaws.com/data/ and it works.
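
For reference, a sketch of the corrected disk definition (the bucket name is a placeholder); the region-qualified virtual-hosted endpoint lets the AWS SDK in clickhouse-backup split the bucket and the key path correctly:

<s3>
    <type>s3</type>
    <!-- region-qualified endpoint: <bucket>.s3.<region>.amazonaws.com/<key-prefix>/ -->
    <endpoint>https://xxxxxx-storage-test-tools.s3.us-east-1.amazonaws.com/data/</endpoint>
    <use_environment_credentials>true</use_environment_credentials>
</s3>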

Slach commented 1 month ago

Does the image xxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/docker-hub/clickhouse/clickhouse-server:24.1.2-alpine contain the clickhouse-backup binary?

jasondavindev commented 1 month ago

No. I installed the clickhouse-backup binary manually in the container.

Slach commented 1 month ago

Unfortunately, https://github.com/SigNoz/charts/blob/main/charts/clickhouse/templates/clickhouse-instance/clickhouse-instance.yaml#L202 doesn't allow running a second container with clickhouse-backup.

In this case I would propose using the standard BACKUP and RESTORE SQL commands, which are available in modern clickhouse-server versions.

See the details in https://clickhouse.com/docs/en/operations/backup

You can just create a kind: CronJob which executes something like clickhouse-client -h chi...-0-0 --user ... --password ... -q "BACKUP ALL ON CLUSTER '{cluster}' TO S3(...)", and for restore a kind: Job which executes something like clickhouse-client -h chi...-0-0-0 --user ... --password ... -q "RESTORE ALL ON CLUSTER '{cluster}' FROM S3(...)". A sketch of such a CronJob follows.
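
A minimal sketch, not a drop-in manifest: the schedule, image tag, host, bucket URL, and credential handling are all assumptions to adapt:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: clickhouse-sql-backup
  namespace: signoz
spec:
  schedule: "0 2 * * *"   # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: clickhouse/clickhouse-server:24.8-alpine   # ships clickhouse-client
            command: ["clickhouse-client"]
            args:
            - -h
            - chi-signoz-tools-cluster-clickhouse-cluster-0-0
            - --user=default
            - --password=$(CLICKHOUSE_PASSWORD)
            - -q
            # '{shard}' in the destination keeps per-shard data separate (assumption; check the BACKUP docs)
            - "BACKUP ALL ON CLUSTER '{cluster}' TO S3('https://my-bucket.s3.us-east-1.amazonaws.com/sql-backups/shard-{shard}', '<key>', '<secret>')"
            env:
            - name: CLICKHOUSE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: clickhouse-credentials   # hypothetical secret
                  key: password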

jasondavindev commented 1 month ago

I am thinking of forking the chart and customizing it to provide sidecar containers for clickhouse-server.

About the embedded backup suggestion: I tried it, but the backup fails for a clustered workload, which is why I use clickhouse-backup.

Another option is to run a CronJob that connects to the clickhouse-server pod through a kubectl command and runs the backup there.

Slach commented 1 month ago

Which failure do you get with BACKUP ALL ON CLUSTER?

Did you check SELECT * FROM system.backup_log?

jasondavindev commented 1 month ago

When backing up using the ON CLUSTER clause I have to do a lot of parts synchronization, and we do not have deep knowledge about this. We are new ClickHouse users and are still learning. clickhouse-backup has deep management features and I prefer it.

Before the ClickHouse BACKUP/RESTORE features, we used Velero. But during the recovery steps we had too many parts and other errors to handle.

Slach commented 1 month ago

I think "too many parts" is not related to the backup tool used ;) it is usually caused by a wrong INSERT pattern and row batch size, which produce a lot of small data parts.

BACKUP ALL ... ON CLUSTER should work very well in clickhouse-server:24.8.

ON CLUSTER means the upload of parts to S3 is spread between the replicas inside each shard; there is not as much parts synchronization as you think.

jasondavindev commented 1 month ago

Example error:

Received exception from server (version 24.1.2):
Code: 647. DB::Exception: Received from localhost:9000. DB::Exception: Got error from chi%2Dsignoz%2Dtools%2Dcluster%2Dclickhouse%2Dcluster%2D2%2D0:9000. DB::Exception: Table signoz_logs.logs_v2 on replica chi-signoz-tools-cluster-clickhouse-cluster-0-0 has part 20240927_1_1_0 different from the part on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0 (checksum '5d2c4cb2a3959b040da2e13c398090fb' on replica chi-signoz-tools-cluster-clickhouse-cluster-0-0 != checksum 'a234ebfd4d43dbb6639eccbb5e286882' on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0). (CANNOT_BACKUP_TABLE)

When we changed from 1 shard to 2 shards, organic replication was used and no manual steps were performed. I don't know if additional steps are necessary.

Slach commented 1 month ago

has part 20240927_1_1_0 different from the part on replica
When we changed from 1 shard to 2 shards, organic replication was used

Hm, could you share:

SELECT hostName(), engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'

To fix your issue, I would propose running kubectl exec chi-signoz-tools-cluster-clickhouse-cluster-0-0 -- clickhouse-client --receive-timeout=86400 -q "OPTIMIZE TABLE signoz_logs.logs_v2 PARTITION 20240927 FINAL"

and try the BACKUP again.
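
(To see the divergence directly, the part checksums can also be compared across replicas with the same cluster() pattern used above; a sketch:)

SELECT hostName(), name, hash_of_all_files
FROM cluster('all-sharded', system.parts)
WHERE database = 'signoz_logs' AND table = 'logs_v2' AND name = '20240927_1_1_0'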

jasondavindev commented 1 month ago

I ran the OPTIMIZE TABLE command for the partition above, and for each BACKUP execution I needed to run the OPTIMIZE TABLE command for yet another partition. Finally, I ran it for all partitions (no PARTITION arg in the OPTIMIZE command), but the part mismatch error is still shown.

What is curious is that for some OPTIMIZE executions these errors were shown:

Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [chi-signoz-tools-cluster-clickhouse-cluster-2-0:9000]: Code: 53. DB::Exception: Type mismatch in IN or VALUES section. Expected: Date. Got: Float64. (TYPE_MISMATCH) (version 24.1.2.5 (official build)). (TYPE_MISMATCH)
Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: Type mismatch in IN or VALUES section. Expected: Date. Got: Float64. (TYPE_MISMATCH)

Hm, could you share:

SELECT hostName(), engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 │ ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-2-0-0 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-1-0-0 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Slach commented 1 month ago

Did you receive the errors above when executing the BACKUP command, or from something else? Could you share the full stack trace in that case?

Moreover, let's compare the UUIDs:

SELECT hostName(), uuid, engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'
Slach commented 1 month ago

upgrade your clickhouse-server version to 24.8

jasondavindev commented 1 month ago
SELECT
    hostName(),
    uuid,
    engine_full
FROM cluster('all-sharded', system.tables)
WHERE (database = 'signoz_logs') AND (table = 'logs_v2')

┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-2-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-1-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Yes, the error shown above came from the BACKUP command.

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0.chi-signoz-tools-cluster-clickhouse-cluster-0-0.signoz.svc.cluster.local :) BACKUP ALL ON CLUSTER 'cluster' TO S3('https://xxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com/EMBED_BACKUP/')

BACKUP ALL ON CLUSTER cluster TO S3('https://xxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com/EMBED_BACKUP/')

Query id: 8bce120e-15cc-4051-ae34-21c8fe3adf6a

Elapsed: 7.454 sec. 

Received exception from server (version 24.1.2):
Code: 647. DB::Exception: Received from localhost:9000. DB::Exception: Got error from chi%2Dsignoz%2Dtools%2Dcluster%2Dclickhouse%2Dcluster%2D1%2D0:9000. DB::Exception: Table signoz_logs.logs_v2 on replica chi-signoz-tools-cluster-clickhouse-cluster-1-0 has part 20240929_2_2_0 different from the part on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0 (checksum 'b87066065558b8e0f1790072f9d48853' on replica chi-signoz-tools-cluster-clickhouse-cluster-1-0 != checksum '80a4583914d9af71921f01fa326978ab' on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0). (CANNOT_BACKUP_TABLE)
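
One way to dig into such a mismatch (a sketch, using the part name from the error above) is to compare the conflicting part directly across hosts:

-- Inspect the part named in the BACKUP error on every host.
SELECT hostName(), name, hash_of_all_files, rows, bytes_on_disk
FROM cluster('all-sharded', system.parts)
WHERE database = 'signoz_logs' AND table = 'logs_v2' AND name = '20240929_2_2_0'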

upgrade your clickhouse-server version to 24.8

What is the motivation for that?

Slach commented 1 month ago

OK, the UUID is the same, so replication works.

let's check how many parts have the same name but different hashes

SELECT groupArray(h) AS all_hosts, name, database, table, groupArray(hash_of_all_files) AS all_hashes
FROM (
    SELECT hostName() AS h, name, database, table, hash_of_all_files
    FROM cluster('all-sharded', system.parts)
    WHERE engine ILIKE '%Replicated%'
)
GROUP BY name, database, table
HAVING length(arrayDistinct(all_hashes)) > 1
Slach commented 1 month ago

upgrade your clickhouse-server version to 24.8

What is the motivation for that?

This is an LTS release; I hope it has a more stable implementation of BACKUP.

Moreover, let's apply OPTIMIZE TABLE signoz_logs.logs_v2 ON CLUSTER '{cluster}' FINAL. Did you check that the mutation finished successfully via SELECT hostName(), * FROM cluster('all-sharded',system.mutations) WHERE query ILIKE '%OPTIMIZE%FINAL%' FORMAT Vertical?
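
One caveat worth checking (an assumption on my side): OPTIMIZE ... FINAL schedules merges rather than mutations, so in-flight work may be more visible in system.merges, e.g.:

-- Watch merges triggered by OPTIMIZE ... FINAL across all hosts.
SELECT hostName(), database, table, elapsed, progress, num_parts
FROM cluster('all-sharded', system.merges)
WHERE database = 'signoz_logs'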

jasondavindev commented 1 month ago

I'll close the issue because the initial problem was solved.

I am using clickhouse-backup instead of the embedded backup.