jasondavindev closed this issue 1 month ago
Another test
chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml tables
signoz_metrics.samples_v2 11.97GiB default full
signoz_metrics.samples_v4 10.01GiB default,s3 full
chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml create_remote partial
2024-10-10 21:55:48.653 INF pkg/backup/create.go:170 > done createBackupRBAC size=0B
2024-10-10 21:55:48.925 WRN pkg/backup/backuper.go:118 > MAX_FILE_SIZE=1073741824 is less than actual 17035327904, please remove general->max_file_size section from your config
2024-10-10 21:55:49.845 INF pkg/backup/create.go:324 > done progress=9/215 table=signoz_metrics.samples_v2
2024-10-10 21:55:50.179 INF pkg/backup/create.go:324 > done progress=10/215 table=signoz_metrics.samples_v4
2024-10-10 21:55:50.197 INF pkg/backup/create.go:336 > done duration=2.128s operation=createBackupLocal version=2.6.2
2024-10-10 21:57:27.083 INF pkg/backup/upload.go:171 > done duration=1m36.326s operation=upload_data progress=2/2 size=10.01GiB table=signoz_metrics.samples_v4 version=2.6.2
2024-10-10 21:57:36.590 INF pkg/backup/upload.go:171 > done duration=1m45.832s operation=upload_data progress=1/2 size=11.97GiB table=signoz_metrics.samples_v2 version=2.6.2
2024-10-10 21:57:36.632 INF pkg/backup/upload.go:240 > done backup=partial duration=1m46.434s object_disk_size=0B operation=upload upload_size=21.98GiB version=2.6.2
2024-10-10 21:57:37.056 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3_backup/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:166 > done backup=partial duration=496ms location=local operation=delete
The previous warning is no longer shown (the following log is from my previous post):
2024-10-10 21:40:38.196 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3_backup doesn't contains <access_key_id> and <secret_access_key> environment variables will use
2024-10-10 21:40:38.200 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3 doesn't contains <access_key_id> and <secret_access_key> environment variables will use
Thanks for the detailed report.
Did you set up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY inside the clickhouse-backup container?
Try --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY or --env AWS_ROLE_ARN.
Could you share your current pod manifest, with sensitive credentials replaced by XXX?
kubectl -n <your-namespace> get pod chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 -o yaml
When you use IRSA, which serviceAccount do you use? In that case, the serviceAccount token is mounted into the pod and some environment variables are injected into the env section.
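A quick way to confirm that IRSA actually injected those variables is to inspect the running container. This is a sketch; the namespace, pod, and container names are taken from this report, so adjust them to your setup:

```shell
# List the AWS_* environment variables injected by IRSA into the container.
kubectl -n signoz exec chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 \
  -c clickhouse -- env | grep '^AWS_'

# The projected web identity token should also be mounted and non-empty.
kubectl -n signoz exec chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 \
  -c clickhouse -- ls -l /var/run/secrets/eks.amazonaws.com/serviceaccount/
```

If AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE show up here, the clickhouse-backup binary running in the same container should be able to assume the role.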
You have:
path: ""
object_disk_path: "backups/"
Better to replace it with:
path: "backups"
object_disk_path: "object_disks_backups"
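In config.yml terms, the suggestion looks roughly like this. This is a sketch only: it assumes the two paths live under the s3 section of the clickhouse-backup config, and the bucket and region values are placeholders from this report:

```yaml
# Sketch of the relevant clickhouse-backup config.yml fragment (assumed layout).
s3:
  bucket: my-bucket        # placeholder
  region: us-east-1        # placeholder
  # Keep backup archives and object-disk copies in separate, non-empty prefixes
  # so they don't collide with the ClickHouse s3 disk data itself.
  path: "backups"
  object_disk_path: "object_disks_backups"
```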
The warning and error appear only if you have data parts on the s3 disk.
Did you set up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY inside the clickhouse-backup container?
I'm running the clickhouse-backup binary inside the clickhouse-server container. The service account in use works for normal clickhouse-server workloads (s3 disk as cold storage) and has S3 full access.
Try --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY or --env AWS_ROLE_ARN.
I tried, but it didn't work.
path: "" object_disk_path: "backups/" better to replace it
I changed the paths, but the S3 structure did not change, as if the config was ignored.
I'm using the SigNoz Helm chart with the ClickHouse dependency: 3 shards and 1 replica per shard.
Generated ClickHouse pod manifest:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    signoz.io/path: /metrics
    signoz.io/port: "9363"
    signoz.io/scrape: "true"
  labels:
    app.kubernetes.io/component: clickhouse
    app.kubernetes.io/instance: signoz-tools-cluster
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: clickhouse
    app.kubernetes.io/version: 24.1.2
    apps.kubernetes.io/pod-index: "0"
    argocd.argoproj.io/instance: signoz-tools-cluster
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: signoz-tools-cluster-clickhouse
    clickhouse.altinity.com/cluster: cluster
    clickhouse.altinity.com/namespace: signoz
    clickhouse.altinity.com/ready: "yes"
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
    helm.sh/chart: clickhouse-24.1.6
    statefulset.kubernetes.io/pod-name: chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  name: chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  namespace: signoz
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: chi-signoz-tools-cluster-clickhouse-cluster-0-0
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/component
            operator: In
            values:
            - zookeeper
            - clickhouse
        topologyKey: kubernetes.io/hostname
  containers:
  - command:
    - /bin/bash
    - -c
    - /usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::xxxxxxxxxxx:role/ClickhouseEKSRole
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: xxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/docker-hub/clickhouse/clickhouse-server:24.1.2-alpine
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: clickhouse
    ports:
    - containerPort: 8123
      name: http
      protocol: TCP
    - containerPort: 9000
      name: client
      protocol: TCP
    - containerPort: 9009
      name: interserver
      protocol: TCP
    - containerPort: 9000
      name: tcp
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "4"
        memory: 12Gi
      requests:
        cpu: "3"
        memory: 8Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/clickhouse
      name: data-volumeclaim-template
    - mountPath: /var/lib/clickhouse/user_scripts
      name: shared-binary-volume
    - mountPath: /etc/clickhouse-server/functions
      name: custom-functions-volume
    - mountPath: /etc/clickhouse-server/config.d/
      name: chi-signoz-tools-cluster-clickhouse-common-configd
    - mountPath: /etc/clickhouse-server/users.d/
      name: chi-signoz-tools-cluster-clickhouse-common-usersd
    - mountPath: /etc/clickhouse-server/conf.d/
      name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hn6tq
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  initContainers:
  - command:
    - sh
    - -c
    - |
      set -x
      wget -O /tmp/histogramQuantile https://github.com/SigNoz/signoz/raw/develop/deploy/docker/clickhouse-setup/user_scripts/histogramQuantile
      mv /tmp/histogramQuantile /var/lib/clickhouse/user_scripts/histogramQuantile
      chmod +x /var/lib/clickhouse/user_scripts/histogramQuantile
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::xxxxxxxxxxx:role/ClickhouseEKSRole
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: docker.io/alpine:3.18.2
    imagePullPolicy: IfNotPresent
    name: signoz-tools-cluster-clickhouse-udf-init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/clickhouse/user_scripts
      name: shared-binary-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hn6tq
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  nodeSelector:
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/provisioner-name: observability-stack-provisioner
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsGroup: 101
    runAsUser: 101
  serviceAccount: signoz-tools-cluster-clickhouse
  serviceAccountName: signoz-tools-cluster-clickhouse
  subdomain: chi-signoz-tools-cluster-clickhouse-cluster-0-0
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: ObservabilityStackOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  - name: data-volumeclaim-template
    persistentVolumeClaim:
      claimName: data-volumeclaim-template-chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  - emptyDir: {}
    name: shared-binary-volume
  - configMap:
      defaultMode: 420
      name: signoz-tools-cluster-clickhouse-custom-functions
    name: custom-functions-volume
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-common-configd
    name: chi-signoz-tools-cluster-clickhouse-common-configd
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-common-usersd
    name: chi-signoz-tools-cluster-clickhouse-common-usersd
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
    name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
  - name: kube-api-access-hn6tq
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
I changed the /var/lib/clickhouse/preprocessed_configs/config.xml file to add AWS credentials, and the warning is no longer shown, but the access denied error remains.
2024-10-11 14:40:56.684 INF pkg/backup/create.go:170 > done createBackupRBAC size=0B
2024-10-11 14:40:56.735 WRN pkg/backup/backuper.go:118 > MAX_FILE_SIZE=1073741824 is less than actual 17035327904, please remove general->max_file_size section from your config
2024-10-11 14:41:14.253 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied table=signoz_logs.logs
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_index_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.durationSort
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_spans
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4_agg_5m
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4_agg_30m
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_6hrs
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_1day
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.tag_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_1week
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.span_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.dependency_graph_minutes_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.dependency_graph_minutes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_error_index_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_v2_resource
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_v2
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v2
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_tag_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_tag_attributes
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_v2_resource
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_v2_resource
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_span_attributes
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_span_attributes
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_analytics.rule_state_history
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.usage_explorer
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v2
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v2
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.top_level_operations
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.span_attributes_keys
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_attribute_keys
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_resource_keys
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.schema_migrations
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.schema_migrations
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.schema_migrations
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4_agg_30m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4_agg_30m
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4_agg_5m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4_agg_5m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.resource_keys_string_final_mv
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v3
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v3
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_usage
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_1day
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_1day
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_1week
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_1week
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_6hrs
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_6hrs
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.exp_hist
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_resource_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_resource_keys
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_attribute_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_attribute_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_string_final_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.samples_v4_agg_30m_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_float64_final_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.samples_v4_agg_5m_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_bool_final_mv
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_analytics.distributed_rule_state_history
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_analytics.distributed_rule_state_history
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v3
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_1day_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_1week_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_6hrs_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_db_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_db_calls_mv_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_messaging_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_messaging_calls_mv_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_service_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_service_calls_mv_v2
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_dependency_graph_minutes
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_dependency_graph_minutes
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_dependency_graph_minutes_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_dependency_graph_minutes_v2
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_durationSort
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_durationSort
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_error_index_v2
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_error_index_v2
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_index_v2
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_index_v2
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_spans
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_spans
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_span_attributes_keys
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_span_attributes_keys
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_top_level_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_top_level_operations
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_usage
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_usage
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_usage_explorer
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_usage_explorer
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.durationSortMV
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.root_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_error_index
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_index
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.sub_root_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.usage_explorer_mv
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_exp_hist
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_exp_hist
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:139 > backup failed error: one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied
2024-10-11 14:41:14.525 INF pkg/backup/delete.go:185 > cleanBackupObjectDisks deleted 0 keys backup=2024-10-11-remote2 duration=35ms
2024-10-11 14:41:14.525 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/backup/2024-10-11-remote2'
2024-10-11 14:41:14.613 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3_backup/backup/2024-10-11-remote2'
2024-10-11 14:41:14.613 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2'
2024-10-11 14:41:14.618 INF pkg/backup/delete.go:166 > done backup=2024-10-11-remote2 duration=359ms location=local operation=delete
2024-10-11 14:41:14.733 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/shadow
2024-10-11 14:41:14.733 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/disks/s3_backup/shadow
2024-10-11 14:41:14.741 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/disks/s3/shadow
2024-10-11 14:41:14.741 FTL cmd/clickhouse-backup/main.go:658 > error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied"
Note: the IAM role has S3 full access and the AWS credentials are for my AWS user (admin access)
See the code block here. Is the srcBucket variable empty?
Compare with the output log:
error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/rky/guvjhazneieklouevfhijqiaduqlk -> my-bucket/backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: AS1YYQCZF4KJ8QZY, HostID: /8ntBN2alKtBkXTy9YcODvCAnEb/bDf8KbJH1mOL0OlTJwChCkH3bysFHih4k9x+cVqKOST3Pd0=, api error AccessDenied: Access Denied"
S3->CopyObject data2/rky/guvjhazneieklouevfhijqiaduqlk -> my-bucket/backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk
The log above shows only the object key, but not the source bucket.
The S3 debug logs:
2024-10-11 15:16:27.034 INF pkg/storage/s3.go:49 > [s3:DEBUG] Request
GET /?versioning= HTTP/1.1
Host: data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com
User-Agent: m/F aws-sdk-go-v2/1.30.5 os/linux lang/go#1.22.7 md/GOOS#linux md/GOARCH#arm64 api/s3#1.61.2
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 338f4e3a-00cd-4f38-a096-ea36114e0b97
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=**********/20241011/xxxxxxxxxxxxx-cold-storage-tools/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=xxxxxxxxxxx
X-Amz-Content-Sha256: xxxxxxxxxxx
X-Amz-Date: 20241011T151627Z
2024-10-11 15:16:27.051 INF pkg/storage/s3.go:49 > [s3:DEBUG] request failed with unretryable error https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com/?versioning=": dial tcp: lookup data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com on 10.205.0.10:53: no such host
2024-10-11 15:16:27.071 INF pkg/storage/s3.go:49 > [s3:DEBUG] Request
PUT /backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk?x-id=CopyObject HTTP/1.1
Host: xxxxxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com
User-Agent: m/F aws-sdk-go-v2/1.30.5 os/linux lang/go#1.22.7 md/GOOS#linux md/GOARCH#arm64 api/s3#1.61.2
Content-Length: 0
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 3736572a-701b-4537-978c-7d8d3b1d54e5
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=**********/20241011/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-security-token;x-amz-storage-class, Signature=xxxxxxxxxxx
X-Amz-Content-Sha256: xxxxxxxxxxx
X-Amz-Copy-Source: data2/rky/guvjhazneieklouevfhijqiaduqlk
X-Amz-Date: 20241011T151627Z
X-Amz-Security-Token: xxxxxxxxxxx
X-Amz-Storage-Class: STANDARD
The bucket key path (data2/) appears inside the S3 host: Host: data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com. Is that correct? xxxxxxxxxxxxx-cold-storage-tools is the S3 disk set in config.xml.
The S3 endpoint string format was wrong. I changed https://xxxxxx-storage-test-tools.s3.amazonaws.com/data/
to https://xxxxxx-storage-test-tools.s3.us-east-1.amazonaws.com/data/
and it works now.
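For reference, a minimal sketch of what the corrected disk definition might look like in config.xml, under the assumption that the fix is simply the region-qualified, virtual-hosted-style endpoint (bucket name and path are placeholders from this thread, not the real values):

```xml
<clickhouse>
  <storage_configuration>
    <disks>
      <s3>
        <type>s3</type>
        <!-- region-qualified, virtual-hosted-style endpoint; with the
             region-less form the SDK resolved a wrong host and DNS failed -->
        <endpoint>https://xxxxxx-storage-test-tools.s3.us-east-1.amazonaws.com/data/</endpoint>
        <!-- pick up IAM role / environment credentials instead of static keys -->
        <use_environment_credentials>true</use_environment_credentials>
      </s3>
    </disks>
  </storage_configuration>
</clickhouse>
```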
Does the image xxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/docker-hub/clickhouse/clickhouse-server:24.1.2-alpine contain the clickhouse-backup binary?
No. I installed the clickhouse-backup binary manually in the container.
Unfortunately, https://github.com/SigNoz/charts/blob/main/charts/clickhouse/templates/clickhouse-instance/clickhouse-instance.yaml#L202 doesn't allow running a second container with clickhouse-backup.
In this case I would propose using the standard BACKUP and RESTORE SQL commands, which are available in modern clickhouse-server versions; see details in https://clickhouse.com/docs/en/operations/backup
You can just create a kind: CronJob which executes something like
clickhouse-client -h chi...-0-0 --user ... --password ... -q "BACKUP ALL ON CLUSTER '{cluster}' TO S3(...)"
and for restore a kind: Job which executes something like
clickhouse-client -h chi...-0-0-0 --user ... --password ... -q "RESTORE ALL ON CLUSTER '{cluster}' FROM S3(...)"
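A minimal sketch of such a CronJob, under stated assumptions: the service host, user, secret name, and S3 URL are all placeholders, not values confirmed in this thread:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clickhouse-backup
spec:
  schedule: "0 2 * * *"            # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              # any image providing clickhouse-client will do
              image: clickhouse/clickhouse-server:24.8-alpine
              command:
                - clickhouse-client
                - -h
                - chi-signoz-tools-cluster-clickhouse-cluster-0-0  # hypothetical service name
                - --user=admin                                     # placeholder credentials
                - --password=$(CLICKHOUSE_PASSWORD)
                - -q
                - BACKUP ALL ON CLUSTER '{cluster}' TO S3('https://my-bucket.s3.us-east-1.amazonaws.com/backups/')
              env:
                - name: CLICKHOUSE_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: clickhouse-credentials   # assumed secret
                      key: password
```

A restore Job would look the same, with RESTORE ALL ON CLUSTER ... FROM S3(...) as the query.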
I'm thinking of forking the chart and customizing it to provide sidecar containers for clickhouse-server.
As for the embedded backup suggestion, I tried it, but the backup fails for a clustered workload, so I use clickhouse-backup for this.
Another option is to run a CronJob that connects to the clickhouse-server pod through kubectl and runs the backup there.
Which failure do you have with BACKUP ALL ON CLUSTER? Did you check SELECT * FROM system.backup_log?
When backing up with the ON CLUSTER flag, a lot of part synchronization is required, and we don't have deep knowledge about it; we are new ClickHouse users and still learning. clickhouse-backup has richer management features, so I prefer it.
Before the ClickHouse BACKUP/RESTORE features, we used Velero, but during recovery we had too many parts and other errors to handle.
I think too many parts is not related to the backup tool used ;)
It is usually caused by a wrong INSERT pattern and row batch size that produce a lot of small data parts.
BACKUP ALL .. ON CLUSTER should work very well in clickhouse-server:24.8.
ON CLUSTER means the parts uploaded to S3 are just spread between replicas inside each shard; there is not as much part synchronization as you think.
Example error
Received exception from server (version 24.1.2):
Code: 647. DB::Exception: Received from localhost:9000. DB::Exception: Got error from chi%2Dsignoz%2Dtools%2Dcluster%2Dclickhouse%2Dcluster%2D2%2D0:9000. DB::Exception: Table signoz_logs.logs_v2 on replica chi-signoz-tools-cluster-clickhouse-cluster-0-0 has part 20240927_1_1_0 different from the part on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0 (checksum '5d2c4cb2a3959b040da2e13c398090fb' on replica chi-signoz-tools-cluster-clickhouse-cluster-0-0 != checksum 'a234ebfd4d43dbb6639eccbb5e286882' on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0). (CANNOT_BACKUP_TABLE)
When we changed from 1 shard to 2 shards, the built-in replication was used and no manual steps were performed. I don't know if any later steps are necessary.
has part 20240927_1_1_0 different from the part on replica
When we changed from 1 shard to 2 shards, the organic replication was used
Hm, could you share
SELECT hostName(), engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'
To fix your issue, I would propose running
kubectl exec chi-signoz-tools-cluster-clickhouse-cluster-0-0 -- clickhouse-client --receive-timeout=86400 -q "OPTIMIZE TABLE signoz_logs.logs_v2 PARTITION 20240927 FINAL"
and trying BACKUP again.
I ran the OPTIMIZE TABLE command for the partition above, and for each BACKUP execution I needed to run OPTIMIZE TABLE again for the failing partition. Finally, I ran it for all partitions (no PARTITION argument in the OPTIMIZE command), but the mismatched part error is still shown.
Curiously, some OPTIMIZE executions produced errors:
Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [chi-signoz-tools-cluster-clickhouse-cluster-2-0:9000]: Code: 53. DB::Exception: Type mismatch in IN or VALUES section. Expected: Date. Got: Float64. (TYPE_MISMATCH) (version 24.1.2.5 (official build)). (TYPE_MISMATCH)
Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: Type mismatch in IN or VALUES section. Expected: Date. Got: Float64. (TYPE_MISMATCH)
Hm, could you share
SELECT hostName(), engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 │ ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-2-0-0 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-1-0-0 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Did you receive the errors above when executing the BACKUP command, or something else?
Could you share the full stack trace in this case?
Moreover, let's compare the uuid:
SELECT hostName(), uuid, engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'
upgrade your clickhouse-server version to 24.8
SELECT
hostName(),
uuid,
engine_full
FROM cluster('all-sharded', system.tables)
WHERE (database = 'signoz_logs') AND (table = 'logs_v2')
┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-2-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-1-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Yes, the error shown above came from the BACKUP command:
chi-signoz-tools-cluster-clickhouse-cluster-0-0-0.chi-signoz-tools-cluster-clickhouse-cluster-0-0.signoz.svc.cluster.local :) BACKUP ALL ON CLUSTER 'cluster' TO S3('https://xxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com/EMBED_BACKUP/')
BACKUP ALL ON CLUSTER cluster TO S3('https://xxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com/EMBED_BACKUP/')
Query id: 8bce120e-15cc-4051-ae34-21c8fe3adf6a
Elapsed: 7.454 sec.
Received exception from server (version 24.1.2):
Code: 647. DB::Exception: Received from localhost:9000. DB::Exception: Got error from chi%2Dsignoz%2Dtools%2Dcluster%2Dclickhouse%2Dcluster%2D1%2D0:9000. DB::Exception: Table signoz_logs.logs_v2 on replica chi-signoz-tools-cluster-clickhouse-cluster-1-0 has part 20240929_2_2_0 different from the part on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0 (checksum 'b87066065558b8e0f1790072f9d48853' on replica chi-signoz-tools-cluster-clickhouse-cluster-1-0 != checksum '80a4583914d9af71921f01fa326978ab' on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0). (CANNOT_BACKUP_TABLE)
upgrade your clickhouse-server version to 24.8
What is the motivation for that?
OK, the uuid is the same, so replication works.
Let's check how many parts have the same name but different hashes:
SELECT groupArray(h) AS all_hosts, name, database, table, groupArray(hash_of_all_files) AS all_hashes FROM (
SELECT hostName() h, name, database, table, hash_of_all_files FROM cluster('all-sharded',system.parts) WHERE engine ILIKE '%Replicated%'
)
GROUP BY name, database, table
HAVING length(arrayDistinct(all_hashes)) > 1
upgrade your clickhouse-server version to 24.8 What is the motivation for?
This is an LTS release; I hope it has a more stable BACKUP implementation.
Moreover, let's apply
OPTIMIZE TABLE signoz_logs.logs_v2 ON CLUSTER '{cluster}' FINAL
Did you check that the mutation finished successfully via
SELECT hostName(), * FROM cluster('all-sharded',system.mutations) WHERE query ILIKE '%OPTIMIZE%FINAL%' FORMAT Vertical
I'll close the issue because the initial problem was solved.
I'm using clickhouse-backup instead of the embedded backup.
ClickHouse server version: 24.1.2.5; clickhouse-backup version: 2.6.2
In my ClickHouse setup I set use_environment_credentials to true for the S3 disk, but when using remote backup it cannot use the service account credentials. The following warning is shown:
And the following error is shown:
My config
For test purposes I selected just one table for backup and it works, but when I select a set of tables the AccessDenied error is shown.
Output of a successful backup (written to S3):
The following are the selected tables for which the backup does not work:
When I selected just signoz_metrics.samples_v4, which contains data on both the local disk and remote (S3) storage, the backup succeeded. Note: my IAM role has full access to S3.
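For context, a minimal sketch of the relevant s3 section of a clickhouse-backup config.yml under these assumptions (not the reporter's actual config; bucket, path, and region are placeholders): leaving access_key and secret_key empty lets the tool fall back to environment / IAM role credentials.

```yaml
s3:
  bucket: my-bucket          # placeholder
  path: backups              # placeholder prefix inside the bucket
  region: us-east-1
  access_key: ""             # empty => credentials come from the environment / IAM role
  secret_key: ""
```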