Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

`srcBucket` during backup of remote storage being set incorrectly #1046

Open BryanFauble opened 18 hours ago

BryanFauble commented 18 hours ago

Hello! I'm looking for a bit of help with this issue. I spent some time looking at the Go code and didn't find any particular problems with how it parses the ClickHouse storage XML or handles the URL for the AWS S3 endpoint, so I must have something incorrect in my settings. I hope you can point me in the right direction.

I am deploying this to AWS EKS as a sidecar to a 2-shard ClickHouse cluster. I followed https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md#how-to-use-aws-irsa-and-iam-to-allow-s3-backup-without-explicit-credentials to use a service account, and everything worked before I set up tiered storage in ClickHouse to offload data to S3 after a period of time.

When we run any command (watch or create), we run into HTTP 403 access errors:

/var/lib/clickhouse/disks/s3/backup/shard0-full-20241118205408/shadow/signoz_metrics/samples_v4/s3 error: S3->CopyObject 
data/qxm/wwmercffxqfpvhzfonoyqxwpwkjke -> NAME_OF_MY_S3_BUCKET_REDACTED/data/shard0-full-20241118205408/s3/qxm/wwmercffxqfpvhzfonoyqxwpwkjke return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 2701EG7A4001V8C6, HostID: Mte3+g7wHhclqzZ0G3CQCwqrK1+oVAPwirepn6yZQE9zP/fJc4xrmiCVnNsmszGGCRfrDi07rhs=, api error AccessDenied: Access Denied

I double-checked the IAM permissions and everything looks good there (as a test, I gave the IAM role full S3 permissions on the bucket). However, this portion of the message was suspect: S3->CopyObject data/qxm/wwmercffxqfpvhzfonoyqxwpwkjke -> NAME_OF_MY_S3_BUCKET_REDACTED/data/shard0-full-20241118205408/s3/qxm/wwmercffxqfpvhzfonoyqxwpwkjke

The related Go code is: log.Debug().Msgf("S3->CopyObject %s/%s -> %s/%s", srcBucket, srcKey, s.Config.Bucket, dstKey)

This tells me that the value of srcBucket is being set incorrectly somewhere: it's trying to copy objects from a bucket called data. I checked the AWS SDK for Go to confirm what these fields are expected to contain, and the copy function's input does expect the source bucket name: https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#CopyObjectInput
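
For context, here is a minimal sketch of what such a server-side copy looks like with the AWS SDK for Go v2 (which the error format in the log suggests). This is only an illustration, not the actual clickhouse-backup code; the bucket and key values are copied from the redacted log above.

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    // The default credential chain also covers IRSA (web identity token + role ARN).
    cfg, err := config.LoadDefaultConfig(context.TODO())
    if err != nil {
        log.Fatal(err)
    }
    client := s3.NewFromConfig(cfg)

    // What the copy should look like: the source is the tiered-storage object, the
    // destination is the backup location. In the failing log, srcBucket came out as
    // "data" (which matches the first path segment of the disk endpoint), so S3
    // looked for a bucket literally named "data" and returned AccessDenied.
    srcBucket := "NAME_OF_MY_S3_BUCKET_REDACTED"
    srcKey := "data/qxm/wwmercffxqfpvhzfonoyqxwpwkjke"
    dstBucket := "NAME_OF_MY_S3_BUCKET_REDACTED"
    dstKey := "data/shard0-full-20241118205408/s3/qxm/wwmercffxqfpvhzfonoyqxwpwkjke"

    _, err = client.CopyObject(context.TODO(), &s3.CopyObjectInput{
        Bucket:     aws.String(dstBucket),                               // destination bucket
        Key:        aws.String(dstKey),                                  // destination key
        CopySource: aws.String(fmt.Sprintf("%s/%s", srcBucket, srcKey)), // "<source-bucket>/<source-key>"
    })
    if err != nil {
        log.Fatalf("CopyObject failed: %v", err)
    }
}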

I followed the code back up to where the bucket/URLs are initially being set: https://github.com/Altinity/clickhouse-backup/blob/master/pkg/storage/object_disk/object_disk.go#L490-L511

That looked to be correct: we have region set, so it falls into the block for https://bucket-name.s3.amazonaws.com/. I went into the ClickHouse storage config and verified the endpoint is properly set for the s3 disk:

<clickhouse>
  <storage_configuration>
    <disks>
      <!--
        default disk is special, it always exists even if not explicitly configured here,
        but you can't change its path here (you should use <path> on top level config instead)
      -->
      <default>
        <!--
          You can reserve some amount of free space on any disk (including default) by adding
          keep_free_space_bytes tag.
        -->
        <keep_free_space_bytes>10485760</keep_free_space_bytes>
      </default>
      <s3>
        <type>s3</type>
        <endpoint>https://NAME_OF_MY_S3_BUCKET_REDACTED.s3.amazonaws.com/data/</endpoint>
        <use_environment_credentials>true</use_environment_credentials>
      </s3>
    </disks>
    <policies>
      <tiered>
        <volumes>
          <default>
            <disk>default</disk>
          </default>
          <s3>
            <disk>s3</disk>
            <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert>
            <prefer_not_to_merge>1</prefer_not_to_merge>
          </s3>
        </volumes>
        <move_factor>0</move_factor>
      </tiered>
    </policies>
  </storage_configuration>
</clickhouse>
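
To show how an endpoint like the one above is typically split into a bucket and a key prefix, here is a simplified sketch; it is not the actual object_disk.go logic, just an assumption-laden example. If a parser does not recognize the bucket in the hostname and falls back to path-style parsing, the first path segment ("data") gets treated as the bucket, which matches the srcBucket seen in the error above.

package main

import (
    "fmt"
    "net/url"
    "strings"
)

// splitS3Endpoint is a simplified illustration (not clickhouse-backup code) of
// parsing a ClickHouse s3 disk endpoint into a bucket and a key prefix.
func splitS3Endpoint(endpoint string) (bucket, prefix string, err error) {
    u, err := url.Parse(endpoint)
    if err != nil {
        return "", "", err
    }
    path := strings.Trim(u.Path, "/")
    if strings.Contains(u.Host, ".s3.") {
        // Virtual-hosted style: https://<bucket>.s3.<region>.amazonaws.com/<prefix>/
        bucket = strings.SplitN(u.Host, ".", 2)[0]
        return bucket, path, nil
    }
    // Path style: https://s3.<region>.amazonaws.com/<bucket>/<prefix>/
    // If this branch were taken for the endpoint above, "data" would become the bucket.
    parts := strings.SplitN(path, "/", 2)
    bucket = parts[0]
    if len(parts) > 1 {
        prefix = parts[1]
    }
    return bucket, prefix, nil
}

func main() {
    bucket, prefix, err := splitS3Endpoint("https://NAME_OF_MY_S3_BUCKET_REDACTED.s3.amazonaws.com/data/")
    if err != nil {
        panic(err)
    }
    fmt.Println(bucket, prefix) // NAME_OF_MY_S3_BUCKET_REDACTED data
}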

I also had some other questions, as it was unclear from the documentation what s3.object_disk_path is supposed to represent. Do you have any more context on what this is and what its relationship is to the tiered storage already in S3, the backup being created, and/or previous backups?

Thank you very much for your time! I am more than happy to grab more information as needed for any debugging.

My config file looks like:

general:
    remote_storage: s3
    max_file_size: 0
    backups_to_keep_local: 0
    backups_to_keep_remote: 3
    log_level: debug
    allow_empty_backups: true
    download_concurrency: 1
    upload_concurrency: 1
    upload_max_bytes_per_second: 0
    download_max_bytes_per_second: 0
    object_disk_server_side_copy_concurrency: 32
    allow_object_disk_streaming: false
    use_resumable_state: true
    restore_schema_on_cluster: ""
    upload_by_part: true
    download_by_part: true
    restore_database_mapping: {}
    restore_table_mapping: {}
    retries_on_failure: 3
    retries_pause: 5s
    watch_interval: 8h
    full_interval: 24h
    watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}
    sharded_operation_mode: ""
    cpu_nice_priority: 15
    io_nice_priority: idle
    rbac_backup_always: true
    rbac_conflict_resolution: recreate
    retriesduration: 5s
    watchduration: 8h0m0s
    fullduration: 24h0m0s
clickhouse:
    username: default
    password: ""
    host: localhost
    port: 9000
    disk_mapping: {}
    skip_tables:
        - system.*
        - INFORMATION_SCHEMA.*
        - information_schema.*
        - _temporary_and_external_tables.*
    skip_table_engines: []
    timeout: 30m
    freeze_by_part: false
    freeze_by_part_where: ""
    use_embedded_backup_restore: false
    embedded_backup_disk: ""
    backup_mutations: true
    restore_as_attach: false
    check_parts_columns: true
    secure: false
    skip_verify: false
    sync_replicated_tables: false
    log_sql_queries: true
    config_dir: /etc/clickhouse-server/
    restart_command: exec:systemctl restart clickhouse-server
    ignore_not_exists_error_during_freeze: true
    check_replicas_before_attach: true
    default_replica_path: /clickhouse/tables/{cluster}/{shard}/{database}/{table}
    default_replica_name: '{replica}'
    tls_key: ""
    tls_cert: ""
    tls_ca: ""
    max_connections: 1
    debug: false
s3:
    access_key: ""
    secret_key: ""
    bucket: NAME_OF_MY_S3_BUCKET_REDACTED
    endpoint: ""
    region: us-east-1
    acl: private
    assume_role_arn: ""
    force_path_style: false
    path: backup/shard-{shard}
    object_disk_path: backup-temp
    disable_ssl: false
    compression_level: 1
    compression_format: tar
    sse: ""
    sse_kms_key_id: ""
    sse_customer_algorithm: ""
    sse_customer_key: ""
    sse_customer_key_md5: ""
    sse_kms_encryption_context: ""
    disable_cert_verification: false
    use_custom_storage_class: false
    storage_class: STANDARD
    custom_storage_class_map: {}
    concurrency: 2
    part_size: 0
    max_parts_count: 4000
    allow_multipart_download: false
    object_labels: {}
    request_payer: ""
    check_sum_algorithm: ""
    debug: false
gcs:
    credentials_file: ""
    credentials_json: ""
    credentials_json_encoded: ""
    embedded_access_key: ""
    embedded_secret_key: ""
    skip_credentials: false
    bucket: ""
    path: ""
    object_disk_path: ""
    compression_level: 1
    compression_format: tar
    debug: false
    force_http: false
    endpoint: ""
    storage_class: STANDARD
    object_labels: {}
    custom_storage_class_map: {}
    client_pool_size: 32
    chunk_size: 0
cos:
    url: ""
    timeout: 2m
    secret_id: ""
    secret_key: ""
    path: ""
    object_disk_path: ""
    compression_format: tar
    compression_level: 1
    debug: false
api:
    listen: 0.0.0.0:7171
    enable_metrics: true
    enable_pprof: false
    username: ""
    password: ""
    secure: false
    certificate_file: ""
    private_key_file: ""
    ca_cert_file: ""
    ca_key_file: ""
    create_integration_tables: true
    integration_tables_host: ""
    allow_parallel: false
    complete_resumable_after_restart: true
    watch_is_main_process: false
ftp:
    address: ""
    timeout: 2m
    username: ""
    password: ""
    tls: false
    skip_tls_verify: false
    path: ""
    object_disk_path: ""
    compression_format: tar
    compression_level: 1
    concurrency: 3
    debug: false
sftp:
    address: ""
    port: 22
    username: ""
    password: ""
    key: ""
    path: ""
    object_disk_path: ""
    compression_format: tar
    compression_level: 1
    concurrency: 3
    debug: false
azblob:
    endpoint_schema: https
    endpoint_suffix: core.windows.net
    account_name: ""
    account_key: ""
    sas: ""
    use_managed_identity: false
    container: ""
    path: ""
    object_disk_path: ""
    compression_level: 1
    compression_format: tar
    sse_key: ""
    buffer_size: 0
    buffer_count: 3
    max_parts_count: 256
    timeout: 4h
    debug: false
custom:
    upload_command: ""
    download_command: ""
    list_command: ""
    delete_command: ""
    command_timeout: 4h
    commandtimeoutduration: 4h0m0s
Slach commented 14 hours ago

Add <region>your-region</region> to your storage_configuration:

      <s3>
        <type>s3</type>
        <endpoint>https://NAME_OF_MY_S3_BUCKET_REDACTED.s3.amazonaws.com/data/</endpoint>
        <region>us-west-1</region>  
        <use_environment_credentials>true</use_environment_credentials>
      </s3>

Change your backup config; object_disk_path is not a temporary location:

s3:
    path: backup/shard-{shard}
    object_disk_path: backup-object-disks/shard-{shard}

A 403 error means your clickhouse-backup credentials don't have access to NAME_OF_MY_S3_BUCKET_REDACTED.

I hope you have different buckets for the s3 disk and for backups.

As I see, you use environment credentials. Which kind of environment? Do you use explicit credentials, an ARN role, or IRSA with a serviceAccount?

Could you share your kubectl get chi -n <your-namespace> <your-chi-name> -o yaml output, without sensitive credentials?

BryanFauble commented 36 minutes ago

Add <region>your-region</region> to your storage_configuration

Thanks so much for the recommendation. This was the issue. We're using the SigNoz Helm chart, and it doesn't provide a way to set this via the chart values (https://github.com/SigNoz/charts/blob/main/charts/clickhouse/templates/clickhouse-instance/clickhouse-instance.yaml#L60-L114). We're using FluxCD to post-render the Helm chart, so we were able to easily add this section and replace their storage.xml definition: https://github.com/Sage-Bionetworks-Workflows/eks-stack/pull/47/files/35a0cc5d49388a89a629797fbf72d3e264573700..b6fca8b2726f3e37716e3b6a93491a06d7671773

I hope you have different buckets for the s3 disk and for backups.

What is the motivation for using different buckets for S3 tiered storage and for backups? My plan was to put them into different directories in the same bucket so each ClickHouse cluster only needs a single bucket.

As I see, you use environment credentials. Which kind of environment? Do you use explicit credentials, an ARN role, or IRSA with a serviceAccount?

I am using IRSA, but it wasn't the issue here.