BryanFauble opened this issue 18 hours ago
Add <region>your-region</region> to your storage_configuration:
<s3>
    <type>s3</type>
    <endpoint>https://NAME_OF_MY_S3_BUCKET_REDACTED.s3.amazonaws.com/data/</endpoint>
    <region>us-west-1</region>
    <use_environment_credentials>true</use_environment_credentials>
</s3>
Also change object_disk_path in your backup config; it is not a temporary path:
s3:
  path: backup/shard-{shard}
  object_disk_path: backup-object-disks/shard-{shard}
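For context, a minimal sketch of where these keys sit in the s3 section of the clickhouse-backup config (the bucket name below is a placeholder, not a value from this issue):

s3:
  bucket: my-backup-bucket                              # placeholder; ideally not the same bucket as the s3 disk
  region: us-west-1
  path: backup/shard-{shard}                            # prefix for regular backup data
  object_disk_path: backup-object-disks/shard-{shard}   # prefix where object-disk (tiered) data is copied; it is not temporary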
A 403 error means your clickhouse-backup credentials don't have access to NAME_OF_MY_S3_BUCKET_REDACTED. I hope you have different buckets for the s3 disk and the backup.
As I see, you use environment credentials. Which kind of environment? Do you use credentials explicitly, an ARN role, or IRSA with a serviceAccount?
Could you share the output of
kubectl get chi -n <your-namespace> <your-chi-name> -o yaml
(without sensitive credentials)?
add <region>your-region</region> to your storage_configuration
Thanks so much for the recommendation. This was the issue. We're using the SigNoz helm chart, and they don't supply a way to accomplish this via their chart/values (https://github.com/SigNoz/charts/blob/main/charts/clickhouse/templates/clickhouse-instance/clickhouse-instance.yaml#L60-L114). We're using FluxCD to handle post-rendering the helm chart, so we were able to easily add this section and replace their storage.xml definition: https://github.com/Sage-Bionetworks-Workflows/eks-stack/pull/47/files/35a0cc5d49388a89a629797fbf72d3e264573700..b6fca8b2726f3e37716e3b6a93491a06d7671773
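For anyone else on Flux, the post-render patch looks roughly like the sketch below. The release name, the CHI name, and the exact field/key the chart uses for storage.xml are assumptions here; check what the SigNoz chart actually renders (e.g. with helm template) before copying it:

apiVersion: helm.toolkit.fluxcd.io/v2   # adjust to the Flux HelmRelease API version you run
kind: HelmRelease
metadata:
  name: signoz-clickhouse               # hypothetical release name
spec:
  # chart, values, interval, etc. omitted
  postRenderers:
    - kustomize:
        patches:
          - patch: |
              apiVersion: clickhouse.altinity.com/v1
              kind: ClickHouseInstallation
              metadata:
                name: signoz-clickhouse   # must match the CHI name the chart renders
              spec:
                configuration:
                  files:
                    # assumes the chart places storage.xml under spec.configuration.files;
                    # adjust the key to whatever the rendered manifest uses
                    config.d/storage.xml: |
                      <clickhouse>
                        <storage_configuration>
                          <disks>
                            <s3>
                              <type>s3</type>
                              <endpoint>https://NAME_OF_MY_S3_BUCKET_REDACTED.s3.amazonaws.com/data/</endpoint>
                              <region>us-west-1</region>
                              <use_environment_credentials>true</use_environment_credentials>
                            </s3>
                          </disks>
                          <!-- storage policies referencing the s3 disk omitted -->
                        </storage_configuration>
                      </clickhouse>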
I hope you have different buckets for the s3 disk and the backup.
What is the motivation behind using different buckets for the s3 tiered storage and the backups? My plan was to put them into different directories in the same bucket, so each ClickHouse cluster only needs a single bucket.
As I see, you use environment credentials. Which kind of environment? Do you use credentials explicitly, an ARN role, or IRSA with a serviceAccount?
I am using IRSA, but it wasn't the issue here.
Hello! I'm looking for a little bit of help on this issue. I spent some time looking at the Go code and didn't find any particular issues with how it parses the ClickHouse storage XML or handles the URL for the AWS S3 endpoint, so I must have something incorrect in my settings. I hope you might be able to point me in the right direction.
I am deploying this to AWS EKS as a sidecar to a 2-shard ClickHouse cluster. I have set up https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md#how-to-use-aws-irsa-and-iam-to-allow-s3-backup-without-explicit-credentials to use a service account, and everything worked before I set up tiered storage in ClickHouse to offload data into S3 after a period of time.
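For reference, the IRSA wiring follows that example: the pods run under a service account annotated with the IAM role, along the lines of the sketch below (the names and role ARN are placeholders):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: clickhouse-backup                 # placeholder name
  namespace: signoz                       # placeholder namespace
  annotations:
    # IRSA: EKS injects AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE into pods that use this service account
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/clickhouse-backup-role   # placeholder role ARN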
When we run any command (watch or create), we run into HTTP 403 access errors:
I double checked the IAM permissions and everything looks good there (I tested giving the IAM role full s3 permissions on the bucket). However, this portion of the message was suspect:
S3->CopyObject data/qxm/wwmercffxqfpvhzfonoyqxwpwkjke -> NAME_OF_MY_S3_BUCKET_REDACTED/data/shard0-full-20241118205408/s3/qxm/wwmercffxqfpvhzfonoyqxwpwkjke
The related golang code is:
log.Debug().Msgf("S3->CopyObject %s/%s -> %s/%s", srcBucket, srcKey, s.Config.Bucket, dstKey)
This tells me that the value of srcBucket is being set incorrectly somewhere, and it's trying to copy objects from a bucket called data. I looked at the Go AWS SDK to check what is expected in these fields and confirmed that it expects the bucket name in the input to the copy function: https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#CopyObjectInput
I followed the code back up to where the bucket/URLs are initially set: https://github.com/Altinity/clickhouse-backup/blob/master/pkg/storage/object_disk/object_disk.go#L490-L511
That looked to be correct: we have region set, so it falls into the block for https://bucket-name.s3.amazonaws.com/. I went into the ClickHouse storage config and verified the endpoint is properly set for the s3 disk.
I also had some other questions, as it was unclear to me from the documentation what s3.object_disk_path is supposed to represent. Do you have any more context on what it is and what its relationship is to the tiered storage already in S3, the backup being created, and/or previous backups?
Thank you very much for your time! I am more than happy to grab more information as needed for any debugging.
My config file looks like: