ClickHouse / clickhouse-docs

Official documentation for ClickHouse
https://clickhouse.com/docs

Explain how to configure ClickHouse on S3, with some potential caveats. #1385

Open · alexey-milovidov opened 1 year ago

alexey-milovidov commented 1 year ago

- Set up a TTL for incomplete multipart uploads (a minimal sketch follows this list).
- Do not set up a lifecycle policy for object deletion.
- Do not set up object versioning.
- Do not set up bucket replication.
- Do not use FUSE over S3.
- Do not point multiple shards to the same prefix in S3.
- Do not delete objects on S3 manually.
- Use separate buckets for ClickHouse and for other workloads.
- How to properly configure the S3 bucket policy to allow the needed operations.
- How to properly configure the instance IAM profiles.
- How KMS affects performance.
- Multi-region vs. single-region buckets in GCS.
- Choosing the proper region for a bucket.
- Caveats of using R2 and B2.
- Caveats of using MinIO.
- Caveats of using JuiceFS.
- The need for a VPC gateway for S3.
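For the first item, a minimal sketch of an "abort incomplete multipart uploads" lifecycle rule using boto3; the bucket name and the 7-day window are illustrative assumptions, not values from this issue:

```python
import boto3

s3 = boto3.client("s3")

# Abort multipart uploads that were started but never completed.
# This only cleans up stray upload fragments; it does NOT add an
# object-deletion or expiration policy, which this issue advises against.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-clickhouse-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```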

alifirat commented 11 months ago

I'd also recommend adding a daily check of your bucket size against the amount of data referenced by ClickHouse.

On AWS, you can use CloudWatch to monitor your bucket size (the metric is published once per day) and compare it to the system.parts table.
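A sketch of such a daily check, assuming boto3 credentials are configured, the ClickHouse HTTP interface is reachable on localhost:8123, the S3 disk is named `s3`, and the bucket uses the standard storage class; the 10% threshold is arbitrary:

```python
import datetime

import boto3
import requests

BUCKET = "my-clickhouse-bucket"  # hypothetical bucket name

# S3 storage metrics land in CloudWatch once per day, so look back
# far enough to catch the latest datapoint.
cw = boto3.client("cloudwatch")
now = datetime.datetime.now(datetime.timezone.utc)
resp = cw.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": BUCKET},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=now - datetime.timedelta(days=2),
    EndTime=now,
    Period=86400,
    Statistics=["Average"],
)
bucket_bytes = max((dp["Average"] for dp in resp["Datapoints"]), default=0.0)

# Bytes that ClickHouse itself references on the S3 disk.
query = (
    "SELECT sum(bytes_on_disk) FROM system.parts "
    "WHERE active AND disk_name = 's3'"
)
referenced_bytes = int(
    requests.get("http://localhost:8123/", params={"query": query}).text
)

# Flag drift, e.g. garbage left behind by aborted uploads or objects
# created outside ClickHouse.
if bucket_bytes > referenced_bytes * 1.1:
    print(f"bucket={bucket_bytes:.0f} referenced={referenced_bytes}: investigate")
```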

ddorian commented 1 month ago

> Caveats of using R2 and B2.

Do we have any more info on this?

alexey-milovidov commented 1 month ago

@ddorian

Cloudflare R2 requires a strict part upload size. It is possible to configure, but non-obvious. Its performance will generally be 2 to 4 times worse if you request from a close region. When you use it from an AWS machine, performance will be even worse. But it can be faster if you request from a distant region.

Server-side copy for backups (from R2 to R2) will likely not work due to the lack of UploadPartCopy, and will fall back to client-side transfer.
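One way to verify this against a given endpoint is to attempt an UploadPartCopy directly and inspect the error. A sketch using boto3; the endpoint URL, bucket, and key names are placeholders, and the source object is assumed to exist:

```python
import boto3
from botocore.exceptions import ClientError

# Point the client at the S3-compatible endpoint under test.
s3 = boto3.client("s3", endpoint_url="https://<account-id>.r2.cloudflarestorage.com")

bucket, src_key, dst_key = "my-bucket", "existing-object", "copy-probe"

upload = s3.create_multipart_upload(Bucket=bucket, Key=dst_key)
try:
    # Server-side copy of part 1 from the source object.
    s3.upload_part_copy(
        Bucket=bucket,
        Key=dst_key,
        UploadId=upload["UploadId"],
        PartNumber=1,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    print("UploadPartCopy appears to be supported")
except ClientError as e:
    print("UploadPartCopy failed:", e.response["Error"]["Code"])
finally:
    # Clean up the probe upload either way.
    s3.abort_multipart_upload(Bucket=bucket, Key=dst_key, UploadId=upload["UploadId"])
```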

But it works.

ddorian commented 1 month ago

> Cloudflare R2 requires a strict part upload size. It is possible to configure, but non-obvious.

Can you be more explicit here? Do you mean this note from https://developers.cloudflare.com/r2/objects/multipart-objects/:

> Object part sizes must be at least 5MiB but no larger than 5GiB. All parts except the last one must be the same size. The last part has no minimum size, but must be the same or smaller than the other parts. Most S3 clients conform to these expectations.


> Server-side copy for backups (from R2 to R2) will likely not work due to the lack of UploadPartCopy, and will fall back to client-side transfer.

Looks like they do support it https://developers.cloudflare.com/r2/api/s3/api/. Maybe CH uses a subfeature they don't support?

alexey-milovidov commented 1 month ago

> All parts except the last one must be the same size.
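This is presumably the non-obvious part: ClickHouse grows the part size as a multipart upload progresses (controlled by settings such as s3_upload_part_size_multiply_factor), so with default settings not all parts are the same size; pinning the factor to 1 should keep them uniform, though that reading is an assumption, not confirmed in this thread. For comparison, boto3's transfer manager satisfies the rule whenever the chunk size is pinned, since every part except the last is exactly multipart_chunksize. A sketch, with the endpoint placeholder and the 16 MiB value as assumptions (chosen only to sit within R2's 5 MiB to 5 GiB bounds):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3", endpoint_url="https://<account-id>.r2.cloudflarestorage.com")

# Fixed 16 MiB parts: every part except the last has the same size,
# which is what R2 requires of multipart uploads.
config = TransferConfig(
    multipart_threshold=16 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
)

s3.upload_file("backup.tar", "my-bucket", "backup.tar", Config=config)
```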