Closed joshbartley closed 3 months ago
Please check this doc: https://repost.aws/knowledge-center/s3-bucket-access-direct-connect
@tsolodov the second option, a VPC interface endpoint, doesn't work, and the first option is not feasible, which is the reason for this issue. In the CLI you have to specify both the bucket name and the endpoint URL as separate items.
If you use https://XXXXXXXXXXXXXXXX.vpce-000000000000000000-XXXXXXXXXX.s3.us-east-2.vpce.amazonaws.com/Clickhouse/XXXXXXX/Full
as the S3 endpoint, which uses the endpoint URL from your link, you get the error below:
2023.08.25 18:10:46.357763 [ 1600 ] {}
DNSResolver: Cannot resolve host (s3.us-east-2.vpce.amazonaws.com), error 0: Host not found.
This happens because https://github.com/ClickHouse/ClickHouse/blob/32efbe77d1ba48291d90885b11e6f1840c4158db/src/IO/S3/URI.cpp
has a regex that strips the VPC endpoint prefix out of the hostname, so ClickHouse tries to connect to s3.us-east-2.vpce.amazonaws.com,
which doesn't exist.
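To illustrate the failure mode described above, here is a minimal Python sketch. The pattern is a hypothetical virtual-hosted-style regex written for this example, not a copy of the one in URI.cpp, and the host name `bucket.vpce-0123-abcd...` is a made-up placeholder. It only shows how a greedy "everything before `.s3`" capture can absorb the `vpce-...` label into the bucket name and leave a host that matches the one in the DNS error.

```python
import re
from urllib.parse import urlparse

# Hypothetical virtual-hosted-style pattern: "<bucket>.s3<rest-of-host>".
# Assumption for illustration only; not the actual ClickHouse regex.
VIRTUAL_HOSTED = re.compile(r"^(.+)\.(s3)([.\-][a-z0-9\-.:]+)$")


def split_host(url):
    """Split a URL's host into (bucket, endpoint_host) using the pattern above."""
    host = urlparse(url).netloc
    match = VIRTUAL_HOSTED.match(host)
    if match is None:
        # No virtual-hosted-style match: treat the whole host as the endpoint.
        return None, host
    bucket = match.group(1)                      # everything before ".s3"
    endpoint = match.group(2) + match.group(3)   # "s3" plus the rest of the host
    return bucket, endpoint


# Placeholder interface-endpoint URL (hypothetical IDs).
url = "https://bucket.vpce-0123-abcd.s3.us-east-2.vpce.amazonaws.com/Clickhouse/Full"
bucket, endpoint = split_host(url)
print(bucket)    # the vpce label ends up inside the "bucket" part
print(endpoint)  # the remaining host no longer contains the vpce label
```

With this pattern the endpoint-specific label is treated as part of the bucket, so the remaining host is the generic `s3.us-east-2.vpce.amazonaws.com`, which has no DNS record on its own.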
Unless I am missing something, this will be fixed by https://github.com/ClickHouse/ClickHouse/pull/62208.
I have only performed tests using simple queries to s3 with select * from s3(vpce_endpoint)...
Just tested table engine, backups, incremental backups and restore. It is working as expected.
Use case
You have an on-premises ClickHouse server and either an AWS Direct Connect connection or an IPSec VPN to an AWS VPC.
Describe the solution you'd like
When specifying the AWS S3 bucket location details, the ability to include an endpoint URL to support VPC interface endpoints. Uses would include S3 backups, the S3 table engine, S3 file load, and S3 restore.
Describe alternatives you've considered
AWS VPC Gateway Endpoints
AWS VPC gateway endpoints support accessing S3 directly but do not support Direct Connect or IPSec tunnels without using a public IPv4 /24 to set up the route. Because of the IPv4 /24 requirement, this approach is strongly discouraged.
[Gateway] Endpoint connections cannot be extended out of a VPC. Resources on the other side of a VPN connection, VPC peering connection, transit gateway, or AWS Direct Connect connection in your VPC cannot use a gateway endpoint to communicate with Amazon S3.
https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html
AWS VPC interface endpoints are private IPs in the VPC that work over Direct Connect and IPSec tunnels without the need for public IPv4 routing. https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html