bergwerf / cratedb_gcs

CrateDB Google Cloud Storage Integration

Unable to add LocalStack as S3 repository to CrateDB #1

Closed bergwerf closed 1 year ago

bergwerf commented 1 year ago

make crate_db_run
make crate_ui_run
make mock_aws_run
make mock_aws_s3_create_bucket

CREATE REPOSITORY awslocal TYPE s3 WITH (
  endpoint='http://localhost:4566',
  protocol='http',
  bucket='crate-bucket',
  base_path='my_base_path',
  access_key='my_access_key',
  secret_key='my_secret_key',
  compress=true
);

RepositoryVerificationException[
[awslocal] Unable to verify the repository,
[awslocal] is not accessible on master node:
AmazonS3Exception 'The specified bucket does not exist
(Service: Amazon S3;
Status Code: 404;
Error Code: NoSuchBucket;
Request ID: xxx;
S3 Extended Request ID: null;
Proxy: null)']

bergwerf commented 1 year ago

Possibly related: https://github.com/localstack/localstack/issues/7514

bergwerf commented 1 year ago

Adding LS_LOG=trace gives the following details:

AWS s3.PutObject => 404 (NoSuchBucket);
PutObjectRequest({
    'ACL': 'private',
    'Bucket': 'tests-3n0i79ZTSrmFVPhx12R8LQ',
    'CacheControl': None,
    'ContentDisposition': None,
    'ContentEncoding': None,
    'ContentLanguage': None,
    'ContentLength': 195,
    'ContentMD5': None,
    'ContentType': 'application/octet-stream',
    'ChecksumAlgorithm': None,
    'ChecksumCRC32': None,
    'ChecksumCRC32C': None,
    'ChecksumSHA1': None,
    'ChecksumSHA256': None,
    'Expires': None,
    'GrantFullControl': None,
    'GrantRead': None,
    'GrantReadACP': None,
    'GrantWriteACP': None,
    'Key': 'master.dat',
    'Metadata': {},
    'ServerSideEncryption': None,
    'StorageClass': 'STANDARD',
    'WebsiteRedirectLocation': None,
    'SSECustomerAlgorithm': None,
    'SSECustomerKey': None,
    'SSECustomerKeyMD5': None,
    'SSEKMSKeyId': None,
    'SSEKMSEncryptionContext': None,
    'BucketKeyEnabled': None,
    'RequestPayer': None,
    'Tagging': None,
    'ObjectLockMode': None,
    'ObjectLockRetainUntilDate': None,
    'ObjectLockLegalHoldStatus': None,
    'ExpectedBucketOwner': None,
    'Body': <_io.BufferedReader>
}, headers={
    'Host': 'crate-bucket.localhost:4566',
    'amz-sdk-invocation-id': '725b8f05-6e2d-01ff-5c72-4bb2da6a3e17',
    'amz-sdk-request': 'attempt=1;max=4',
    'amz-sdk-retry': '0/0/500',
    'Authorization': 'AWS4-HMAC-SHA256 Credential=test/20230520/us-east-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;amz-sdk-retry;content-length;content-type;host;user-agent;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-storage-class, Signature=25beeaeb561ee449babaf261b4b47c9a86ffe3cdf3bd72f74dc8c61a6f231a95',
    'Content-Type': 'application/octet-stream',
    'User-Agent': 'aws-sdk-java/1.12.353 Linux/5.10.0-21-amd64 OpenJDK_64-Bit_Server_VM/20.0.1+9 java/20.0.1 vendor/Eclipse_Adoptium cfg/retry-mode/legacy',
    'x-amz-acl': 'private',
    'x-amz-content-sha256': 'STREAMING-AWS4-HMAC-SHA256-PAYLOAD',
    'X-Amz-Date': '20230520T155139Z',
    'x-amz-decoded-content-length': '22',
    'x-amz-storage-class': 'STANDARD',
    'Content-Length': '195',
    'Connection': 'Keep-Alive',
    'Expect': '100-continue',
    'x-localstack-tgt-api': 's3',
    'x-moto-account-id': '000000000000'
});
NoSuchBucket(The specified bucket does not exist, headers={
    'Content-Type': 'application/xml',
    'Content-Length': '245',
    'x-amz-request-id': '47294c3a-6d9c-4845-bc8c-4667825f36d9',
    'x-amz-id-2': 's9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234='
}

bergwerf commented 1 year ago

It appears the CrateDB S3 plugin sends the wrong bucket name.
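
One possible reading of the trace above: the Host header carries crate-bucket (virtual-hosted style), while the bucket that LocalStack reports, tests-3n0i79ZTSrmFVPhx12R8LQ, looks like the first path segment of a path-style request. The following is a minimal Python sketch of how a server could arrive at that result. The function name, the s3_suffixes parameter, the resolution rules, and the request path are assumptions for illustration; this is not LocalStack's actual code, and the request path does not appear in the trace.

```python
def resolve_bucket_and_key(host, path,
                           s3_suffixes=("s3.localhost.localstack.cloud",)):
    """Hypothetical bucket/key resolution for an incoming S3 request.

    If the host ends in a known S3 suffix, treat the leading part as the
    bucket (virtual-hosted style). Otherwise fall back to path style and
    take the first path segment as the bucket.
    """
    hostname = host.split(":", 1)[0]  # drop the port, if any
    for suffix in s3_suffixes:
        if hostname.endswith("." + suffix):
            bucket = hostname[: -(len(suffix) + 1)]
            return bucket, path.lstrip("/")
    # Host not recognised as an S3 virtual host: path-style fallback.
    bucket, _, key = path.lstrip("/").partition("/")
    return bucket, key


# Assumed request path; only the Host header is taken from the trace.
path = "/tests-3n0i79ZTSrmFVPhx12R8LQ/master.dat"
print(resolve_bucket_and_key("crate-bucket.localhost:4566", path))
print(resolve_bucket_and_key(
    "crate-bucket.s3.localhost.localstack.cloud:4566", path))
```

Under these assumed rules, the first call yields the tests-... segment as the bucket, which matches the trace, while the second yields crate-bucket. If this reading holds, the plugin sends the intended bucket name but in a form the mock server does not recognise.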

bergwerf commented 1 year ago

By setting rootLogger.level = debug in sandbox/crate/config/log4j2.properties I was able to confirm that the plugin itself is configured with the correct bucket name:

[2023-05-20T18:20:28,389][DEBUG][o.e.r.s.S3Repository     ] [Ilmenspitze] using bucket [crate-bucket], chunk_size [1gb], server_side_encryption [false], buffer_size [100mb], cannedACL [], storageClass []

This log line originates from https://github.com/bergwerf/cratedb/blob/master/plugins/es-repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3Repository.java#L136.

bergwerf commented 1 year ago

Using Wireshark and some documentation digging I was able to determine that there is likely a mismatch between the AWS Java client library and LocalStack.

The Java API used by CrateDB to send a PutObject to S3 is: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/PutObjectRequest.html#PutObjectRequest-java.lang.String-java.lang.String-java.io.InputStream-com.amazonaws.services.s3.model.ObjectMetadata-

Here it is specified that:

When using this API with an access point, you must direct requests to the access point hostname.
The access point hostname takes the form AccessPointName-AccountId.s3-accesspoint.Region.amazonaws.com.

I.e. the endpoint should be a virtual host name: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html

This is also supported by LocalStack, but (obviously) using a different top-level hostname: https://docs.localstack.cloud/user-guide/aws/s3/
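
To make the two addressing styles concrete, here is a small Python sketch that builds the object URL for each. The helper name object_url and its parameters are hypothetical; it only illustrates the URL shapes described in the linked documentation, not what the AWS SDK does internally.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint, bucket, key, path_style):
    """Build an object URL in path style or virtual-hosted style."""
    parts = urlsplit(endpoint)
    if path_style:
        # Bucket is the first path segment.
        netloc, path = parts.netloc, f"/{bucket}/{key}"
    else:
        # Bucket is prepended to the endpoint host name.
        netloc, path = f"{bucket}.{parts.netloc}", f"/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


print(object_url("http://localhost:4566", "crate-bucket", "master.dat", True))
print(object_url("http://localhost:4566", "crate-bucket", "master.dat", False))
```

The second form produces the crate-bucket.localhost:4566 host seen in the trace's Host header.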

Using such a host name as the endpoint, however, produces the following error in Java:

IllegalArgumentException 'Endpoint does not contain a valid host name: http://crate-bucket.s3.us‑east‑2.localhost.localstack.cloud'
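
One thing that may be worth ruling out: the host name as quoted in this error appears to contain non-breaking hyphens (U+2011) in the region part rather than ASCII hyphens, which can happen when a URL is copied from rendered documentation. Whether this is what the Java client rejected is not confirmed here. The following Python sketch (stdlib only, with a hypothetical helper name) lists characters in a host name that fall outside ASCII letters, digits, and hyphens.

```python
import string

# Characters permitted inside a host name label (letters, digits, hyphen).
ALLOWED = set(string.ascii_letters + string.digits + "-")


def invalid_hostname_chars(hostname):
    """Return (index, char, code point) for each character that is neither
    a label separator nor an ASCII letter, digit, or hyphen."""
    bad = []
    for i, ch in enumerate(hostname):
        if ch != "." and ch not in ALLOWED:
            bad.append((i, ch, f"U+{ord(ch):04X}"))
    return bad


# Escapes are written out explicitly so the difference is visible.
suspect = "crate-bucket.s3.us\u2011east\u20112.localhost.localstack.cloud"
clean = "crate-bucket.s3.us-east-2.localhost.localstack.cloud"

print(invalid_hostname_chars(suspect))
print(invalid_hostname_chars(clean))
```

If the check reports any characters for the configured endpoint, retyping the host name with plain ASCII hyphens would be a cheap experiment before concluding that the client library cannot be used with LocalStack.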

I suspect this originates from the AWS Java client library (com.amazonaws.services.s3.model.PutObjectRequest). I therefore see no straightforward fix, and unfortunately it will not be possible to investigate the CrateDB demo extension via a local S3 mock server.