exasol / cloud-storage-extension

Exasol Cloud Storage Extension for accessing formatted data (Avro, ORC and Parquet) on public cloud storage systems
MIT License

Add support for endpoint region #216

morazow closed 2 years ago

morazow commented 2 years ago

Situation

When accessing S3 through a VPC interface endpoint (AWS PrivateLink), we get the following AuthorizationHeaderMalformed error:

com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; the region 'vpce' is wrong; expecting 'ca-central-1'
(Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: req-id; S3 Extended Request ID: req-id-2), S3 Extended Request ID: req-id-2:AuthorizationHeaderMalformed: The authorization
header is malformed; the region 'vpce' is wrong; expecting 'ca-central-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: req-id;

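As documented for the hadoop-aws library, the endpoint region has to be set explicitly in this case, because the region derived from the PrivateLink endpoint ('vpce') does not match the bucket's actual region.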

For example:

<property>
  <name>fs.s3a.endpoint.region</name>
  <value>ca-central-1</value>
</property>

Usage will be as follows:

IMPORT INTO CLOUD_STORAGE_EXTENSION.RETAIL
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
  BUCKET_PATH         = 's3a://retail-data/parquet/*.parquet'
  DATA_FORMAT         = 'PARQUET'
  S3_ENDPOINT         = 'https://bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com'
  S3_ENDPOINT_REGION  = 'us-east-1'
  CONNECTION_NAME     = 'S3_CONNECTION';
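
To illustrate the intended mapping, here is a minimal sketch of how the user-facing parameters could be translated into Hadoop S3A properties. It is not the extension's actual code: the class S3EndpointRegionSketch and the helper toHadoopProperties are hypothetical names, and a plain Map stands in for the Hadoop Configuration object. Only the property keys fs.s3a.endpoint and fs.s3a.endpoint.region come from hadoop-aws.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class S3EndpointRegionSketch {

    // Hypothetical helper: copies the optional endpoint parameters
    // into the corresponding hadoop-aws (S3A) property keys.
    static Map<String, String> toHadoopProperties(Map<String, String> params) {
        Map<String, String> props = new LinkedHashMap<>();

        String endpoint = params.get("S3_ENDPOINT");
        if (endpoint != null && !endpoint.isEmpty()) {
            props.put("fs.s3a.endpoint", endpoint);
        }

        // Only set the region when the user provides it, so that
        // existing imports without S3_ENDPOINT_REGION keep working.
        String region = params.get("S3_ENDPOINT_REGION");
        if (region != null && !region.isEmpty()) {
            props.put("fs.s3a.endpoint.region", region);
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("S3_ENDPOINT",
            "https://bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com");
        params.put("S3_ENDPOINT_REGION", "us-east-1");

        // Print each resulting Hadoop property as key=value.
        toHadoopProperties(params)
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Treating S3_ENDPOINT_REGION as optional is an assumption of this sketch; it keeps the default behaviour unchanged for users who do not use a PrivateLink endpoint.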

Acceptance Criteria