ConduitIO / conduit-connector-s3

Conduit connector for Amazon S3
Apache License 2.0

Conduit Connector S3

General

The S3 connector is one of Conduit's built-in plugins. It provides both a source and a destination S3 connector.

How to build it

Run make.

Testing

Run make test to run all the tests. Set the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION before running them. If they are not set, the tests that use these variables are skipped.
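The skip behaviour described above could be implemented with a small helper like the one below. This is only a sketch: missingVars and MissingAWSEnv are hypothetical names and are not taken from the connector's test suite.

```go
package main

import "os"

// missingVars returns the names for which lookup reports no value or an
// empty value. The lookup function is injected so the helper can be
// exercised without touching the real environment.
func missingVars(lookup func(string) (string, bool), names ...string) []string {
	var missing []string
	for _, name := range names {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// MissingAWSEnv (hypothetical) checks the three variables named in this
// README against the process environment.
func MissingAWSEnv() []string {
	return missingVars(os.LookupEnv,
		"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")
}

// In a test, one might then write:
//
//	if missing := MissingAWSEnv(); len(missing) > 0 {
//		t.Skipf("skipping S3 test, missing env vars: %v", missing)
//	}
```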

S3 Source

The S3 Source Connector connects to an S3 bucket using the provided configuration: aws.bucket, aws.accessKeyId, aws.secretAccessKey and aws.region. Configure is called first to parse the configuration and make sure the bucket exists. If the bucket doesn't exist, or the permissions check fails, an error is returned. After that, the Open method is called to start the connection from the provided position.

Change Data Capture (CDC)

This connector implements CDC features for S3 by scanning the bucket for changes every pollingPeriod and detecting any change that happened after a certain timestamp. These changes (update, delete, create) are then inserted into a buffer that is checked on each Read request.
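The change-detection step can be illustrated with a simplified, in-memory sketch. It does not call AWS and is not the connector's actual implementation: Object, Snapshot, Change and DetectChanges are hypothetical names, and detecting deletes by comparing against the previous listing is an assumption made for this example.

```go
package main

import (
	"sort"
	"time"
)

// Object is a hypothetical, simplified view of one entry in a bucket listing.
type Object struct {
	Key          string
	LastModified time.Time
}

// Snapshot maps object keys to the modification time seen in the previous scan.
type Snapshot map[string]time.Time

// Change describes one detected change: "create", "update" or "delete".
type Change struct {
	Key    string
	Action string
	At     time.Time
}

// DetectChanges compares the previous scan with the current listing.
// Objects modified after `since` are reported as "create" (key not seen
// before) or "update" (key already known). Keys that were known but are
// absent from the current listing are reported as "delete", stamped with
// `now` because a plain listing carries no deletion time (assumption).
// The result is sorted by key so the output is deterministic.
func DetectChanges(prev Snapshot, current []Object, since, now time.Time) []Change {
	var changes []Change
	seen := make(map[string]bool, len(current))
	for _, obj := range current {
		seen[obj.Key] = true
		if !obj.LastModified.After(since) {
			continue // not modified since the last scan
		}
		action := "update"
		if _, known := prev[obj.Key]; !known {
			action = "create"
		}
		changes = append(changes, Change{Key: obj.Key, Action: action, At: obj.LastModified})
	}
	for key := range prev {
		if !seen[key] {
			changes = append(changes, Change{Key: key, Action: "delete", At: now})
		}
	}
	sort.Slice(changes, func(i, j int) bool { return changes[i].Key < changes[j].Key })
	return changes
}
```

A polling loop would call a function like this once every pollingPeriod and append the returned changes to the buffer that Read drains.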

Position Handling

The connector goes through two modes.

Record Keys

The S3 object key uniquely identifies an object in an Amazon S3 bucket, so each record's key is the key of the S3 object it was read from.

Configuration

The config passed to Configure can contain the following fields.

| name | description | required | example |
|------|-------------|----------|---------|
| aws.accessKeyId | AWS access key id | yes | "THE_ACCESS_KEY_ID" |
| aws.secretAccessKey | AWS secret access key | yes | "SECRET_ACCESS_KEY" |
| aws.region | the AWS S3 bucket region | yes | "us-east-1" |
| aws.bucket | the AWS S3 bucket name | yes | "bucket_name" |
| pollingPeriod | polling period for the CDC mode, formatted as a time.Duration string. Default is "1s" | no | "2s", "500ms" |
| prefix | the key prefix for S3 source | no | "conduit-" |
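As an illustration of how these fields might be validated, here is a minimal sketch that parses a map[string]string into a typed struct. SourceConfig and ParseSourceConfig are hypothetical names, not the connector's own API. The required fields and the "1s" default come from the table above, while rejecting non-positive durations is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SourceConfig is a hypothetical typed view of the source settings.
type SourceConfig struct {
	AccessKeyID     string
	SecretAccessKey string
	Region          string
	Bucket          string
	Prefix          string
	PollingPeriod   time.Duration
}

// ParseSourceConfig checks that the required fields are present and
// parses pollingPeriod with time.ParseDuration, defaulting to one second.
func ParseSourceConfig(cfg map[string]string) (SourceConfig, error) {
	required := []string{"aws.accessKeyId", "aws.secretAccessKey", "aws.region", "aws.bucket"}
	for _, key := range required {
		if cfg[key] == "" {
			return SourceConfig{}, fmt.Errorf("missing required config field %q", key)
		}
	}
	out := SourceConfig{
		AccessKeyID:     cfg["aws.accessKeyId"],
		SecretAccessKey: cfg["aws.secretAccessKey"],
		Region:          cfg["aws.region"],
		Bucket:          cfg["aws.bucket"],
		Prefix:          cfg["prefix"], // optional, empty means no prefix
		PollingPeriod:   time.Second,   // documented default of "1s"
	}
	if raw := cfg["pollingPeriod"]; raw != "" {
		period, err := time.ParseDuration(raw)
		if err != nil {
			return SourceConfig{}, fmt.Errorf("invalid pollingPeriod %q: %w", raw, err)
		}
		if period <= 0 {
			// Assumption: a zero or negative period is treated as invalid.
			return SourceConfig{}, errors.New("pollingPeriod must be positive")
		}
		out.PollingPeriod = period
	}
	return out, nil
}
```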

Known Limitations

S3 Destination

The S3 Destination Connector connects to an S3 bucket using the provided configuration: aws.bucket, aws.accessKeyId, aws.secretAccessKey and aws.region. Configure is called first to parse the configuration. If parsing fails, an error is returned. After that, the Open method is called to start the connection. If the permissions check fails, the connector will not be ready for writing to S3.

Writer

The S3 destination writer has a buffer of size bufferSize. Each call to Write adds a new record to the buffer. When the buffer is full, all of its records are written to the S3 bucket, and an ack function is called for each record after it has been written.
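The buffering behaviour can be sketched as follows. This is an in-memory illustration under stated assumptions, not the connector's code: Record, AckFunc, FlushFunc and BufferedWriter are hypothetical names, and the flush callback stands in for the actual upload to S3.

```go
package main

import "fmt"

// Record is a hypothetical, simplified record.
type Record struct {
	Key     string
	Payload []byte
}

// AckFunc is called once per record after its batch has been flushed.
// It receives the flush error, or nil on success.
type AckFunc func(err error)

// FlushFunc persists one batch; a real destination would upload it to S3.
type FlushFunc func(batch []Record) error

// BufferedWriter collects records and flushes them in batches of bufferSize.
type BufferedWriter struct {
	bufferSize int
	flush      FlushFunc
	records    []Record
	acks       []AckFunc
}

// NewBufferedWriter rejects buffer sizes below 1 (assumption for this sketch).
func NewBufferedWriter(bufferSize int, flush FlushFunc) (*BufferedWriter, error) {
	if bufferSize < 1 {
		return nil, fmt.Errorf("bufferSize must be at least 1, got %d", bufferSize)
	}
	return &BufferedWriter{bufferSize: bufferSize, flush: flush}, nil
}

// Write buffers the record and flushes once the buffer is full.
func (w *BufferedWriter) Write(r Record, ack AckFunc) error {
	w.records = append(w.records, r)
	w.acks = append(w.acks, ack)
	if len(w.records) < w.bufferSize {
		return nil
	}
	return w.Flush()
}

// Flush writes all buffered records, acks each one, and empties the buffer.
func (w *BufferedWriter) Flush() error {
	if len(w.records) == 0 {
		return nil
	}
	err := w.flush(w.records)
	for _, ack := range w.acks {
		ack(err)
	}
	w.records, w.acks = nil, nil
	return err
}
```

With a bufferSize of 2, the first Write only buffers the record, and the second Write triggers the flush and both acks. Calling Flush explicitly (for example on shutdown) drains a partially filled buffer.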

Configuration

The config passed to Configure can contain the following fields.

| name | description | required | example |
|------|-------------|----------|---------|
| aws.accessKeyId | AWS access key id | yes | "THE_ACCESS_KEY_ID" |
| aws.secretAccessKey | AWS secret access key | yes | "SECRET_ACCESS_KEY" |
| aws.region | the AWS S3 bucket region | yes | "us-east-1" |
| aws.bucket | the AWS S3 bucket name | yes | "bucket_name" |
| format | the destination format, either "json" or "parquet" | yes | "json" |
| prefix | the key prefix for S3 destination | no | "conduit-" |