estuary / connectors

Connectors for capturing data from external data sources

materialize-s3-iceberg: new connector #1658

Closed williamhbaker closed 1 week ago

williamhbaker commented 3 weeks ago

Description:

New connector for materializing Iceberg tables to S3, using Parquet as the file format. It works exclusively with the AWS Glue catalog and runs in delta-updates mode with at-least-once (although usually exactly-once) processing.

See the commit messages for a little more detail. This initial version works well, but I'm not super happy with the Go <-> Python interactions. I'd like to move the implementation to be more, or exclusively, Python, but that would take a lot more work, since it means re-implementing various common behaviors we already have for Go materializations. I think this will come as we build more capabilities into the Iceberg connector, such as support for additional catalogs, true exactly-once semantics, and a standard updates mode of operation.

Complete functionality of this materialization requires the v2 control plane changes to be deployed: I've assumed the last-validated spec is available in Validate calls so that I didn't have to implement any kind of metadata storage. Validate will still mostly work without those changes, but it will always produce constraints for fields as if they were part of a new materialization. I doubt this will be a problem in the short term, but it's something to be aware of, and ideally the control plane updates will be available soon.
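
To make the Validate behavior concrete, here is a rough sketch of the idea, not the connector's actual code. The function name, the plain-dict spec shape, and the "new" / "unchanged" / "type-changed" labels are all hypothetical and stand in for the real constraint types. When no last-validated spec is supplied, every field is treated as if it belongs to a new materialization.

```python
from typing import Dict, Optional


def validate_fields(
    proposed: Dict[str, str],
    last_validated: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Classify each proposed field against the last-validated spec.

    Hypothetical shapes: both arguments map field name -> type name.
    `last_validated` is None when the control plane does not supply it.
    """
    result: Dict[str, str] = {}
    for field, type_name in proposed.items():
        # Without a last-validated spec there is nothing to compare against.
        previous = None if last_validated is None else last_validated.get(field)
        if previous is None:
            result[field] = "new"
        elif previous == type_name:
            result[field] = "unchanged"
        else:
            result[field] = "type-changed"
    return result
```

With this shape, a type change on an existing field can only be detected when the last-validated spec is passed in; otherwise the field is classified the same way as a brand-new one.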

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

We'll need new documentation for this.

Notes for reviewers:

(anything that might help someone review this PR)



dyaffe commented 1 week ago

@williamhbaker, just checking on this. The connector seems to be approved. Can we possibly merge it today?

williamhbaker commented 1 week ago

> @williamhbaker, just checking on this. The connector seems to be approved. Can we possibly merge it today?

Yes. There are comments to address and I am working on that.