airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.27k stars 4.15k forks source link

feat: validate airbyte metadata added to iceberg schema #48604

Closed jdpgrailsdev closed 1 day ago

jdpgrailsdev commented 2 days ago

What

How

Review guide

  1. IcebergUtil.kt
  2. IcebergUtilTest.kt
  3. IcebergV2WriterTest.kt
  4. AirbyteTypeToAirbyteTypeWithMetaTest.kt

Can this PR be safely reverted and rolled back?

vercel[bot] commented 2 days ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 22, 2024 6:03pm
edgao commented 2 days ago

this looks a lot like https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/bulk/core/load/src/main/kotlin/io/airbyte/cdk/load/data/AirbyteTypeToAirbyteTypeWithMeta.kt ? which we can maybe reuse in some way

jdpgrailsdev commented 2 days ago

this looks a lot like https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/bulk/core/load/src/main/kotlin/io/airbyte/cdk/load/data/AirbyteTypeToAirbyteTypeWithMeta.kt ? which we can maybe reuse in some way

@edgao It does and I wasn't sure how to reuse that given that it returns Airbyte types and not Iceberg types. I suspect what is really being pointed out here is some refactor to this in the CDK to make the common parts more common.

jdpgrailsdev commented 1 day ago

PR has been updated to validate that the metadata is added to the schema. I have removed the code that I originally added to do this, as the existing logic already adds the fields. This is confirmed via new tests added to this PR.