buoyant-data / oxbow

Collection of AWS Lambdas for creating and managing Delta tables
https://www.buoyantdata.com
GNU Affero General Public License v3.0

Prevent duplicate columns in the schema when partitions are present in parquet files #12

Closed: rtyler closed this issue 9 months ago

rtyler commented 9 months ago

In some scenarios BigQuery can inline a partition column in its output parquet files, so the columns need to be deduplicated before the initial commit on the table is created.
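
A minimal sketch of the kind of deduplication described above, not oxbow's actual implementation. The `Field` struct, the `field` helper, and `dedupe_columns` are hypothetical names made up for illustration. The sketch assumes columns are matched by exact name and that the definition found in the parquet file wins over the one derived from the partition path; neither choice is stated in the issue.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a schema field (column name plus type name).
/// A real implementation would use the schema types of its Delta/Arrow library.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Hypothetical convenience constructor used only in this sketch.
fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
    }
}

/// Combine the columns read from a parquet file with the partition columns
/// inferred from the file path, keeping only the first occurrence of each name.
/// Assumption: names are compared exactly (case-sensitive), and the file's
/// definition takes precedence because file columns are visited first.
fn dedupe_columns(file_fields: &[Field], partition_fields: &[Field]) -> Vec<Field> {
    let mut seen: HashSet<String> = HashSet::new();
    file_fields
        .iter()
        .chain(partition_fields.iter())
        // `insert` returns false when the name was already present,
        // so later duplicates are filtered out.
        .filter(|f| seen.insert(f.name.clone()))
        .cloned()
        .collect()
}
```

For example, if the file schema contains `id` and `ds`, and `ds` is also a partition column, `dedupe_columns` returns just `id` and `ds`, so the schema written in the first commit has no repeated column.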