apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[parquet] Add type id to parquet files #4362

Closed tsreaper closed 1 month ago

tsreaper commented 1 month ago

Purpose

Currently, parquet files produced by Paimon carry no type ids. However, other compute engines (for example, Trino) and lake formats (for example, Iceberg) rely on type ids to project columns.

This PR adds type id to parquet files.
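
To illustrate why readers prefer ids over names, here is a minimal sketch of column projection after a rename. It does not use the Paimon or Parquet APIs; `Column`, `project_by_name`, and `project_by_id` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column in a file or table schema."""
    name: str
    field_id: Optional[int]  # None models a file written without ids


def project_by_name(file_columns: List[Column], wanted: List[Column]) -> List[Optional[int]]:
    """Return the file position of each wanted column, matched by name."""
    by_name = {c.name: i for i, c in enumerate(file_columns)}
    return [by_name.get(w.name) for w in wanted]


def project_by_id(file_columns: List[Column], wanted: List[Column]) -> List[Optional[int]]:
    """Return the file position of each wanted column, matched by id."""
    by_id = {c.field_id: i for i, c in enumerate(file_columns) if c.field_id is not None}
    return [by_id.get(w.field_id) for w in wanted]


# The file was written when the first column was still called "user".
file_columns = [Column("user", 0), Column("amount", 1)]
# The table schema later renamed it to "user_name"; the id is unchanged.
table_columns = [Column("user_name", 0), Column("amount", 1)]

by_name = project_by_name(file_columns, table_columns)
by_id = project_by_id(file_columns, table_columns)
print(by_name)  # the renamed column is not found by name
print(by_id)    # both columns are found by id
```

Matching by name loses the renamed column, while matching by id still resolves both. A file without ids leaves an id-based reader nothing to match on, which is the gap this PR closes.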

Tests

Unit tests.

API and Format

Yes. Special field ids are assigned to array elements and to map keys and values, and type ids are added to parquet files.
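
As a rough sketch of what assigning special ids to nested types can look like: array elements and map keys / values are not table fields, so they need ids derived from somewhere. The scheme below is an assumption made up for illustration (the base offsets, `DEPTH_LIMIT`, and all function names are hypothetical) and is not the scheme implemented in this PR.

```python
# Hypothetical id ranges, chosen only so that the three kinds never overlap
# and every id still fits in a signed 32-bit integer.
DEPTH_LIMIT = 1024
ELEMENT_BASE = 1 << 28
KEY_BASE = 2 << 28
VALUE_BASE = 3 << 28


def special_id(base: int, field_id: int, depth: int) -> int:
    """Derive an id from the owning top-level field id and the nesting depth."""
    return base + field_id * DEPTH_LIMIT + depth


def assign_nested_ids(name, field_id, dtype, depth=1, out=None):
    """Walk a type tree and record an id for each array element, map key, and map value.

    Types are tuples: ("int",), ("array", element), ("map", key, value).
    Map keys are assumed to be primitive, so the walk does not descend into them.
    """
    out = {} if out is None else out
    kind = dtype[0]
    if kind == "array":
        out[f"{name}.element"] = special_id(ELEMENT_BASE, field_id, depth)
        assign_nested_ids(f"{name}.element", field_id, dtype[1], depth + 1, out)
    elif kind == "map":
        out[f"{name}.key"] = special_id(KEY_BASE, field_id, depth)
        out[f"{name}.value"] = special_id(VALUE_BASE, field_id, depth)
        assign_nested_ids(f"{name}.value", field_id, dtype[2], depth + 1, out)
    return out


# Field "tags" with id 3 and type map<string, array<int>>.
ids = assign_nested_ids("tags", 3, ("map", ("string",), ("array", ("int",))))
for path, nested_id in ids.items():
    print(path, nested_id)
```

Deriving the ids from the owning field id and the depth keeps them stable across writes, so files written at different times agree on the id of the same nested position. The sketch stays collision-free only while field ids and depths remain below the assumed limits.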

Documentation

No new feature.