apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Improve Paimon Spec #3704

Open Xuanwo opened 4 months ago

Xuanwo commented 4 months ago

Search before asking

Motivation

I'm currently working on paimon-rust, and I have found Paimon's specification to be neither accurate nor detailed.

Take Schema as an example. The current spec only has:

The version of the schema file starts from 0 and currently retains all versions of the schema. There may be old files that rely on the old schema version, so its deletion should be done with caution.

Schema File is JSON, it includes:

fields: data field list, data field contains id, name, type, field id is used to support schema evolution.
partitionKeys: partition definition of the table, it cannot be modified.
primaryKeys: primary key definition of the table, it cannot be modified.
options: options of the table, including a lot of capabilities and optimizations.
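
As an illustration of how little the quoted text pins down, here is a minimal Python sketch of a schema reader built from that text alone. Only the four top-level keys and the id/name/type members of a field come from the quote. The class names, the helper `renamed_fields`, and every value in the two sample documents are hypothetical. In particular, the quote does not say how `type` is encoded, so the sketch keeps it as an opaque JSON value, and it reads "field id is used to support schema evolution" as "match fields across schema versions by id", which is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DataField:
    id: int
    name: str
    type: Any  # encoding is not stated in the quoted spec text; kept opaque


@dataclass(frozen=True)
class TableSchema:
    fields: tuple
    partition_keys: tuple
    primary_keys: tuple
    options: dict


def parse_schema(text: str) -> TableSchema:
    """Parse one schema file using only the keys named in the quoted spec."""
    raw = json.loads(text)
    fields = tuple(DataField(f["id"], f["name"], f["type"]) for f in raw["fields"])
    return TableSchema(
        fields=fields,
        partition_keys=tuple(raw["partitionKeys"]),
        primary_keys=tuple(raw["primaryKeys"]),
        options=dict(raw["options"]),
    )


def renamed_fields(old: TableSchema, new: TableSchema) -> dict:
    """Map old name -> new name for fields whose id exists in both versions.

    Assumption: a field keeps its id across schema versions, so the id,
    not the name, identifies a column after a rename.
    """
    old_names = {f.id: f.name for f in old.fields}
    return {
        old_names[f.id]: f.name
        for f in new.fields
        if f.id in old_names and old_names[f.id] != f.name
    }


# Hypothetical schema version 0; all values are invented for illustration.
SCHEMA_0 = """
{
  "fields": [
    {"id": 0, "name": "order_id", "type": "BIGINT"},
    {"id": 1, "name": "dt", "type": "STRING"},
    {"id": 2, "name": "amount", "type": "DOUBLE"}
  ],
  "partitionKeys": ["dt"],
  "primaryKeys": ["order_id", "dt"],
  "options": {}
}
"""

# Hypothetical schema version 1: field 2 is renamed and field 3 is added.
SCHEMA_1 = """
{
  "fields": [
    {"id": 0, "name": "order_id", "type": "BIGINT"},
    {"id": 1, "name": "dt", "type": "STRING"},
    {"id": 2, "name": "total", "type": "DOUBLE"},
    {"id": 3, "name": "note", "type": "STRING"}
  ],
  "partitionKeys": ["dt"],
  "primaryKeys": ["order_id", "dt"],
  "options": {}
}
"""
```

Calling `renamed_fields(parse_schema(SCHEMA_0), parse_schema(SCHEMA_1))` reports the rename of field 2 and ignores the newly added field 3. Whether field ids may be reused, how nested types are written, and which keys are optional are all left open by the quoted text, so a real reader would need the spec to settle them.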

I have the following questions:

Solution

Please provide more detailed specifications for Paimon, possibly including pseudocode examples.

Anything else?

None

Are you willing to submit a PR?

JingsongLi commented 4 months ago

I will do this.