delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[PROTOCOL RFC] Support for collated strings in the schema and statistics #2894

Open olaky opened 7 months ago

olaky commented 7 months ago

Protocol Change Request

Description of the protocol change

Spark is introducing support for collated strings (see SPARK-46830), and we should support collated columns and fields in Delta tables as well. This will require changes to two parts of the Delta protocol: the schema and the statistics.

More details about the idea can be found in the Design Doc
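
To illustrate why statistics are affected and not only the schema, here is a minimal Python sketch. It is not taken from the RFC or from Delta's implementation: the helper names `min_max` and `may_contain` are hypothetical, and `str.lower` is only a stand-in for a case-insensitive collation. It shows that min/max values computed under binary ordering can wrongly rule out a file when the predicate is evaluated under a different collation.

```python
def min_max(values, key=None):
    """Return (min, max) of values under the ordering induced by `key`.

    key=None means plain code-point (binary) ordering.
    """
    return min(values, key=key), max(values, key=key)


def may_contain(target, stats, key=lambda s: s):
    """Data-skipping check: can `target` fall inside the [min, max] range?"""
    lo, hi = stats
    return key(lo) <= key(target) <= key(hi)


# Values of one string column in a single (hypothetical) data file.
file_values = ["apple", "Banana", "cherry"]

# Binary ordering puts uppercase before lowercase, so "Banana" is the min.
binary_stats = min_max(file_values)

# Assumption: str.lower approximates a case-insensitive collation.
lcase_stats = min_max(file_values, key=str.lower)

# Predicate: col = 'APPLE' under the case-insensitive collation.
# The row "apple" matches, so the file must not be skipped.
skipped_with_binary_stats = not may_contain("APPLE", binary_stats, key=str.lower)
skipped_with_lcase_stats = not may_contain("APPLE", lcase_stats, key=str.lower)

print(binary_stats, skipped_with_binary_stats)
print(lcase_stats, skipped_with_lcase_stats)
```

With binary statistics the file would be skipped even though it contains a matching row, which is why statistics need to be tied to the collation they were computed under.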

Willingness to contribute

The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?

olaky commented 6 months ago

Protocol RFC PR is open: https://github.com/delta-io/delta/pull/3068

c27kwan commented 1 month ago

Hi, great proposal! I'd like to amend the part about readVersion hints: I don't think this information should be stored at the table level. I also think we need to specify the reader and writer requirements more explicitly. See my PR: https://github.com/delta-io/delta/pull/3741