delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs for multiple languages.
https://delta.io
Apache License 2.0

[WIP][Spark] Allow type widening for all supported type changes #3024

Open · johanl-db opened 2 weeks ago

johanl-db commented 2 weeks ago

The type changes added in this PR only work with Spark 4.0 / master, which contains the changes to the Parquet readers required to read the data after these type changes are applied.
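To illustrate why the new reader behavior matters, here is a minimal standalone sketch (not from this PR; the path is made up) of reading data written as INT32 back with a wider LongType read schema, the kind of upcast Spark 4.0 / master supports in its Parquet readers and older versions generally reject:

```scala
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Write a small Parquet file with an INT32 column.
spark.range(10).selectExpr("CAST(id AS INT) AS id")
  .write.mode("overwrite").parquet("/tmp/type_widening_demo")

// Read it back with a wider read schema (LongType). This conversion happens
// inside the Parquet reader and is what a widened Delta table relies on.
val widened = StructType(Seq(StructField("id", LongType)))
spark.read.schema(widened).parquet("/tmp/type_widening_demo").show()
```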

Description

Extend the list of supported type changes for type widening to include changes that can be supported with Spark 4.0.

How was this patch tested?

Added test cases for the new type changes to the existing type widening test suites.

Does this PR introduce any user-facing changes?

Yes: the listed type changes can now be used with type widening, either via ALTER TABLE CHANGE COLUMN TYPE or during schema evolution in MERGE and INSERT.
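For illustration, a minimal sketch of how such a widening might be applied (the table name is made up, and the `delta.enableTypeWidening` table property name is an assumption on my part rather than something stated in this PR):

```scala
// Hypothetical sketch: widening an INT column to BIGINT on a Delta table.
// Assumes Spark 4.0 / master with this change, and that type widening is
// gated behind the 'delta.enableTypeWidening' table property.
spark.sql("""
  CREATE TABLE events (id INT, value INT) USING delta
  TBLPROPERTIES ('delta.enableTypeWidening' = 'true')
""")

// Explicit widening via ALTER TABLE ... CHANGE COLUMN ... TYPE
// (a metadata-only change; existing Parquet files are not rewritten):
spark.sql("ALTER TABLE events CHANGE COLUMN value TYPE BIGINT")

// The same widening can also happen implicitly through schema evolution
// when a MERGE or INSERT brings in a wider source type.
```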

KamilKandzia commented 2 weeks ago

Will there be an option in the future to change a table column's type from int to string without overwriting the entire table? Unless such an option is already available (but I don't remember one).

johanl-db commented 2 weeks ago

> Will there be an option in the future to change a table column's type from int to string without overwriting the entire table? Unless such an option is already available (but I don't remember one).

There are currently no plans to support type changes other than the ones mentioned in the PR description.

Converting values when reading from a table that has had one of these widening type changes applied can easily be done directly in the Parquet reader, but other type changes are harder, either because: