delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
1.98k stars, 365 forks

fix: cast support fields nested in lists and maps #2541

Closed HawaiianSpork closed 3 weeks ago

HawaiianSpork commented 1 month ago

Description

The current implementation of cast only works for structs nested in structs. This PR adds support for structs contained in other types (lists and maps). It also stops cast from adding a missing column as nullable when the target field is not nullable; in that case cast now fails with an error.
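To make the intended behavior concrete, here is a minimal schema-level sketch. It is a toy model, not the delta-rs implementation: `DataType`, `Field`, and `plan_cast` are hypothetical names invented for illustration, and the real cast works on Arrow arrays rather than on bare type descriptions. It only shows the recursion into lists and maps and the rule that a missing field may be added only when the target declares it nullable.

```rust
// Toy type model (hypothetical, not the Arrow or delta-rs types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
    List(Box<Field>),            // element field
    Map(Box<Field>, Box<Field>), // key field, value field
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

// Recurse into one child field, keeping the target's name and nullability.
fn plan_child(source: &Field, target: &Field) -> Result<Field, String> {
    let data_type = plan_cast(&source.data_type, &target.data_type)?;
    Ok(Field::new(&target.name, data_type, target.nullable))
}

/// Compute the type that results from casting `source` to `target`,
/// or an error if the cast would have to invent a non-nullable field.
fn plan_cast(source: &DataType, target: &DataType) -> Result<DataType, String> {
    match (source, target) {
        (DataType::Struct(src), DataType::Struct(tgt)) => {
            let mut out = Vec::with_capacity(tgt.len());
            for t in tgt {
                match src.iter().find(|s| s.name == t.name) {
                    // Field exists on both sides: recurse into it.
                    Some(s) => out.push(plan_child(s, t)?),
                    // Missing but nullable: it can be filled with nulls.
                    None if t.nullable => out.push(t.clone()),
                    // Missing and non-nullable: refuse instead of inventing data.
                    None => {
                        return Err(format!("cannot add non-nullable field '{}'", t.name))
                    }
                }
            }
            Ok(DataType::Struct(out))
        }
        // Structs inside a list: recurse into the element field.
        (DataType::List(s), DataType::List(t)) => {
            Ok(DataType::List(Box::new(plan_child(s, t)?)))
        }
        // Structs inside a map: recurse into both key and value fields.
        (DataType::Map(sk, sv), DataType::Map(tk, tv)) => Ok(DataType::Map(
            Box::new(plan_child(sk, tk)?),
            Box::new(plan_child(sv, tv)?),
        )),
        // Identical leaf types need no work.
        (s, t) if s == t => Ok(t.clone()),
        (s, t) => Err(format!("unsupported cast from {:?} to {:?}", s, t)),
    }
}
```

Under these assumptions, casting a list of `{a}` structs to a list of `{a, b}` structs succeeds when `b` is nullable and returns an error when it is not.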

Note: This is only a partial solution. It lets you merge schemas with nested missing columns, but it does not allow delta-rs to read the merged schema (though Spark can). Reading the merged schema will require another change in which delta-rs defines its own DataFusion parquet SchemaAdapter.

github-actions[bot] commented 1 month ago

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

ion-elgreco commented 1 month ago

Please add also a bunch of python tests

rtyler commented 1 month ago

> Please add also a bunch of python tests

@ion-elgreco I'm confused as to what a Python set of tests would do that the tests in Rust don't already do? :confused:

ion-elgreco commented 1 month ago

> > Please add also a bunch of python tests
>
> @ion-elgreco I'm confused as to what a Python set of tests would do that the tests in Rust don't already do? :confused:

We mostly need to check the reader side with pyarrow datasets.

ion-elgreco commented 3 weeks ago

@HawaiianSpork thanks for the PR! Can you rebase, please, so we can merge?

HawaiianSpork commented 3 weeks ago

@ion-elgreco thank you. Sorry, I did not get around to writing Python tests. The code has been rebased.