delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

feat(python, rust): arrow large/view types passthrough, rust default engine #2738

Closed: ion-elgreco closed this 1 month ago

ion-elgreco commented 1 month ago

Description

Allows large/view Arrow types to be passed through during writes, avoiding unnecessary and potentially costly casts that could fail.
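As a rough illustration of why such a cast can fail: Arrow's `utf8`/`binary` arrays use 32-bit offsets, while `large_utf8`/`large_binary` use 64-bit offsets. Downcasting therefore means rewriting the offsets buffer, and it is impossible once the array's total data exceeds 2^31 - 1 bytes. The minimal stdlib sketch below models an array only by its per-value byte lengths. The function names and string type names are hypothetical and are not the pyarrow or arrow-rs API. View types are not modelled.

```python
# Hypothetical sketch: models a variable-length Arrow array only by the byte
# length of each value; this is not the pyarrow or arrow-rs API.
I32_MAX = 2**31 - 1  # largest offset a 32-bit (utf8/binary) array can store


def offsets(value_lengths):
    """Build an offsets buffer: offsets[i + 1] - offsets[i] == len(value i)."""
    out = [0]
    for n in value_lengths:
        out.append(out[-1] + n)
    return out


def can_downcast_to_32bit(value_lengths):
    """A large_* -> 32-bit cast only works if the final offset fits in an i32."""
    return offsets(value_lengths)[-1] <= I32_MAX


def write_type(source_type, value_lengths, passthrough):
    """Return the type that would be written, or raise if the cast must fail."""
    if passthrough or not source_type.startswith("large_"):
        return source_type  # no cast: the data is written as-is
    target = source_type.removeprefix("large_")
    if not can_downcast_to_32bit(value_lengths):
        raise OverflowError(f"cannot cast {source_type} to {target}: offsets overflow i32")
    return target
```

With passthrough, `write_type("large_utf8", [2**30, 2**30], passthrough=True)` simply returns `"large_utf8"`. The same call with `passthrough=False` raises, because the two values total 2^31 bytes.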

With the PyArrow engine, only the normal and large modes can be used. With the Rust engine, we always pass through, since passthrough is now allowed throughout the codebase. This distinction can be removed once the PyArrow engine is fully deprecated.
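The engine-dependent rule can be sketched as a small type-resolution function. This is a hypothetical model, not delta-rs code. Types are plain strings ("utf8", "large_utf8", "utf8_view", ...). It assumes that the PyArrow engine maps every flavour, including views, onto the selected normal or large mode, while the Rust engine returns the source type unchanged.

```python
# Hypothetical sketch of the engine/mode rule described above. Type names are
# plain strings ("utf8", "large_utf8", "utf8_view", ...), not pyarrow objects.
BASES = ("utf8", "binary", "list")


def base_of(t):
    """Strip the large_/_view flavour: 'large_utf8' -> 'utf8', 'utf8_view' -> 'utf8'."""
    return t.removeprefix("large_").removesuffix("_view")


def resolve_write_type(engine, mode, source_type):
    if base_of(source_type) not in BASES:
        return source_type  # fixed-width types are unaffected
    if engine == "rust":
        return source_type  # Rust engine: always pass through unchanged
    if engine == "pyarrow":
        base = base_of(source_type)
        if mode == "normal":
            return base
        if mode == "large":
            return "large_" + base
        raise ValueError(f"pyarrow engine supports only 'normal'/'large', got {mode!r}")
    raise ValueError(f"unknown engine {engine!r}")
```

For example, `resolve_write_type("pyarrow", "large", "utf8_view")` gives `"large_utf8"`, whereas the Rust engine keeps `"utf8_view"` as-is.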

Can be merged after: https://github.com/delta-io/delta-rs/pull/2727

Related issues

ion-elgreco commented 1 month ago

@rtyler there are still a couple of test failures, so I'll take another pass at this tomorrow!

ion-elgreco commented 1 month ago

@aersam hey! Could you perhaps take a look at the refactored casting logic? :) I reintroduced some old code, since we now have to allow large/view types to pass through.

ion-elgreco commented 1 month ago

Looking at this, I feel our conversion code is getting a bit excessive, and we will eventually have to step back and look for a cleaner solution to all of this. I don't have a better idea ready, though :).

My main question is what our minimal Python/PyArrow support would now look like, since we are no longer testing it. Shouldn't there be some tests for this?

Yup, the same goes for the writer. Also, with this passthrough change we allow more flexibility between the utf8/binary/list flavours, but as a side effect we are now less flexible on writes. Take writing an int64 batch to a table with an int32 column, for example: before, this "might" have worked, but now it won't, because we can't merge those schemas.
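The trade-off can be shown with a toy merge rule that treats flavour differences (normal, `large_`, `_view`) as compatible but rejects any other mismatch. The sketch is hypothetical and is not delta-rs's schema-merge code. It assumes that on a flavour mismatch the batch's physical type is kept, matching the passthrough behaviour.

```python
# Hypothetical sketch, not delta-rs's merge code: two types "merge" if they are
# equal or differ only in utf8/binary/list flavour (normal, large_, _view).
FLAVOURED = {"utf8", "binary", "list"}


def flavour_base(t):
    """Strip the flavour: 'large_list' -> 'list', 'binary_view' -> 'binary'."""
    return t.removeprefix("large_").removesuffix("_view")


def merge_field_type(table_type, batch_type):
    if table_type == batch_type:
        return table_type
    base = flavour_base(batch_type)
    if base in FLAVOURED and base == flavour_base(table_type):
        return batch_type  # passthrough: keep the batch's physical flavour
    # Anything else, e.g. an int64 batch column into an int32 table column, fails.
    raise TypeError(f"cannot merge {batch_type} into {table_type}")
```

Under this rule `merge_field_type("utf8", "large_utf8")` succeeds, but `merge_field_type("int32", "int64")` raises, which is the loss of flexibility described above.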

Not sure what to do here. We could add this to the schema merge by simply checking whether one type is a supertype of the other, but that doesn't sit right with me.
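One possible reading of the "supertype" idea, sketched for signed integers only, is a widening check that finds a common type for the table and batch columns. All names here are hypothetical and the sketch does not reflect any existing delta-rs function.

```python
# Hypothetical sketch of a "supertype" check for the int64-batch/int32-table
# case; it only covers signed integers.
INT_WIDTHS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def is_supertype(wider, narrower):
    """True if every value of `narrower` is representable in `wider`."""
    if wider == narrower:
        return True
    if wider in INT_WIDTHS and narrower in INT_WIDTHS:
        return INT_WIDTHS[wider] >= INT_WIDTHS[narrower]
    return False


def common_supertype(a, b):
    if is_supertype(a, b):
        return a
    if is_supertype(b, a):
        return b
    raise TypeError(f"no common supertype for {a} and {b}")
```

For the example above, `common_supertype("int32", "int64")` returns `"int64"`. The table's int32 column would then have to be widened, or the batch cast down with a range check. The sketch does not decide which.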

ion-elgreco commented 1 month ago

Hmm, I'll put this back in draft and think about a redesign.