delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
1.98k stars 365 forks source link

Adopt the delta kernel types #2489

Closed rtyler closed 3 weeks ago

rtyler commented 1 month ago

What feels like long ago, @roeap imported some kernel types manually into the kernel module from the early work on delta-kernel-rs. Now delta_kernel has been released with a 0.0.1 we can move to adopting those symbols and begin incrementally adopting functionality from the kernel.

This issue is just about the types. I'm not expecting to pull any of the log replay or snapshot code in

ion-elgreco commented 1 month ago

I noticed the types were incorrect in kernel for timestamps. Would be good to backport fixes from delta-rs into kernel, otherwise we will have to fix some bugs again ^^

rtyler commented 1 month ago

@ion-elgreco we do indeed have some divergence :scream:

How confident are you in our tests for timestamp support? :smile:

nicklan commented 1 month ago

I noticed the types were incorrect in kernel for timestamps. Would be good to backport fixes from delta-rs into kernel, otherwise we will have to fix some bugs again ^^

We should definitely just fix those in kernel. @ion-elgreco , do you have links to the fixes by any chance? Or just the divergence so I can figure out what needs to change? I can probably figure it out but might be faster if you just point me at it.

Thanks!

ion-elgreco commented 1 month ago

@nicklan I fixed the decimal parsing to prevent decimals being parsed that are beyond 38 precision or scale, see here (https://github.com/delta-io/delta-rs/pull/2332/files):

https://github.com/delta-io/delta-rs/blob/81593e919497221a1a08bf8db9d20e8e4a39a8a6/crates/core/src/kernel/models/schema.rs#L548-L552

and here: https://github.com/delta-io/delta-rs/blob/81593e919497221a1a08bf8db9d20e8e4a39a8a6/crates/core/src/kernel/models/schema.rs#L628-L635

Here for timestampNtz: https://github.com/delta-io/delta-rs/blob/81593e919497221a1a08bf8db9d20e8e4a39a8a6/crates/core/src/kernel/expressions/eval.rs#L49-L51

and some parsing of multiple formats here: https://github.com/delta-io/delta-rs/pull/2383/files#diff-8e1a60f799cd1ad29923ee75871099a19c15ddfc00bef4617d82bdc02cbe3198

roeap commented 1 month ago

take

roeap commented 1 month ago

started work on this, we do require some updates on kernel first to be able to consume the types, but hoping that this will not be too invasive.