delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

feat(python): handle PyCapsule interface objects in write_deltalake #2534

Open kylebarron opened 1 month ago

kylebarron commented 1 month ago

Description

Adds support for the Arrow PyCapsule interface.

Since pyarrow is already a required dependency, this takes the minimal route of converting PyCapsule Interface objects into pyarrow objects. The stream conversion requires pyarrow 15 or higher (https://github.com/apache/arrow/issues/39217).

This doesn't modify the existing hard-coded support for pyarrow and pandas.
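As a rough illustration of the conversion described above (not the code in this PR), the dispatch could look something like the sketch below. The helper names `capsule_kind` and `to_pyarrow` are hypothetical. The sketch assumes that `pa.RecordBatchReader.from_stream` (pyarrow 15+) accepts any object exposing `__arrow_c_stream__`, and that `pa.record_batch` accepts objects exposing `__arrow_c_array__`.

```python
from typing import Any, Optional


def capsule_kind(data: Any) -> Optional[str]:
    """Classify an input by the PyCapsule Interface dunder method it exposes.

    Hypothetical helper: returns "stream", "array", or None.
    """
    # Prefer the stream protocol, since it can represent multiple batches.
    if hasattr(data, "__arrow_c_stream__"):
        return "stream"
    if hasattr(data, "__arrow_c_array__"):
        return "array"
    return None


def to_pyarrow(data: Any) -> Any:
    """Convert a PyCapsule Interface object into a pyarrow object.

    Hypothetical helper; inputs that expose neither method are returned
    unchanged so that other code paths can handle them.
    """
    import pyarrow as pa  # imported lazily; assumed to be pyarrow >= 15

    kind = capsule_kind(data)
    if kind == "stream":
        # Assumption: available from pyarrow 15 onwards.
        return pa.RecordBatchReader.from_stream(data)
    if kind == "array":
        return pa.record_batch(data)
    return data
```

In a real implementation the existing pyarrow and pandas checks would presumably run first, since recent pyarrow objects expose these dunder methods themselves.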

Related Issue(s)

Documentation

github-actions[bot] commented 1 month ago

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

ion-elgreco commented 3 weeks ago

@kylebarron can you fix the linting issues? Then we can merge it.

Also wondering how we should type-hint this now, since an input may or may not have the c_stream attribute.

kylebarron commented 3 weeks ago

> @kylebarron can you fix the linting issues? Then we can merge it

I'm pretty packed but I can try to find some time soon.

> Also wondering, how we should typehint this now, since an input can have the c_stream attribute or not

You can use these type hints: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#protocol-typehints
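For readers who do not follow the link, the protocol type hints could be sketched roughly as below. The two protocol class names follow the Arrow documentation. The `runtime_checkable` decorator, the `ArrowCapsuleInput` alias, and the `describe_input` function are illustrative additions, not part of this PR or of the Arrow docs.

```python
from typing import Optional, Protocol, Tuple, Union, runtime_checkable


@runtime_checkable
class ArrowArrayExportable(Protocol):
    """Objects that can export a single Arrow array or record batch."""

    def __arrow_c_array__(
        self, requested_schema: Optional[object] = None
    ) -> Tuple[object, object]:
        ...


@runtime_checkable
class ArrowStreamExportable(Protocol):
    """Objects that can export a stream of Arrow record batches."""

    def __arrow_c_stream__(
        self, requested_schema: Optional[object] = None
    ) -> object:
        ...


# Hypothetical alias covering inputs that may or may not expose a stream.
ArrowCapsuleInput = Union[ArrowArrayExportable, ArrowStreamExportable]


def describe_input(data: ArrowCapsuleInput) -> str:
    """Hypothetical function showing how the hint and a runtime check combine."""
    if isinstance(data, ArrowStreamExportable):
        return "stream"
    if isinstance(data, ArrowArrayExportable):
        return "array"
    raise TypeError("input does not implement the Arrow PyCapsule Interface")
```

A `Union` of the two protocols lets a type checker accept either kind of object, while `isinstance` only checks that the method exists at runtime, not its signature.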

ion-elgreco commented 3 weeks ago

@kylebarron ah nice, do you mind adding those type hints when you find the time?