apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

Rust <> Python integration point #538

Open kevinjqliu opened 1 month ago

kevinjqliu commented 1 month ago

After establishing #518, I want to start the conversation about creating the first integration between PyIceberg and iceberg-rust. As discussed on the dev list, we want to build that integration on a pluggable FileIO.

I'm wondering if there's also a way to create an integration for a pluggable catalog, based on the in-memory catalog implementation in #475.

I'm not familiar with the Rust ecosystem, so I would appreciate any pointers.
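A minimal Python sketch of what a pluggable FileIO boundary could look like, assuming a hypothetical, simplified interface (`write`, `read`, `delete`) rather than PyIceberg's actual FileIO base class. `FileIO`, `MemoryFileIO` and `copy_file` are illustrative names only. The point is that callers depend on the interface alone, so a Rust-backed object exposing the same methods could replace the pure-Python one without changing them.

```python
from typing import Dict, Protocol


class FileIO(Protocol):
    """Hypothetical, simplified FileIO contract (not PyIceberg's actual ABC)."""

    def write(self, location: str, data: bytes) -> None: ...

    def read(self, location: str) -> bytes: ...

    def delete(self, location: str) -> None: ...


class MemoryFileIO:
    """Pure-Python in-memory stand-in; a Rust-backed class with the same
    methods could be swapped in without touching the callers."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, location: str, data: bytes) -> None:
        self._files[location] = bytes(data)

    def read(self, location: str) -> bytes:
        try:
            return self._files[location]
        except KeyError:
            raise FileNotFoundError(location) from None

    def delete(self, location: str) -> None:
        # Deleting a missing file is a no-op in this sketch.
        self._files.pop(location, None)


def copy_file(io: FileIO, src: str, dst: str) -> None:
    # Only the protocol methods are used, so any conforming FileIO works here.
    io.write(dst, io.read(src))


io = MemoryFileIO()
io.write("memory://warehouse/a.txt", b"hello")
copy_file(io, "memory://warehouse/a.txt", "memory://warehouse/b.txt")
print(io.read("memory://warehouse/b.txt"))  # b'hello'
```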

Xuanwo commented 1 month ago

> I'm wondering if there's also a way to create an integration for a pluggable catalog, based on the in-memory catalog implementation in #475.

I believe this should also be possible. So, the pyiceberg community wants to have an in-memory catalog based on iceberg-rust. Does pyiceberg provide an interface that we can integrate with?

The in-memory catalog depends on FileIO, so we might need to build FileIO first. However, it also makes sense to expose a purely in-memory catalog (memory FileIO and memory catalog) to pyiceberg initially.
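To make the dependency concrete, here is a minimal Python sketch of a purely in-memory setup: a memory FileIO plus a memory catalog that stores only table-to-metadata-location pointers and writes the metadata file through the FileIO. `MemoryFileIO`, `MemoryCatalog` and the metadata layout are hypothetical and heavily simplified (the JSON is not real Iceberg table metadata); this is not the implementation in #475.

```python
import json
from typing import Any, Dict, List


class MemoryFileIO:
    """Hypothetical in-memory FileIO: a dict from location to bytes."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def write(self, location: str, data: bytes) -> None:
        self.files[location] = data

    def read(self, location: str) -> bytes:
        return self.files[location]


class MemoryCatalog:
    """Hypothetical in-memory catalog. It keeps only "namespace.table" ->
    metadata-location pointers; the metadata itself goes through the FileIO,
    which is why the catalog cannot work without one."""

    def __init__(self, io: MemoryFileIO, warehouse: str = "memory://warehouse") -> None:
        self.io = io
        self.warehouse = warehouse
        self._tables: Dict[str, str] = {}

    def create_table(self, namespace: str, name: str, schema: Dict[str, Any]) -> str:
        identifier = f"{namespace}.{name}"
        if identifier in self._tables:
            raise ValueError(f"table already exists: {identifier}")
        table_location = f"{self.warehouse}/{namespace}/{name}"
        metadata_location = f"{table_location}/metadata/v1.metadata.json"
        # Simplified stand-in for Iceberg table metadata.
        metadata = {"location": table_location, "schema": schema}
        self.io.write(metadata_location, json.dumps(metadata).encode("utf-8"))
        self._tables[identifier] = metadata_location
        return metadata_location

    def load_table(self, namespace: str, name: str) -> Dict[str, Any]:
        metadata_location = self._tables[f"{namespace}.{name}"]
        return json.loads(self.io.read(metadata_location))

    def list_tables(self, namespace: str) -> List[str]:
        prefix = f"{namespace}."
        return sorted(t[len(prefix):] for t in self._tables if t.startswith(prefix))


catalog = MemoryCatalog(MemoryFileIO())
catalog.create_table("db", "events", {"fields": [{"id": 1, "name": "ts", "type": "timestamp"}]})
print(catalog.list_tables("db"))  # ['events']
```

Keeping the catalog state to pointers and pushing all file access through the FileIO mirrors why building the FileIO first makes sense: once a Rust-backed FileIO exists, the same catalog logic can sit on top of it.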

liurenjie1024 commented 1 month ago

I think it's definitely possible, since PyIceberg's Catalog interface is extensible. I think you should start with PyO3 to understand how it works.

kevinjqliu commented 1 month ago

> Does pyiceberg provide an interface that we can integrate with?

Yes, there is a py-catalog-impl configuration property that loads a catalog implementation from a given class path (documentation, implementation, test).
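The mechanism behind a class-path property like py-catalog-impl can be sketched with `importlib`: split "package.module.ClassName", import the module, look up the class, and instantiate it with the catalog name and properties. This is a hypothetical re-implementation for illustration, not PyIceberg's own code; `load_catalog_impl`, the `rust_catalog_demo` module and `RustBackedCatalog` are made-up names, and the module is registered in `sys.modules` only so the example runs without any compiled extension installed.

```python
import importlib
import sys
import types
from typing import Any, Dict

PY_CATALOG_IMPL = "py-catalog-impl"


def load_catalog_impl(name: str, properties: Dict[str, str]) -> Any:
    """Instantiate the class named by the py-catalog-impl property (sketch)."""
    path = properties[PY_CATALOG_IMPL]
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {path!r}")
    module = importlib.import_module(module_name)  # reuses sys.modules if already loaded
    cls = getattr(module, class_name)
    # Properties are forwarded as keyword arguments; keys like "py-catalog-impl"
    # are allowed because the target accepts **properties.
    return cls(name, **properties)


# Hypothetical stand-in for a module that a Rust (PyO3) extension might provide.
demo = types.ModuleType("rust_catalog_demo")


class RustBackedCatalog:
    def __init__(self, name: str, **properties: str) -> None:
        self.name = name
        self.properties = properties


demo.RustBackedCatalog = RustBackedCatalog
sys.modules["rust_catalog_demo"] = demo

catalog = load_catalog_impl("local", {PY_CATALOG_IMPL: "rust_catalog_demo.RustBackedCatalog"})
print(type(catalog).__name__, catalog.name)  # RustBackedCatalog local
```

In PyIceberg itself, the same property would be supplied through its catalog configuration (for example to `pyiceberg.catalog.load_catalog`), so a class shipped by a Rust-backed package could be plugged in without changes to PyIceberg.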

> The in-memory catalog depends on FileIO, so we might need to build FileIO first. However, it also makes sense to expose a purely in-memory catalog (memory FileIO and memory catalog) to pyiceberg initially.

I'm bringing up this issue because I want the simplest way to integrate iceberg-python and iceberg-rust. If FileIO integration is a prerequisite, we can start there instead.

Xuanwo commented 4 weeks ago

Hi, @kevinjqliu, I'm sorry for blocking your innovation this way.

I've been a bit busy recently, but I plan to build something that actually works next week, for instance reading data from PyIceberg using pyiceberg-core. This will enable our community to build more cool things on top of it.

kevinjqliu commented 4 weeks ago

@Xuanwo Very cool! Looking forward to it.

kevinjqliu commented 1 week ago

Looks like @sungwy has already started by exposing Transforms in #556.
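When Rust transforms are exposed to Python, a small pure-Python reference is handy for cross-checking the bindings. The sketch below implements one transform from the Iceberg spec, truncate[W], for integers, strings and binary values. The `truncate` function is a hypothetical helper written for illustration; it is not the API added in #556.

```python
from typing import Union

Value = Union[int, str, bytes]


def truncate(width: int, value: Value) -> Value:
    """Reference truncate[W] transform following the Iceberg spec (sketch)."""
    if width <= 0:
        raise ValueError("width must be positive")
    if isinstance(value, bool):
        raise TypeError("booleans cannot be truncated")
    if isinstance(value, int):
        # Spec formula v - (((v % W) + W) % W). Python's % already floors for
        # W > 0, so negatives round toward negative infinity, e.g. -1 -> -10.
        return value - (((value % width) + width) % width)
    if isinstance(value, str):
        # Strings keep at most W Unicode code points; Python slices by code point.
        return value[:width]
    if isinstance(value, bytes):
        # Binary values keep the first W bytes.
        return value[:width]
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(truncate(10, 1), truncate(10, -1), truncate(3, "iceberg"))  # 0 -10 ice
```

A binding test could then compare the Rust-backed transform against this function over a range of inputs, including negative integers and multi-byte strings.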

I'll take a stab at exposing the Catalogs; see https://github.com/apache/iceberg-rust/pull/534#issuecomment-2330489500