apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Add Python wrappers or create interoperability to call the classes/methods using Python #253

Open sagarlakshmipathy opened 10 months ago

sagarlakshmipathy commented 10 months ago

Data engineers rely heavily on Python for creating data pipelines. It's important to support calling the Java objects from Python (through Py4J or similar tools/libraries). Right now, users can only use JVM languages out of the box if they want to use OneTable directly, i.e. as shown in the Docker demo.
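To make the request concrete, here is a rough sketch of the kind of thin Python wrapper this could start from. It does not use Py4J. Instead it shells out to a bundled jar, which is a different and cruder technique than calling Java objects in-process. Everything specific here is an assumption to verify against the project docs: the jar path is hypothetical, and the `--datasetConfig` flag and the YAML keys (`sourceFormat`, `targetFormats`, `datasets`, `tableBasePath`, `tableName`) are my understanding of the utilities CLI, not something confirmed in this thread.

```python
import subprocess
import tempfile
from pathlib import Path


def build_dataset_config(source_format, target_formats, table_base_path, table_name):
    """Render a dataset config as YAML text.

    The key names are an assumption about the CLI's config schema.
    """
    lines = [f"sourceFormat: {source_format}", "targetFormats:"]
    lines += [f"  - {fmt}" for fmt in target_formats]
    lines += [
        "datasets:",
        f"  - tableBasePath: {table_base_path}",
        f"    tableName: {table_name}",
    ]
    return "\n".join(lines) + "\n"


def build_command(jar_path, config_path, java_bin="java"):
    """Build the argv list; the --datasetConfig flag is an assumption."""
    return [java_bin, "-jar", str(jar_path), "--datasetConfig", str(config_path)]


def run_sync(jar_path, source_format, target_formats, table_base_path, table_name):
    """Write a temporary config file and run the jar against it.

    Raises subprocess.CalledProcessError if the JVM process exits non-zero.
    """
    config_text = build_dataset_config(
        source_format, target_formats, table_base_path, table_name
    )
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "dataset_config.yaml"
        config_path.write_text(config_text)
        return subprocess.run(
            build_command(jar_path, config_path),
            check=True,
            capture_output=True,
            text=True,
        )
```

A pipeline step would then call something like `run_sync("path/to/bundled.jar", "HUDI", ["DELTA", "ICEBERG"], "file:///tmp/people", "people")`. This pays JVM startup cost on every call and only exposes what the CLI exposes, which is part of why proper bindings would be nicer.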

ion-elgreco commented 9 months ago

Why not also implement it in a non-JVM language and then add Python bindings to that language?

the-other-tim-brown commented 9 months ago

@ion-elgreco any language you are looking for in particular?

Right now, the implementation relies on Java since that is the only common language for interacting with tables in the major table formats. We can also consider a deployment model where this code sits behind a lightweight server, and build an API that users can call. Generating clients in various languages is pretty straightforward, so I'm curious what people think of this idea.
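For discussion, a client for such a server could be as small as the sketch below. No such server or API exists in this thread, so the `/v1/sync` endpoint, the JSON field names, and the response shape are all hypothetical placeholders; only the `urllib` calls are real.

```python
import json
import urllib.request


def build_sync_request(base_url, source_format, target_formats, table_base_path, table_name):
    """Build a POST request for a hypothetical /v1/sync endpoint."""
    payload = {
        # Field names are hypothetical; they mirror the idea of one source
        # format being synced to one or more target formats.
        "sourceFormat": source_format,
        "targetFormats": list(target_formats),
        "tableBasePath": table_base_path,
        "tableName": table_name,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/sync",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_sync(base_url, source_format, target_formats, table_base_path, table_name):
    """Send the request and decode a JSON response body (shape is hypothetical)."""
    request = build_sync_request(
        base_url, source_format, target_formats, table_base_path, table_name
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With this model the Python side needs no JVM at all, at the cost of running and securing a separate service.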

ion-elgreco commented 9 months ago

Rust in particular, and then PyO3 bindings on top. We already do this at Delta-RS.

soumilshah1995 commented 7 months ago

I think what would be great is some PySpark examples that show how to invoke OneTable.