Closed schlichtanders closed 4 months ago
Not yet, but it's planned for version 1.0.0. https://hudi.apache.org/roadmap/
Currently, one can use Hudi with Python (pyspark), Java and Scala.
Thank you for the pointer to the roadmap. A C or Rust implementation would be nice for the entire LLVM ecosystem. I myself am looking forward to using Julia together with Hudi some day. (Julia also compiles via LLVM, so a C binding would be optimal.)
As 1.0.0 may still be far in the future, is the Java API also accessible outside of Apache Spark? I mean as a pure Java library that could be loaded from other languages?
Hi @schlichtanders, Hudi has a pure Java API for writing tables through HoodieJavaWriteClient. You can check the examples in HoodieJavaWriteClientExample.
I'll close this issue for now. Feel free to reopen the issue if you have more questions.
@yihua is there also a ReadClient? An example would also be great.
@yihua
Thank you @cheunhong. I agree, and it is a pity. Hudi's support for streaming is super attractive to me. Neither delta-rs nor Iceberg has it, as far as I know...
@schlichtanders @cheunhong I missed this discussion. We are considering different language support. If you have a use case I’d love to chat with you about that and see how the use case can be better supported.
We have an experimental PR on read support in Python: #8768. We have also introduced a Hudi file group reader to make read integration in engines easier.
For me Python is actually not the problem - via Spark and Flink it is pretty well supported.
My use case is to use the modern programming language Julia directly, without the JVM in between, because the language itself is highly performant and has distributed computing support. It is a perfect match for working with Hudi, both for big data and for streaming. Hence it would be great if Hudi were also accessible without Spark and Flink, i.e. without the JVM.
I was looking into a Rust implementation because of the work happening on pg_analytics by ParadeDB, where they had to choose delta-rs purely because they depend on Rust tooling to create the Postgres extension. The use case here is that, in theory, if you integrate Hudi (or, as they are doing, Delta Lake) as a Postgres extension, you can offload data directly onto your data lake, which makes the transition to a lakehouse architecture much easier and avoids external ETL tooling.
A lot of the OSS work being done by Materialize.com, Neon.tech, and DataBend is happening in Rust, so if Hudi could integrate with modern development happening in Rust, I imagine it could be a big win for the ecosystem.
@xushiyan do you want to share the budding hudi-rs and Python bindings here, to see if anyone wants to chip in with contributions?
https://github.com/xushiyan/hudi-rs has some basic reads with datafusion?
@vinothchandar yes. gonna take care of repo logistics and dev setup to make the repo ready for new contributors. Also preparing issues to work on.
@rubenatterbury @schlichtanders @cheunhong we have officially released hudi-rs 0.1.0! https://github.com/apache/hudi-rs
Hi,
I am curious about the state of Hudi. We are currently using it via Spark, but we are thinking about switching to another language.
Is it possible to write Hudi files via C, C++, Rust, or anything? Or is it completely tied to Spark/Flink?
Thank you very much for your help