apache / hudi-rs

A native Rust library for Apache Hudi, with bindings into Python
https://hudi.apache.org/
Apache License 2.0

Create `HudiTableFactory` implementing DataFusion's `TableProviderFactory` #150

Closed matthewmturner closed 5 days ago

matthewmturner commented 3 weeks ago

Is there an existing issue for this?

Description of the bug

This isn't a bug report, it's a feature request, but I didn't see a way to submit one.

I would like to be able to register hudi tables with datafusion like so:

CREATE EXTERNAL TABLE my_table STORED AS HUDITABLE LOCATION '/path/to/table';

If hudi-rs provided a `TableProviderFactory`, we could register and use that. (This is how I currently register deltalake tables, and I would like to do something similar for Hudi.)
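As a rough mental model of what registering a table factory buys you: the query engine keeps a map from the `STORED AS` keyword to a factory, and `CREATE EXTERNAL TABLE` looks that keyword up and asks the factory to build the table. The sketch below is a std-only toy of that dispatch, not DataFusion's or hudi-rs's actual API. `ToyTable`, `ToyFactory`, `ToyHudiFactory`, and `Registry` are hypothetical names made up for illustration, and the parser only handles the exact statement shape shown in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table provider: it only remembers
/// which format it belongs to and where the table lives.
#[derive(Debug, PartialEq)]
struct ToyTable {
    format: String,
    location: String,
}

/// Hypothetical stand-in for a table provider factory.
trait ToyFactory {
    fn create(&self, location: &str) -> ToyTable;
}

/// Hypothetical factory playing the role a Hudi factory would play.
struct ToyHudiFactory;

impl ToyFactory for ToyHudiFactory {
    fn create(&self, location: &str) -> ToyTable {
        ToyTable {
            format: "hudi".to_string(),
            location: location.to_string(),
        }
    }
}

/// Maps the `STORED AS` keyword to the factory registered under it.
struct Registry {
    factories: HashMap<String, Box<dyn ToyFactory>>,
}

impl Registry {
    fn new() -> Self {
        Registry { factories: HashMap::new() }
    }

    /// The keyword is whatever name the caller picks at registration time.
    fn register(&mut self, keyword: &str, factory: Box<dyn ToyFactory>) {
        self.factories.insert(keyword.to_uppercase(), factory);
    }

    /// Handles only this statement shape (assumption: no spaces in the path):
    /// CREATE EXTERNAL TABLE <name> STORED AS <keyword> LOCATION '<path>';
    fn create_external_table(&self, sql: &str) -> Result<(String, ToyTable), String> {
        let tokens: Vec<&str> = sql
            .trim()
            .trim_end_matches(';')
            .split_whitespace()
            .collect();
        match tokens.as_slice() {
            ["CREATE", "EXTERNAL", "TABLE", name, "STORED", "AS", keyword, "LOCATION", path] => {
                // Look up the factory by keyword; unknown keywords are an error.
                let factory = self
                    .factories
                    .get(&keyword.to_uppercase())
                    .ok_or_else(|| format!("no factory registered for {keyword}"))?;
                Ok((name.to_string(), factory.create(path.trim_matches('\''))))
            }
            _ => Err("unsupported statement".to_string()),
        }
    }
}
```

In this toy, a statement only resolves if its keyword matches the name the factory was registered under, which is why the choice of name (`HUDITABLE` in this request, `HUDI` in what was eventually merged) is purely a registration detail.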

Steps To Reproduce

Not a bug

Expected behavior

I can register tables to datafusion like so:

CREATE EXTERNAL TABLE my_table STORED AS HUDITABLE LOCATION '/path/to/table';

Screenshots / Logs

No response

Software information

N/A

Additional context

No response

xushiyan commented 1 week ago

@matthewmturner thanks for raising this! would you be able to review the linked PR by @kazdy please?

matthewmturner commented 1 week ago

@xushiyan sure, checking it out

matthewmturner commented 1 week ago

Looks good, but when I try integrating it in my app I get the error below (which isn't surprising, since we aren't on the same version of DataFusion). Of course that's not a blocker for this though :). Excited to see this!

image

xushiyan commented 5 days ago

@matthewmturner we just landed this feature in main. Can you please test it out with your PR? Note that we chose HUDI as the name to stay consistent. See https://github.com/apache/hudi-rs/pull/162#discussion_r1799914280. Thanks.

matthewmturner commented 5 days ago

@xushiyan sure, will do. Do you have any recommendations for what test data to use? I just need something simple to test a basic query.

xushiyan commented 5 days ago

@matthewmturner yes, we keep a bunch of zipped test tables in https://github.com/apache/hudi-rs/tree/main/crates/tests/data/tables

You can unzip and use them directly.

matthewmturner commented 5 days ago

Okay, going to work on this tonight / tomorrow. I may ping you for a review if you don't mind.

matthewmturner commented 5 days ago

I must be doing something wrong. I'm getting an error that hoodie.properties doesn't exist even though it does.

image

Will pick this up tomorrow.

kazdy commented 5 days ago

Hi Matt,

I cloned your branch and got this:

image

Could it be something related to your file permissions? Edit: the permissions look OK and are the same as mine.

matthewmturner commented 5 days ago

Sigh, I got it to work. The repo used to be called datafusion-tui, and I still have it under that name locally, pointing to datafusion-dft, so my path was wrong.

matthewmturner commented 5 days ago

And now the test passes :)