apache / hudi-rs

A native Rust library for Apache Hudi, with bindings into Python
https://hudi.apache.org/
Apache License 2.0

feat: support partition prune api #119

Closed KnightChess closed 2 weeks ago

KnightChess commented 1 month ago

Description

Add filtering capabilities to the table API. Currently, filters can only be applied to partition fields. Multiple predicates are ANDed together.

hudi_table.read_snapshot(&["foo != a", "bar >= 100"]);

Supported operators are: >, >=, <, <=, =, !=.
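
To make the semantics above concrete, here is a minimal, standard-library-only sketch of how string predicates could be parsed and ANDed together to prune partitions. It is not the code in this PR: `parse_predicate`, `matches`, `parse_partition`, and `prune` are hypothetical helpers, the partition paths are assumed to be hive-style (`foo=b/bar=100`), and values are compared as integers when both sides parse as `i64` and as plain strings otherwise, whereas the PR itself moves toward typed Arrow scalars.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::HashMap;

/// The six comparison operators listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Gt, Gte, Lt, Lte, Eq, Ne }

/// One parsed predicate such as `bar >= 100`.
#[derive(Debug, Clone, PartialEq)]
struct Predicate { field: String, op: Op, value: String }

/// Parse "field op value". Hypothetical helper, not the hudi-rs parser.
fn parse_predicate(s: &str) -> Option<Predicate> {
    const OPS: [(&str, Op); 6] = [
        (">=", Op::Gte), ("<=", Op::Lte), ("!=", Op::Ne),
        (">", Op::Gt), ("<", Op::Lt), ("=", Op::Eq),
    ];
    // Pick the leftmost operator; on a tie prefer the longer token,
    // so ">=" is never read as ">".
    let (pos, token, op) = OPS
        .iter()
        .filter_map(|(t, o)| s.find(*t).map(|p| (p, *t, *o)))
        .min_by_key(|(p, t, _)| (*p, Reverse(t.len())))?;
    let field = s[..pos].trim();
    let value = s[pos + token.len()..].trim();
    if field.is_empty() || value.is_empty() {
        return None;
    }
    Some(Predicate { field: field.to_string(), op, value: value.to_string() })
}

/// Evaluate one predicate against a partition value.
/// Assumption: integer comparison if both sides parse, else string comparison.
fn matches(pred: &Predicate, actual: &str) -> bool {
    let ord = match (actual.parse::<i64>(), pred.value.parse::<i64>()) {
        (Ok(a), Ok(b)) => a.cmp(&b),
        _ => actual.cmp(pred.value.as_str()),
    };
    match pred.op {
        Op::Gt => ord == Ordering::Greater,
        Op::Gte => ord != Ordering::Less,
        Op::Lt => ord == Ordering::Less,
        Op::Lte => ord != Ordering::Greater,
        Op::Eq => ord == Ordering::Equal,
        Op::Ne => ord != Ordering::Equal,
    }
}

/// Split a hive-style partition path into field/value pairs.
fn parse_partition(path: &str) -> HashMap<&str, &str> {
    path.split('/').filter_map(|seg| seg.split_once('=')).collect()
}

/// Keep only the partitions that satisfy every filter (logical AND).
/// Returns `None` if any filter string cannot be parsed.
/// Assumption: a partition that lacks a filtered field is pruned.
fn prune<'a>(paths: &[&'a str], filters: &[&str]) -> Option<Vec<&'a str>> {
    let preds: Vec<Predicate> = filters
        .iter()
        .map(|f| parse_predicate(f))
        .collect::<Option<_>>()?;
    Some(
        paths
            .iter()
            .copied()
            .filter(|path| {
                let values = parse_partition(path);
                preds.iter().all(|p| {
                    values.get(p.field.as_str()).map_or(false, |v| matches(p, v))
                })
            })
            .collect(),
    )
}
```

With this sketch, calling `prune` on the partitions `foo=a/bar=200`, `foo=b/bar=50`, `foo=b/bar=100`, and `foo=c/bar=150` with the filters `"foo != a"` and `"bar >= 100"` keeps only the last two. Comparing `50` and `100` as integers rather than strings is what excludes `foo=b/bar=50`, which is one reason typed scalar values matter for this feature.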

For #47

How are the changes test-covered

KnightChess commented 1 month ago

@xushiyan cc

KnightChess commented 1 month ago

@xushiyan hello, is there a problem with how I am running the tests locally?

KnightChess commented 1 month ago

> hello, is there a problem with how I am running the tests locally?

Ignore the checkstyle issue. After fetching the latest commit it works again, but it still does not report the Python test error.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 90.07092% with 14 lines in your changes missing coverage. Please review.

Project coverage is 89.32%. Comparing base (e23e6ed) to head (f1ce54d). Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/core/src/table/partition.rs | 88.76% | 10 Missing :warning: |
| crates/core/src/table/mod.rs | 88.00% | 3 Missing :warning: |
| crates/core/src/table/fs_view.rs | 96.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
+ Coverage   87.82%   89.32%   +1.50%
==========================================
  Files          14       15       +1
  Lines         731      834     +103
==========================================
+ Hits          642      745     +103
  Misses         89       89
```


KnightChess commented 1 month ago

@xushiyan cc:

xushiyan commented 1 month ago

@KnightChess awesome contribution! let me take a look. i might push some quick fixes just FYI to move faster.

xushiyan commented 1 month ago

@KnightChess do you think you can address the main comment in the next few days? Then I can polish further if needed and land this. Trying to get this into the upcoming release within 2 weeks 🙂 (cutting the RC branch within a week)

KnightChess commented 1 month ago

@xushiyan sorry for the late reply, I will address this within the next two days

KnightChess commented 1 month ago

@xushiyan Hello, I couldn't find an implementation similar to ScalarValue, and I am not very familiar with Arrow yet. There is a certain learning curve involved, which might delay the progress of this PR. Could you please help improve this PR?

KnightChess commented 1 month ago

@xushiyan cc, I tried using Arrow's Scalar<ArrayRef> in place of DataFusion's ScalarValue, and made fixes based on some of the review suggestions.

xushiyan commented 1 month ago

> @xushiyan cc, I tried using Arrow's Scalar<ArrayRef> in place of DataFusion's ScalarValue, and made fixes based on some of the review suggestions.

@KnightChess Thanks. I was traveling. Will take a look later today.

xushiyan commented 2 weeks ago

I was wrapping up my vacation 😄 just now getting back to update this:

Changes I've made:

There is more follow-up work to do on the DataFusion integration side, which I'll jot down in the GH issue.

KnightChess commented 1 week ago

@xushiyan thanks for the review