apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
149 stars 34 forks source link

feat: Add support for run-end encoded array #507

Closed cocoa-xu closed 3 weeks ago

cocoa-xu commented 4 weeks ago

Hi this PR tries to add support for run-end encoded array based on the arrow spec here, https://arrow.apache.org/docs/format/Columnar.html#run-end-encoded-layout.

cocoa-xu commented 4 weeks ago

Hi @felipecrv, many thanks for the code review and these examples! I've adapted some code from the code snippets in the review, and it should handle the validation properly now. :)

codecov-commenter commented 4 weeks ago

Codecov Report

Attention: Patch coverage is 69.56522% with 28 lines in your changes missing coverage. Please review.

Project coverage is 88.55%. Comparing base (8894ebf) to head (0ce34e5). Report is 10 commits behind head on main.

Files Patch % Lines
src/nanoarrow/array.c 55.55% 28 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #507 +/- ## ========================================== - Coverage 89.21% 88.55% -0.66% ========================================== Files 90 90 Lines 16294 16541 +247 ========================================== + Hits 14536 14648 +112 - Misses 1758 1893 +135 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

cocoa-xu commented 4 weeks ago

Just added more tests to cover more lines and the memory issue in the schema test should be fixed.

felipecrv commented 3 weeks ago

Just a few nits, but this looks good from my end!

@felipecrv are there any more REE-specific details relevant to this PR (or follow-ups that we need to open issues for and tackle before the next release?)

Not that I'm aware right now. I recommend that you and @cocoa-xu take a look at the C++ code around REEs to see all the traps. I don't remember everything that I did, but I left many comments in the C++ implementation.

https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/ree_util.h and https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/ree_util.cc are good sources to look at.

WillAyd commented 3 weeks ago

I think this PR introduced a regression that valgrind is complaining about. You can see it started back in the weekly Meson build:

https://github.com/apache/arrow-nanoarrow/actions/runs/9432588616

A more detailed error description can be found here:

https://github.com/apache/arrow-nanoarrow/actions/runs/9457996609/job/26052769002?pr=483#step:7:319

Apparently Valgrind does not like something that ultimately hits this codeblock from the RunEndEncoded tests:

https://github.com/apache/arrow-nanoarrow/blob/f443e46163ded43ddb5e024c1a2f1fddd3b43c3f/src/nanoarrow/array.c#L1247