apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0
149 stars 34 forks source link

fix: check `run_ends_view->length` before accessing its values #518

Closed cocoa-xu closed 3 weeks ago

cocoa-xu commented 3 weeks ago

This PR should fix the issue where run_ends_view->length is not checked if equals to 0 before attempting to access run_ends_view's values. Many thanks to @WillAyd.

WillAyd commented 3 weeks ago

Wow that was fast - great work @cocoa-xu !

Unfortunately Valgrind still doesn't like something with how this is set up:

https://github.com/apache/arrow-nanoarrow/actions/runs/9468487196/job/26084953378?pr=483#step:7:308

I haven't been able to reproduce locally or boil it down to an MRE. Just sharing in case you have any other ideas what might have gone awry.

Thanks again for the contributions

paleolimbot commented 3 weeks ago

I'll take a look and perhaps update the valgrind flags in our memcheck (which I ran against this PR but it didn't catch the uninitialized values!)

WillAyd commented 3 weeks ago

I think I was able to find some more information when running meson with -Db_sanitize=address,undefined:

../src/nanoarrow/array.c:753:11: runtime error: signed integer overflow: 9223372036854775807 + 7 cannot be represented in type 'long int'
../src/nanoarrow/array.c:892:9: runtime error: signed integer overflow: 9223372036854775807 + 7 cannot be represented in type 'long int'

It looks like the offsets value for the RUN_END_ENCODED parent array_view is a junk value, but ArrowArrayViewValidateMinimal does not expect this

https://github.com/apache/arrow-nanoarrow/blob/e92c364f65f9d6fb029d918b9ff31a8a39b3a1df/src/nanoarrow/array.c#L752

So that is definitely problematic. Haven't quite pieced together the link between that and what valgrind is giving us, but maybe there is one

paleolimbot commented 3 weeks ago

I didn't get to actually fixing this today, but I did get far enough to figure out that commenting out the following lines eliminates the problem:

https://github.com/apache/arrow-nanoarrow/blob/e92c364f65f9d6fb029d918b9ff31a8a39b3a1df/src/nanoarrow/array_test.cc#L1505-L1513

I am guessing that the error will happen whenever the offset is> LONG_MAX but I'm not sure.

I reproduced using:

# docker run --rm -it -v $(pwd):/nanoarrow ghcr.io/apache/arrow-nanoarrow:ubuntu
ci/scripts/build-with-meson.sh

...but had to update the build-with-meson to handle an empty PKG_CONFIG_PATH:

    if [ -z "${PKG_CONFIG_PATH}"]; then
      meson setup "${SANDBOX_DIR}"
    else
      meson setup "${SANDBOX_DIR}" --pkg-config-path $PKG_CONFIG_PATH
    fi
paleolimbot commented 3 weeks ago

I think it is that overflow occurs in two places here when offset + length > INT64_MAX:

https://github.com/apache/arrow-nanoarrow/blob/2fd50f7c3a87eddf951a3904e442475665bed414/src/nanoarrow/array.c#L1265-L1269

(this can also be moved to ArrowArrayViewValidateDefault() since it doesn't require looping over the entire run ends/values buffer)