Closed by cocoa-xu 3 weeks ago
Wow that was fast - great work @cocoa-xu !
Unfortunately Valgrind still doesn't like something with how this is set up:
https://github.com/apache/arrow-nanoarrow/actions/runs/9468487196/job/26084953378?pr=483#step:7:308
I haven't been able to reproduce locally or boil it down to an MRE. Just sharing in case you have any other ideas what might have gone awry.
Thanks again for the contributions
I'll take a look and perhaps update the valgrind flags in our memcheck (which I ran against this PR but it didn't catch the uninitialized values!)
I think I was able to find some more information when running meson with -Db_sanitize=address,undefined:
../src/nanoarrow/array.c:753:11: runtime error: signed integer overflow: 9223372036854775807 + 7 cannot be represented in type 'long int'
../src/nanoarrow/array.c:892:9: runtime error: signed integer overflow: 9223372036854775807 + 7 cannot be represented in type 'long int'
It looks like the offsets value for the RUN_END_ENCODED parent array_view is a junk value, but ArrowArrayViewValidateMinimal does not expect this
So that is definitely problematic. Haven't quite pieced together the link between that and what valgrind is giving us, but maybe there is one
I didn't get to actually fixing this today, but I did get far enough to figure out that commenting out the following lines eliminates the problem:
I am guessing that the error will happen whenever the offset is > LONG_MAX, but I'm not sure.
I reproduced using:
# docker run --rm -it -v $(pwd):/nanoarrow ghcr.io/apache/arrow-nanoarrow:ubuntu
ci/scripts/build-with-meson.sh
...but had to update the build-with-meson to handle an empty PKG_CONFIG_PATH:
if [ -z "${PKG_CONFIG_PATH}" ]; then
  meson setup "${SANDBOX_DIR}"
else
  meson setup "${SANDBOX_DIR}" --pkg-config-path "${PKG_CONFIG_PATH}"
fi
I think the overflow occurs in two places here, when offset + length > INT64_MAX:
(this can also be moved to ArrowArrayViewValidateDefault() since it doesn't require looping over the entire run ends/values buffer)
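For reference, here is a minimal sketch of how an offset + length > INT64_MAX condition can be tested without performing the overflowing addition itself. The helper name `checked_offset_plus_length` is hypothetical and not part of nanoarrow, and this is not necessarily how the fix is written in array.c. The idea is to compare `length` against `INT64_MAX - offset`, which cannot overflow when both values are non-negative.

```c
#include <stdint.h>

// Hypothetical helper (not a nanoarrow function): stores offset + length in
// *out and returns 1 if the sum is representable as int64_t, else returns 0.
// Negative inputs are rejected so that INT64_MAX - offset cannot overflow.
static int checked_offset_plus_length(int64_t offset, int64_t length,
                                      int64_t* out) {
  if (offset < 0 || length < 0) {
    return 0;
  }

  // Equivalent to "offset + length > INT64_MAX", but evaluated without
  // signed overflow, which is undefined behavior in C.
  if (length > INT64_MAX - offset) {
    return 0;
  }

  *out = offset + length;
  return 1;
}
```

A validation routine could call a check like this before any arithmetic on the sum and return an error code when it fails, so a junk offset is reported instead of triggering the sanitizer.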
This PR should fix the issue where run_ends_view->length is not checked for 0 before attempting to access run_ends_view's values. Many thanks to @WillAyd.
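To illustrate the kind of guard described here, a rough sketch follows. `struct RunEndsView` and `last_run_end` are hypothetical stand-ins, not nanoarrow's actual types or functions, and the run ends are assumed to be int32 for simplicity. The point is only that the length is tested before the values buffer is touched.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a run-ends child view (not nanoarrow's struct).
struct RunEndsView {
  const int32_t* values;  // may be NULL or uninitialized when length == 0
  int64_t length;         // number of run ends
};

// Returns the last run end, or 0 when there are no runs. The length check
// comes first so that values[length - 1] is never read for an empty view.
static int64_t last_run_end(const struct RunEndsView* run_ends_view) {
  if (run_ends_view->length == 0) {
    return 0;
  }
  return run_ends_view->values[run_ends_view->length - 1];
}
```

Without the early return, an empty view would index `values[-1]`, which reads memory outside the buffer.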