Closed al13n321 closed 1 month ago
Thanks for the fix! Could you please use the original PR template?
cc @ffacs
Could you please use the original PR template?
Edited.
Thank you for making the PR @al13n321. It seems there are no tests to cover the case. Could you please add a little unit test?
Added a minimal test. It fails without the fix as expected:
/home/ec2-user/ClickHouse/contrib/orc/c++/test/TestWriter.cc:458: Failure
Expected equality of these values:
int_stats->getMinimum()
Which is: 0
-128
/home/ec2-user/ClickHouse/contrib/orc/c++/test/TestWriter.cc:459: Failure
Expected equality of these values:
int_stats->getMaximum()
Which is: 255
127
/home/ec2-user/ClickHouse/contrib/orc/c++/test/TestWriter.cc:461: Failure
Expected equality of these values:
int_stats->getSum()
Which is: 8355585
sum
Which is: -32767
Would be good to properly test statistics, with all data types, row index statistics, random values, random stripe and row index sizes, etc, but that's outside the scope of this PR.
What changes were proposed in this pull request?
Calculate min/max statistics for columns of type BYTE using
signed char
instead ofchar
type (C++).Why are the changes needed?
char
can be signed or unsigned, depending on target platform and compiler flags, but the statistics in the ORC file should always treat numbers as signed. Specifically, it was behaving incorrectly on ARM becausechar
is unsigned there.How was this patch tested?
Tested as part of ClickHouse here: https://github.com/ClickHouse/ClickHouse/pull/64563 . ~Let me know if I should add a unit test or something (though presumably there are already unit tests for this, they just don't run on ARM in CI?)~ Added a small test in TestWriter.cc reproducing the problem.
Was this patch authored or co-authored using generative AI tooling?
No