guiyanakuang closed this 10 months ago
@neopaf Can you test whether this PR works for your data?
cc @dongjoon-hyun @wgtmac
Do you still have concerns, @wgtmac?
Let me merge this so it can be considered part of Apache ORC 1.9.x and 2.0.0. We can revert it if there are any issues during the release cycles~
What changes were proposed in this pull request?
This PR fixes an issue where column statistics were evaluated incorrectly when no values had been written, which prevented row groups from being skipped.
Why are the changes needed?
The fix corrects how statistics are evaluated, so row groups that do not need to be read can be skipped, which improves performance.
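To illustrate the kind of case described above, here is a minimal, self-contained sketch. It is not the code from this patch and does not use ORC's reader API. `RowGroupSkipSketch`, `LongStats`, and `mightContainEqual` are hypothetical names, and the assumption is that statistics with zero written values carry no usable min/max, so an equality predicate cannot match.

```java
public final class RowGroupSkipSketch {

    /** Hypothetical, simplified stand-in for one row group's column statistics. */
    public static final class LongStats {
        final long numberOfValues; // count of non-null values written
        final boolean hasNull;     // true if at least one null was written
        final long min;            // meaningful only when numberOfValues > 0
        final long max;            // meaningful only when numberOfValues > 0

        public LongStats(long numberOfValues, boolean hasNull, long min, long max) {
            this.numberOfValues = numberOfValues;
            this.hasNull = hasNull;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns true if the row group may contain a row where column == literal. */
    public static boolean mightContainEqual(LongStats stats, long literal) {
        if (stats.numberOfValues == 0) {
            // Assumption: with no non-null values written, min/max are placeholders.
            // Comparing against them could wrongly keep the row group, so skip it.
            return false;
        }
        return stats.min <= literal && literal <= stats.max;
    }

    public static void main(String[] args) {
        LongStats allNull = new LongStats(0, true, 0, 0);    // only nulls written
        LongStats range = new LongStats(100, false, 10, 20); // values in [10, 20]

        System.out.println(mightContainEqual(allNull, 0)); // false: can be skipped
        System.out.println(mightContainEqual(range, 15));  // true: must be read
        System.out.println(mightContainEqual(range, 25));  // false: can be skipped
    }
}
```

Without the `numberOfValues == 0` check, the first call would compare the literal `0` against the placeholder min/max of `0` and keep a row group that holds no matching rows.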
How was this patch tested?
Unit tests have been added to validate the changes.