apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

In the C++/Java SDK, SearchArgument doesn't appear to use the footer and stripe stats. #1798

Closed Smith-Cruise closed 6 months ago

Smith-Cruise commented 6 months ago

I've checked the code, and it looks like SearchArgument only uses the row group index. Am I missing something?

Why do we only use the row group index?

wgtmac commented 6 months ago

The C++ reader already leverages the stripe stats. Please see https://github.com/apache/orc/blob/main/c++/src/Reader.cc#L1062-L1066 for reference.
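
For readers following along, here is a toy sketch of how min/max statistics at different levels can prune data for a predicate such as `value < bound`. It is not ORC's actual code: the `MinMax` and `Stripe` classes and the `select_row_groups` function are hypothetical names made up for illustration, and the file-level check is an assumption about how such a hierarchy could work, not a statement about what either reader does.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MinMax:
    """Hypothetical min/max statistics for one integer column."""
    minimum: int
    maximum: int

    def may_contain_less_than(self, bound: int) -> bool:
        # Some row with value < bound can exist only if the minimum is below bound.
        return self.minimum < bound


@dataclass
class Stripe:
    """Hypothetical stripe: its own stats plus stats for each row group."""
    stats: MinMax
    row_groups: List[MinMax]


def select_row_groups(file_stats: MinMax, stripes: List[Stripe],
                      bound: int) -> List[Tuple[int, int]]:
    """Return (stripe index, row group index) pairs that may hold value < bound."""
    # Coarsest check first: skip the whole file if nothing can match.
    if not file_stats.may_contain_less_than(bound):
        return []
    selected = []
    for s_idx, stripe in enumerate(stripes):
        # Stripe-level check: skip the stripe without looking at its row groups.
        if not stripe.stats.may_contain_less_than(bound):
            continue
        for g_idx, row_group in enumerate(stripe.row_groups):
            # Finest check: keep only row groups that may match.
            if row_group.may_contain_less_than(bound):
                selected.append((s_idx, g_idx))
    return selected


file_stats = MinMax(0, 199)
stripes = [
    Stripe(MinMax(0, 99), [MinMax(0, 49), MinMax(50, 99)]),
    Stripe(MinMax(100, 199), [MinMax(100, 149), MinMax(150, 199)]),
]
print(select_row_groups(file_stats, stripes, 60))
```

In this toy data the second stripe is skipped by its stripe-level stats alone, so its row group entries are never examined. That is the benefit of checking coarser statistics before the row group index.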