apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

ORC-1645: [C++] Evaulate stripe stats before load stripe footer #1835

Closed Smith-Cruise closed 4 months ago

Smith-Cruise commented 4 months ago

What changes were proposed in this pull request?

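Stripe statistics are stored in the metadata section of the ORC file tail, so they can be evaluated before the stripe footer is loaded.

When the statistics rule a stripe out, its footer never has to be read, which saves one IO request for that stripe.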
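
A minimal sketch of the reordering, not the actual patch: it assumes per-stripe min/max statistics are already in memory from the file tail and counts how many stripe footers each ordering fetches. All names here (`StripeStats`, `RangePredicate`, `FakeFile`, `scanFooterFirst`, `scanStatsFirst`) are hypothetical and are not part of the ORC C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical min/max statistics for one column in one stripe, standing in
// for what the file tail's metadata section provides.
struct StripeStats {
  int64_t min;
  int64_t max;
};

// Hypothetical predicate: keep rows where lower <= value <= upper.
struct RangePredicate {
  int64_t lower;
  int64_t upper;

  // True when the stripe's [min, max] range may contain matching rows.
  bool mayMatch(const StripeStats& stats) const {
    assert(stats.min <= stats.max);
    return stats.max >= lower && stats.min <= upper;
  }
};

// Stand-in for the file: only counts how many stripe footers were fetched.
struct FakeFile {
  std::size_t footerReads = 0;
  void readStripeFooter(std::size_t /*stripeIndex*/) { ++footerReads; }
};

// Footer first: every stripe costs one footer read, even if the statistics
// then show that the stripe can be skipped.
std::size_t scanFooterFirst(FakeFile& file,
                            const std::vector<StripeStats>& stats,
                            const RangePredicate& pred) {
  std::size_t selected = 0;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    file.readStripeFooter(i);
    if (pred.mayMatch(stats[i])) ++selected;
  }
  return selected;
}

// Stats first: the footer is read only for stripes the statistics cannot
// rule out.
std::size_t scanStatsFirst(FakeFile& file,
                           const std::vector<StripeStats>& stats,
                           const RangePredicate& pred) {
  std::size_t selected = 0;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (!pred.mayMatch(stats[i])) continue;  // skipped without any footer IO
    file.readStripeFooter(i);
    ++selected;
  }
  return selected;
}
```

With three stripes covering 0 to 99, 100 to 199, and 200 to 299 and a predicate of 120 to 150, both functions select the same single stripe, but the footer-first version issues three footer reads while the stats-first version issues one.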

https://issues.apache.org/jira/browse/ORC-1645

Why are the changes needed?

To reduce IOPS.

How was this patch tested?

The existing unit tests pass.

Was this patch authored or co-authored using generative AI tooling?

No.

wgtmac commented 4 months ago

Thanks for the fix! Could you create a JIRA issue for this?

Smith-Cruise commented 4 months ago

> Thanks for the fix! Could you create a JIRA issue for this?

done

ffacs commented 4 months ago

+1 LGTM

dongjoon-hyun commented 4 months ago

Thank you all.

dongjoon-hyun commented 4 months ago

BTW, @Smith-Cruise, I officially added you to the Apache ORC contributor group (in ASF JIRA) and assigned ORC-1645 to you. Welcome to the Apache ORC community again.

$ git log --oneline --author=chendingchao1
6c24acdbf (HEAD -> main, apache/main, apache/HEAD) ORC-1645: [C++] Evaulate stripe stats before load stripe footer
bbb1f074d MINOR: Fix a typo in `ColumnReader.cc`