apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

What's the meaning of EvaluatedRowGroupCount in ReaderMetrics #1829

Closed Smith-Cruise closed 4 months ago

Smith-Cruise commented 4 months ago

In SargsApplier::pickRowGroups(), EvaluatedRowGroupCount looks like the number of row groups that have been evaluated,

but in SargsApplier::evaluateFileStatistics() and SargsApplier::evaluateStripeStatistics(), EvaluatedRowGroupCount looks like the number of row groups that have been filtered out by the Sargs.

I'm a little confused. Does anyone know which meaning is intended?

wgtmac commented 4 months ago

Could you help answer this? @coderex2522

ffacs commented 4 months ago

It seems there is a bug in the code: SargsApplier::evaluateFileStatistics() and SargsApplier::evaluateStripeStatistics() don't correctly count the row groups that have been evaluated.

Smith-Cruise commented 4 months ago

I don't know if it's a bug or if it's intentional

wgtmac commented 4 months ago

It should be a bug. Do you want to fix it? @ffacs @Smith-Cruise

Smith-Cruise commented 4 months ago

It's easy to fix: we need to count row groups as evaluated even if they can be filtered out.
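
For readers following along, the counting rule the thread settles on can be sketched as below. This is a minimal standalone illustration, not the ORC source and not the actual patch: `MetricsSketch`, `pickRowGroupsSketch`, `evaluateStatsSketch`, and their parameters are hypothetical names invented for this example. The assumption is that "evaluated" means every row group a search argument was checked against, whether it was kept or rejected, and "selected" means only those that were kept.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two counters discussed in this thread.
// This is NOT the definition of orc::ReaderMetrics.
struct MetricsSketch {
  uint64_t selectedRowGroupCount = 0;   // row groups that passed the predicate
  uint64_t evaluatedRowGroupCount = 0;  // row groups checked, kept or rejected
};

// Row-group level (hypothetical): matches[i] says whether row group i may
// contain matching rows. Every row group counts as evaluated; only the
// matching ones count as selected.
inline bool pickRowGroupsSketch(const std::vector<bool>& matches,
                                MetricsSketch& metrics) {
  uint64_t selected = 0;
  for (bool hit : matches) {
    if (hit) {
      ++selected;
    }
  }
  metrics.evaluatedRowGroupCount += matches.size();
  metrics.selectedRowGroupCount += selected;
  return selected > 0;
}

// File or stripe level (hypothetical): if the statistics rule out the whole
// file or stripe, its row groups are still counted as evaluated, with none
// selected. If the statistics may match, nothing is counted here, because
// the row-group level check above will count them later.
inline bool evaluateStatsSketch(bool statsMayMatch, uint64_t rowGroupsCovered,
                                MetricsSketch& metrics) {
  if (!statsMayMatch) {
    metrics.evaluatedRowGroupCount += rowGroupsCovered;
  }
  return statsMayMatch;
}
```

With this rule, the evaluated count minus the selected count gives the number of row groups skipped by the search argument, regardless of whether they were ruled out at file, stripe, or row-group level.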