StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

[Enhancement] Kudu scanner should be allowed to do non-share scan. #53123

Open Jcnessss opened 12 hours ago

Jcnessss commented 12 hours ago

Why I'm doing:

Currently, none of the scanners generated by HiveDataSource are allowed to do non-share scan. That makes sense for file scans, since file sizes can differ a lot. But the tablets in Kudu are basically the same size, so we should try to assign morsels uniformly among operators to avoid data skew (shown below).

Kudu scan: image

the following project: image
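To make the reasoning above concrete, here is a toy model (not StarRocks code) of statically splitting morsels among scan operators. The names `assign_round_robin` and `skew_ratio` are hypothetical and exist only for this illustration; each morsel is reduced to a row count, and the real scheduler is assumed to be more involved than a plain round-robin.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: give morsel i to operator (i % num_operators) and
// return the total number of rows each operator ends up scanning.
std::vector<std::uint64_t> assign_round_robin(
        const std::vector<std::uint64_t>& morsel_rows, std::size_t num_operators) {
    std::vector<std::uint64_t> load(num_operators, 0);
    if (num_operators == 0) {
        return load;  // nothing to assign to
    }
    for (std::size_t i = 0; i < morsel_rows.size(); ++i) {
        load[i % num_operators] += morsel_rows[i];
    }
    return load;
}

// Hypothetical skew metric: max load divided by mean load.
// 1.0 means perfectly balanced; larger values mean more skew.
double skew_ratio(const std::vector<std::uint64_t>& load) {
    if (load.empty()) {
        return 1.0;
    }
    std::uint64_t total = 0;
    std::uint64_t max_load = 0;
    for (std::uint64_t v : load) {
        total += v;
        max_load = std::max(max_load, v);
    }
    if (total == 0) {
        return 1.0;
    }
    const double mean = static_cast<double>(total) / static_cast<double>(load.size());
    return static_cast<double>(max_load) / mean;
}
```

With eight equally sized tablets of 1000 rows and four operators, every operator in this model gets 2000 rows and the ratio is 1.0. With one 10000-row file and seven 100-row files, the first operator gets 10100 rows while the others get 200 each, which is the kind of imbalance that makes a static split risky for file scans but acceptable for Kudu tablets.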

What I'm doing:

Allow non-share scan in the Kudu connector.


github-actions[bot] commented 12 hours ago

[BE Incremental Coverage Report]

:white_check_mark: pass : 4 / 4 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| :large_blue_circle: src/connector/hive_connector.cpp | 4 | 4 | 100.00% | [] |

github-actions[bot] commented 5 hours ago

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] commented 5 hours ago

[FE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)