StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

[Enhancement] Optimize array_sortby for multi key columns #47798

Closed ZiheLiu closed 3 months ago

ZiheLiu commented 3 months ago

Why I'm doing:

Optimize the performance of array_sortby for multi key columns.

What I'm doing:

Currently, sort_and_tie_column(**range**) is called once per column for every row. sort_and_tie_column is pretty heavy, because it calls the virtual methods ColumnVisitor::accept and Column::visit, and creates and destroys several STL containers such as tie and permutation.

Therefore, make each column only call sort_and_tie_column(ranges) once.
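
The batching idea can be illustrated with a small standalone sketch. This is not the StarRocks code: `Range`, `Tie`, `Permutation`, `sort_and_tie_ranges`, and `sort_ranges_by_keys` are hypothetical names chosen for illustration, and plain `std::vector<int>` stands in for a key column. It only shows the call pattern, assuming all array rows are flattened into one element column: the per-key function runs once and walks every row range internally, so its setup cost is paid once per key column rather than once per row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the StarRocks definitions.
using Permutation = std::vector<uint32_t>;  // element indices into the key columns
using Tie = std::vector<uint8_t>;           // tie[i] == 1: position i equals position i-1 on all earlier keys
using Range = std::pair<size_t, size_t>;    // [begin, end) element offsets of one array row

// Sorts all ranges by one key column in a single call.
// Inside a range, only runs that are still tied on earlier keys get reordered.
void sort_and_tie_ranges(const std::vector<int>& key, const std::vector<Range>& ranges,
                         Permutation& perm, Tie& tie) {
    for (const Range& r : ranges) {
        const size_t begin = r.first;
        const size_t end = r.second;
        assert(begin <= end && end <= perm.size());
        size_t run = begin;
        while (run < end) {
            // Extend the run while the next position is tied with its predecessor.
            size_t run_end = run + 1;
            while (run_end < end && tie[run_end]) ++run_end;
            std::stable_sort(perm.begin() + run, perm.begin() + run_end,
                             [&key](uint32_t a, uint32_t b) { return key[a] < key[b]; });
            // Positions stay tied only if they are also equal on this key.
            for (size_t i = run + 1; i < run_end; ++i) {
                tie[i] = key[perm[i]] == key[perm[i - 1]];
            }
            run = run_end;
        }
    }
}

// One call per key column, regardless of how many array rows (ranges) exist.
Permutation sort_ranges_by_keys(const std::vector<std::vector<int>>& keys,
                                const std::vector<Range>& ranges, size_t num_elements) {
    Permutation perm(num_elements);
    std::iota(perm.begin(), perm.end(), 0u);
    Tie tie(num_elements, 1);  // before any key is applied, everything in a range is tied
    for (const auto& key : keys) {
        sort_and_tie_ranges(key, ranges, perm, tie);
    }
    return perm;
}
```

With two array rows covering elements [0, 3) and [3, 5), the sketch makes two calls to `sort_and_tie_ranges` for two key columns, where a per-row approach would make four.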

Test

t1 contains 20,000,000 rows and 1 bucket. Each array row contains 3 columns.

-- Order By One Column
select max(t1.res) from (select array_min(array_sortby(c1, c2)) as res from t1 ) as t1;
-- Order By Two Columns
select max(t1.res) from (select array_min(array_sortby(c1, c2, c3)) as res from t1 ) as t1;

github-actions[bot] commented 3 months ago

[FE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] commented 3 months ago

[BE Incremental Coverage Report]

:white_check_mark: pass : 125 / 128 (97.66%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: be/src/exec/sorting/sort_column.cpp | 61 | 64 | 95.31% | [150, 159, 208] |
| :large_blue_circle: be/src/exec/sorting/sort_permute.cpp | 2 | 2 | 100.00% | [] |
| :large_blue_circle: be/src/exec/sorting/sort_helper.h | 3 | 3 | 100.00% | [] |
| :large_blue_circle: be/src/exprs/array_functions.cpp | 59 | 59 | 100.00% | [] |

github-actions[bot] commented 3 months ago

@Mergifyio backport branch-3.3

mergify[bot] commented 3 months ago

backport branch-3.3

✅ Backports have been created

* [#47827 [Enhancement] Optimize array_sortby for multi key columns (backport #47798)](https://github.com/StarRocks/starrocks/pull/47827) has been created for branch `branch-3.3`