apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[GLUTEN-7900][VL] Enable prefix sort in spill #7904

Open jinchengchenghh opened 1 week ago

jinchengchenghh commented 1 week ago

Prefix sort can reduce the spill sort time by 3x with the sampled Meta production query, whereas timsort increases the sort time by 20%, so timsort performance seems to depend on the actual data pattern. After picking https://github.com/facebookincubator/velox/pull/11527, a Jenkins query that spills string data takes 41s with prefix sort vs 37s with timsort vs 63s with stdsort.

Relevant Velox PR: https://github.com/facebookincubator/velox/pull/11384

Resolves https://github.com/apache/incubator-gluten/issues/7900
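For readers unfamiliar with the technique, here is a minimal sketch of the general idea behind a prefix sort. It is not Velox's implementation. Each row gets a fixed-width, order-preserving integer prefix. Comparisons use the cheap prefix first and fall back to the full key only on ties. The 8-byte width and the names `normalized_prefix` and `prefix_sort` are assumptions made for this illustration. Python's built-in sort is used only as the underlying comparison sort.

```python
from functools import cmp_to_key

PREFIX_BYTES = 8  # assumed prefix width, chosen only for this illustration


def normalized_prefix(value: bytes, width: int = PREFIX_BYTES) -> int:
    """Pack the first `width` bytes into a big-endian integer, zero-padded.

    Big-endian packing keeps integer order consistent with byte-wise
    lexicographic order of the keys.
    """
    return int.from_bytes(value[:width].ljust(width, b"\x00"), "big")


def prefix_sort(rows: list, width: int = PREFIX_BYTES) -> list:
    """Sort byte-string keys by comparing fixed-width prefixes first."""
    # Each entry is a compact (prefix, row index) pair.
    entries = [(normalized_prefix(row, width), i) for i, row in enumerate(rows)]

    def compare(a, b):
        # Fast path: the prefixes differ, so the full keys are never touched.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        # Slow path: equal prefixes, so compare the full keys.
        left, right = rows[a[1]], rows[b[1]]
        return (left > right) - (left < right)

    entries.sort(key=cmp_to_key(compare))
    return [rows[i] for _, i in entries]
```

Because most comparisons are settled on the small prefix, the benefit depends on how often keys share a long common prefix. That is in line with the data-dependent results reported above.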

github-actions[bot] commented 1 week ago

https://github.com/apache/incubator-gluten/issues/7900

zhztheplayer commented 1 week ago

Would you like to reference the related Velox PRs for this feature in the PR description? That will help users track the change. Thanks.

I'm also thinking about setting up a more comprehensive integration benchmark for spill performance.

The following results are from the existing GHA OOM tests:

Before:

image

After:

image

jinchengchenghh commented 1 week ago

We have a spill performance test in our internal Jenkins, and I have triggered it. It will run once the machine is ready, since the servers are down today.

FelixYBW commented 1 week ago

Let's post the Jenkins performance results here.