apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.13k stars 409 forks

[VL] Optimize sort based shuffle #5929

Open XinShuoWang opened 2 months ago

XinShuoWang commented 2 months ago

Description

I tested the sort-based shuffle and found, using the perf tool, that the serialization step accounts for the largest share of CPU time. Are there any plans to write a custom serializer, rather than using PrestoVectorSerde directly, to improve performance? Or are there other optimization possibilities?

Screenshot 2024-05-30 22 35 02
FelixYBW commented 2 months ago

Thank you for reporting, @XinShuoWang. We have noted the issue and are trying to fix it. We plan to rewrite the logic.

guhaiyan0221 commented 2 months ago

Thank you for reporting, @XinShuoWang. We have noted the issue and are trying to fix it. We plan to rewrite the logic.

Any design doc?

FelixYBW commented 2 months ago

Any design doc?

Will write one, stay tuned.