[CH] AQE seems not working on skewed partitions

zhanglistar commented 2 weeks ago

Description

SQL: d_1075_0.sql The data itself is skewed on column type about 65% values on one type. Vanila works fine, 45s max processing time. But gluten with CH backend seems not working about 1.76 hour the biggest task. vanila:

gluten:

lgbo-ustc commented 2 weeks ago

There are two adaptive rules to handle partitions skew, OptimizeSkewedJoin and OptimizeSkewInRebalancePartitions.

OptimizeSkewedJoin only works on SortMergeJoinExec and ShuffledHashJoinExec,(Why does broadcast hash join not work?). After disable broadcast join, the partition skew disappeare. Partitions are more balance.

OptimizeSkewedJoin cannot be applied in this case, we found the join algorithm is broadcast hash join. Could it be shuffle hash join in vanilla ? I don't think so, the final plan says it's broadcast hash join too.

OptimizeSkewInRebalancePartitions doesn't work, because the shuffle.shuffleOrigin is ENSURE_REQUIREMENTS. It only supports ENSURE_REQUIREMENTS and ENSURE_REQUIREMENTS in OptimizeSkewInRebalancePartitions.

24/09/05 13:09:35.153 ERROR [main] DebugOptimizeSkewInRebalancePartitions: xxx ColumnarExchange hashpartitioning(type#51L, 1000), ENSURE_REQUIREMENTS, [plan_id=109], [shuffle_writer_type=hash], [OUTPUT] List(appid:LongType, type:LongType, info:MapType(StringType,StringType,true)), [OUTPUT] List(appid:LongType, type:LongType, info:MapType(StringType,StringType,true))

zhanglistar commented 2 weeks ago

It seems that the AQE not right, so AQE shrinks the task count from 900+ to 20+.

wForget commented 2 weeks ago

OptimizeSkewedJoin only works on SortMergeJoinExec and ShuffledHashJoinExec,(Why does broadcast hash join not work?).

The OptimizeShuffleWithLocalRead rule in AQE is used to optimize BroadcastHashJoin to avoid skew.

lgbo-ustc commented 2 weeks ago

Thanks @wForget , I think the skew is caused by disabling this rule. Because we use rss which doesn't support local read.

wForget commented 2 weeks ago

Thanks @wForget , I think the skew is caused by disabling this rule. Because we use rss which doesn't support local read.

This problem does exist in the RSS scenario. I tried to fix it in Spark https://github.com/apache/spark/pull/41609, but unfortunately it was not merged into upstream

lgbo-ustc commented 2 weeks ago

Thanks @wForget , I think the skew is caused by disabling this rule. Because we use rss which doesn't support local read.

This problem does exist in the RSS scenario. I tried to fix it in Spark apache/spark#41609, but unfortunately it was not merged into upstream

It's strange, when we enable spark.sql.adaptive.localShuffleReader.enabled, the query runs OK and fast, and the skew disappeares.

lgbo-ustc commented 2 weeks ago

@wForget In what scenario, rss would bring performance drawback? I wonder whether we could use local shuffle reader in some specific scenarios

wForget commented 2 weeks ago

When using RSS, localShuffleReader should not be enabled in most cases, as this may cause random reads again.

zhanglistar commented 2 weeks ago

For now, we just disable broadcast join to not triger this problem, this is not gluten problem.

apache / incubator-gluten

[CH] AQE seems not working on skewed partitions #7114

Description