apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.22k stars 438 forks source link

[GLUTEN-7866][CH] Fallback for unsupported regex in re2 #7867

Open exmy opened 2 weeks ago

exmy commented 2 weeks ago

What changes were proposed in this pull request?

(Fixes: #7866)

How was this patch tested?

add ut

github-actions[bot] commented 2 weeks ago

https://github.com/apache/incubator-gluten/issues/7866

github-actions[bot] commented 2 weeks ago

Run Gluten Clickhouse CI on x86

github-actions[bot] commented 2 weeks ago

Run Gluten Clickhouse CI on x86

PHILO-HE commented 2 weeks ago

@exmy, I'm not familiar with CH backend code. In velox backend, we have a validation function on native side (see below link). In its implementation, RE2 tries to compile literal regex pattern, and then some known unsupported or incompatible cases are filter out even though compiling can pass. Can this handling be a reference to you to fix your issues?

https://github.com/apache/incubator-gluten/blob/main/cpp/velox/utils/Common.cc#L55

exmy commented 2 weeks ago

@exmy, I'm not familiar with CH backend code. In velox backend, we have a validation function on native side (see below link). In its implementation, RE2 tries to compile literal regex pattern, and then some known unsupported or incompatible cases are filter out even though compiling can pass. Can this handling be a reference to you to fix your issues?

https://github.com/apache/incubator-gluten/blob/main/cpp/velox/utils/Common.cc#L55

Thanks for advice! It's indeed a better way to validate on native side, but ch backend currently does not support it. Maybe this way wil be adopted later as well.

BTW, I noticed it seems missing validate for split expression in https://github.com/apache/incubator-gluten/blob/main/cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc#L56. The second parameter for split is also a regex string.

PHILO-HE commented 2 weeks ago

BTW, I noticed it seems missing validate for split expression in https://github.com/apache/incubator-gluten/blob/main/cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc#L56. The second parameter for split is also a regex string.

@exmy, thanks for letting me know this! Currently, this function's implementation is not fully ready in Velox. I have created an issue to track this: https://github.com/apache/incubator-gluten/issues/7872.