For regex operations such as `rlike`, the input regex is first parsed into an AST and then transpiled into another regex string that cuDF supports. If the regex cannot be transpiled, the operation falls back to the CPU. Transpilation is performed in `tagExprForGpu`.
#10715 introduced a regex rewrite for `rlike`, which also needs to parse the regex string into an AST, this time in `convertToGpu`. That parsing can be combined with the parsing already done during transpilation, which saves time and makes the code cleaner.
We can refactor the transpiler into two steps, regex string to AST and AST to new regex string, then move the regex rewrite into `tagExprForGpu` and save the optimization type in the Meta.
Originally posted by @revans2 in https://github.com/NVIDIA/spark-rapids/pull/10715#discussion_r1600186097
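
A minimal sketch of the proposed split, to make the idea concrete. Everything here is hypothetical and heavily simplified: `Node`, `Optimization`, `RLikeMeta`, `parse`, `emit`, `chooseOptimization`, and `tag` are illustrative names, not the actual spark-rapids classes or methods, and the toy parser only understands letters, digits, a leading `^`, and `.*`. The point is only that the pattern is parsed once, and both the transpiled string and the optimization type are derived from the same AST at tagging time.

```scala
object RegexTagSketch {
  // Toy AST; a stand-in for the real regex AST, not the actual classes.
  sealed trait Node
  case object StartAnchor extends Node            // "^"
  case object AnyStar extends Node                // ".*"
  final case class Lit(text: String) extends Node // run of letters/digits

  // Hypothetical optimization types that the rewrite could record.
  sealed trait Optimization
  case object NoOptimization extends Optimization
  final case class StartsWith(prefix: String) extends Optimization
  final case class Contains(literal: String) extends Optimization

  // Step 1: regex string -> AST. None means "unsupported, fall back to CPU".
  def parse(pattern: String): Option[List[Node]] = {
    val nodes = scala.collection.mutable.ListBuffer.empty[Node]
    val lit = new StringBuilder
    def flush(): Unit = if (lit.nonEmpty) { nodes += Lit(lit.toString); lit.clear() }
    var i = 0
    while (i < pattern.length) {
      if (i == 0 && pattern.charAt(0) == '^') { nodes += StartAnchor; i += 1 }
      else if (pattern.startsWith(".*", i)) { flush(); nodes += AnyStar; i += 2 }
      else if (pattern.charAt(i).isLetterOrDigit) { lit.append(pattern.charAt(i)); i += 1 }
      else return None
    }
    flush()
    Some(nodes.toList)
  }

  // Step 2: AST -> new regex string (the toy version emits the same syntax).
  def emit(ast: List[Node]): String = ast.map {
    case StartAnchor => "^"
    case AnyStar     => ".*"
    case Lit(text)   => text
  }.mkString

  // Rewrite detection reuses the AST from step 1 instead of parsing again.
  // rlike uses find semantics, so a bare literal behaves like "contains".
  def chooseOptimization(ast: List[Node]): Optimization = ast match {
    case List(StartAnchor, Lit(p))          => StartsWith(p)
    case List(StartAnchor, Lit(p), AnyStar) => StartsWith(p)
    case List(Lit(s))                       => Contains(s)
    case List(AnyStar, Lit(s), AnyStar)     => Contains(s)
    case _                                  => NoOptimization
  }

  // What the Meta would carry from tagging to conversion.
  final case class RLikeMeta(cudfPattern: Option[String], optimization: Optimization) {
    def canRunOnGpu: Boolean = cudfPattern.isDefined
  }

  // Stand-in for the tagging step: parse once, derive both results.
  def tag(pattern: String): RLikeMeta = parse(pattern) match {
    case Some(ast) => RLikeMeta(Some(emit(ast)), chooseOptimization(ast))
    case None      => RLikeMeta(None, NoOptimization)
  }
}
```

With this shape, the conversion step would only read `optimization` and `cudfPattern` from the Meta and would not need to touch the regex string again.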