Open zhuliquan opened 5 months ago
Hi @zhuliquan -- I believe you are correct that the regexp is re-compiled for each batch. This is mentioned as one of the use cases for https://github.com/apache/datafusion/issues/8051
I think it would be a nice improvement to look into pre-compiling the regexp somehow, though last time I checked the only benchmark we have (one of the clickbench queries) compiling the regexp was not a significant consumer of time
Hi @zhuliquan -- I believe you are correct that the regexp is re-compiled for each batch. This is mentioned as one of the use cases for #8051
I think it would be a nice improvement to look into pre-compiling the regexp somehow, though last time I checked the only benchmark we have (one of the clickbench queries) compiling the regexp was not a significant consumer of time
I think should pre-compiled when create binary-physical-expr, instead of evaluating record_batch. https://github.com/apache/datafusion/blob/57280e42dc2391ab65c24c0fb52032942d3d85a8/datafusion/physical-expr/src/expressions/binary.rs#L60-L66
I think should pre-compiled when create binary-physical-expr, instead of evaluating record_batch.
I think this makes sense to me. Thank you
One thing to look into might be to replace the regexp match operator with a function (and have the function's implementation store the precompiled regexp)
One tricky part might be serialization (in datafusion-proto). We can probably handle it via an extension codec or something.
Is your feature request related to a problem or challenge?
I noticed below code: https://github.com/apache/datafusion/blob/6a4a280e3cf70fe5f1a1cfe7c2de13e4c39f89bb/datafusion/physical-expr/src/expressions/binary.rs#L537-L564 This looks like every time record_batch is evaluated, it will execute the compiled pattern string and use the compiled results to match arrow-array
Describe the solution you'd like
when building binary physical expr , we can pre-compile pattern string if op is regex_match
Describe alternatives you've considered
No response
Additional context
No response