NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0
749 stars, 221 forks

Rewrite `pattern[A-B]{X,Y}` (a pattern string followed by X to Y chars in range A - B) in `RLIKE` to a custom kernel #10821

Closed (thirtiseven closed this 3 weeks ago)

thirtiseven commented 1 month ago

To speed up `RLIKE`, we can rewrite some patterns to custom kernels. `pattern[A-B]{X,Y}` is a common pattern we have observed in customer use cases. We can rewrite it into a general custom kernel that matches when a string contains a literal `pattern` followed by X to Y characters in the range A - B.

This needs a JNI kernel and a related plugin change to recognize this pattern in regexParser.
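
To make the intended semantics concrete, here is a minimal CPU reference sketch in Python. It is not the JNI kernel or the plugin code; the function name `contains_literal_range` and its parameters are hypothetical, chosen only for illustration. It assumes `RLIKE` is an unanchored search, in which case the upper bound Y does not change the boolean result (as long as X <= Y): a string matches as soon as the literal is followed by at least X characters in the range, so only X is needed.

```python
import re


def contains_literal_range(s: str, literal: str, lo: str, hi: str, min_len: int) -> bool:
    """Hypothetical reference check for `literal[lo-hi]{min_len,Y}` as a search.

    Returns True if `s` contains `literal` immediately followed by at least
    `min_len` characters in the inclusive range [lo, hi]. Characters are
    compared by code point, which is an assumption of this sketch.
    """
    start = s.find(literal)
    while start != -1:
        begin = start + len(literal)
        tail = s[begin:begin + min_len]
        # The candidate matches only if enough characters remain
        # and every one of them falls inside the range.
        if len(tail) == min_len and all(lo <= c <= hi for c in tail):
            return True
        # Otherwise try the next occurrence of the literal.
        start = s.find(literal, start + 1)
    return False


def regex_reference(s: str) -> bool:
    # The same check expressed as an ordinary regex search, for comparison.
    return re.search(r"abc[0-9]{2,5}", s) is not None
```

For example, `contains_literal_range("xxabc123yy", "abc", "0", "9", 2)` agrees with `regex_reference("xxabc123yy")`. A GPU kernel could apply the same per-string check in parallel across rows, though how the actual kernel is organized is not specified here.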