NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

Simplify $ transpiling and fix newline character bug #11703

Closed. SurajAralihalli closed this 1 week ago

SurajAralihalli commented 2 weeks ago

This PR fixes a bug in which lineTerminatorMatcher(excludeCRLF = true) returns an empty regex AST. Passing that empty node unchecked to RegexRepetition could lead to incorrect behavior.

This update also improves handling of line-terminator characters that follow a line anchor (e.g., $\r, $\u2028). Such patterns fall back to the CPU because cuDF does not support negative lookahead. In practice, checkUnsupported already catches these cases before this code path is reached.
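For context, here is a minimal sketch of the CPU-side behavior that the `$` transpilation has to reproduce. It uses only java.util.regex and is not taken from the plugin; the class name DollarAnchorDemo and the helper finds are illustrative assumptions. Without MULTILINE, Java's `$` matches at the end of the input and also just before a single trailing line terminator. That is why a terminator written directly after the anchor (such as `$\r`) is awkward to express without lookahead.

```java
import java.util.regex.Pattern;

// Illustrative only: this class is not part of spark-rapids.
final class DollarAnchorDemo {

    // True if `regex` matches anywhere in `input`, using default flags (no MULTILINE).
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // `$` matches at the very end of the input ...
        System.out.println(finds("a$", "a"));
        // ... and also just before one trailing line terminator.
        System.out.println(finds("a$", "a\n"));
        System.out.println(finds("a$", "a\r\n"));
        System.out.println(finds("a$", "a\u2028"));
        // A terminator followed by more text is not trailing, so `$` does not match.
        System.out.println(finds("a$", "a\nb"));
        // A terminator written after the anchor consumes the character `$` looked past.
        System.out.println(finds("a$\\r", "a\r"));
    }
}
```

Every call prints true except the `a\nb` case, which prints false. A GPU-side rewrite of `$` has to preserve the same distinctions, or the pattern has to be rejected and run on the CPU.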

NVnavkumar commented 1 week ago

build

SurajAralihalli commented 1 week ago

build

SurajAralihalli commented 1 week ago

build