[FEA] split support "Base expression cannot start with quantifier near index 1"

@viadea the error message is coming from here:

https://github.com/NVIDIA/spark-rapids/blob/502f5a3cd96e458c8471794af9d2e209d9f0b42f/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RegexParser.scala#L160-L162

This check essentially says that an expression cannot start with a `*`, `+`, or `?` character. That restriction appears to be totally valid, except when the `?` comes at the start of a group. After an opening parenthesis, our group parsing code appears to recognize only the non-capturing prefix `(?:`.

https://github.com/NVIDIA/spark-rapids/blob/502f5a3cd96e458c8471794af9d2e209d9f0b42f/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RegexParser.scala#L170-L173
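To make the failure mode concrete, here is a toy sketch of how a check like this could produce the reported message. It is not the actual `RegexParser` code; `ToyGroupCheck` and `check` are hypothetical names, and the logic only assumes that `(?:` is the one recognized prefix and that any other `*`, `+`, or `?` right after `(` is treated as a quantifier.

```scala
// Toy illustration only; this is NOT the RegexParser from spark-rapids.
object ToyGroupCheck {
  private val quantifiers = Set('*', '+', '?')

  // Returns Some(message) when the character right after '(' is a quantifier
  // and the group does not start with the one recognized prefix "(?:".
  // Escaped parentheses and character classes are ignored to keep it short.
  def check(pattern: String): Option[String] = {
    var i = 0
    while (i < pattern.length) {
      if (pattern.charAt(i) == '(') {
        if (pattern.startsWith("(?:", i)) {
          i += 3 // skip the non-capturing prefix
        } else {
          val next = i + 1
          if (next < pattern.length && quantifiers.contains(pattern.charAt(next))) {
            return Some(s"Base expression cannot start with quantifier near index $next")
          }
          i += 1
        }
      } else {
        i += 1
      }
    }
    None
  }
}
```

Under these assumptions, `ToyGroupCheck.check("(?<foo>1)")` reports the quantifier at index 1, while `(?:1)` and `(1)` pass.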

That also appears to be the only `(?`-prefixed group form that CUDF supports: https://docs.rapids.ai/api/cudf/stable/libcudf_docs/md_regex/#groups

Java patterns (https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), however, appear to support many other kinds of groups that begin with `(?`, and each of them results in this error:

- Named capture group: `spark.range(10).selectExpr("split(id, '(?<foo>1)')").show()`
- Flags in a non-capturing group: `spark.range(10).selectExpr("split(id, '(?i:1)')").show()`
- Zero-width positive lookahead: `spark.range(10).selectExpr("split(id, '(?=1)')").show()`
- Zero-width negative lookahead: `spark.range(10).selectExpr("split(id, '(?!1)')").show()`
- Zero-width positive lookbehind: `spark.range(10).selectExpr("split(id, '(?<=1)')").show()`
- Zero-width negative lookbehind: `spark.range(10).selectExpr("split(id, '(?<!1)')").show()`
- Independent (atomic) non-capturing group: `spark.range(10).selectExpr("split(id, '(?>1)')").show()`
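For reference, the snippet below runs the same seven constructs directly through `java.util.regex.Pattern`, which accepts all of them. It is only a sketch for comparing behaviour on the JVM: `JavaGroupSplitDemo`, `splitWith`, and the sample input `"a1b"` are made up for illustration, and the split limit of `-1` is an assumption rather than a statement about what Spark passes internally.

```scala
import java.util.regex.Pattern

object JavaGroupSplitDemo {
  // Label -> regex pairs taken from the list above.
  val patterns: Seq[(String, String)] = Seq(
    "named capture group" -> "(?<foo>1)",
    "flags in a non-capturing group" -> "(?i:1)",
    "positive lookahead" -> "(?=1)",
    "negative lookahead" -> "(?!1)",
    "positive lookbehind" -> "(?<=1)",
    "negative lookbehind" -> "(?<!1)",
    "independent group" -> "(?>1)"
  )

  // Split `input` with plain java.util.regex; limit -1 keeps trailing empty strings.
  def splitWith(regex: String, input: String): Seq[String] =
    Pattern.compile(regex).split(input, -1).toSeq

  def main(args: Array[String]): Unit = {
    patterns.foreach { case (label, regex) =>
      val parts = splitWith(regex, "a1b").mkString("[", ", ", "]")
      println(s"$label $regex -> $parts")
    }
  }
}
```

The zero-width constructs split without consuming the `1`, so they behave quite differently from the consuming groups, which is part of why each one would need its own tests.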

Since each of these is rather complex to implement and test, could you clarify which of them you actually need?

[FEA] split support "Base expression cannot start with quantifier near index 1" #11460