apache / inlong

Apache InLong - a one-stop, full-scenario integration framework for massive data
https://inlong.apache.org/
Apache License 2.0

[Feature][SDK] Transform support REGEXP_...() related functions #11060

Closed emptyOVO closed 1 month ago

emptyOVO commented 1 month ago

Description

  1. REGEXP_REPLACE(string1, string2, string3)--Returns string1 with every substring that matches the regular expression string2 replaced, consecutively, by string3.

  2. REGEXP_COUNT(str, regex)--Returns the number of times str matches the regex pattern. regex must be a Java regular expression. Arguments: str <CHAR | VARCHAR>, regex <CHAR | VARCHAR>. Returns an INTEGER representation of the number of matches. Returns NULL if any of the arguments are NULL or regex is invalid.

  3. REGEXP_EXTRACT(string1, string2[, integer])--Returns the substring of string1 extracted with the regular expression string2 and the regex match group index integer. The group index starts from 1; 0 means matching the whole regex. The group index should not exceed the number of defined groups.

  4. REGEXP_EXTRACT_ALL(str, regex[, extractIndex])--Extracts all the substrings in str that match the regex expression and correspond to the regex group extractIndex. regex may contain multiple groups. extractIndex indicates which regex group to extract; it starts from 1, which is also the default when it is not specified. 0 means matching the entire regular expression. Arguments: str <CHAR | VARCHAR>, regex <CHAR | VARCHAR>, extractIndex <TINYINT | SMALLINT | INTEGER | BIGINT>. Returns an ARRAY representation of all the matched substrings. Returns NULL if any of the arguments are NULL or invalid.

  5. REGEXP_INSTR(str, regex)--Returns the position of the first substring in str that matches regex. Result indexes begin at 1; the result is 0 if there is no match. Arguments: str <CHAR | VARCHAR>, regex <CHAR | VARCHAR>. Returns an INTEGER representation of the first matched substring index. Returns NULL if any of the arguments are NULL or regex is invalid.

  6. REGEXP_SUBSTR(str, regex)--Returns the first substring in str that matches regex. Arguments: str <CHAR | VARCHAR>, regex <CHAR | VARCHAR>. Returns a STRING representation of the first matched substring. Returns NULL if any of the arguments are NULL, regex is invalid, or the pattern is not found.
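
The semantics listed above can be sketched with plain `java.util.regex`. This is only an illustration of the described behavior, not the Transform SDK implementation: the class `RegexpSketch` and its method names are hypothetical, Java `null` stands in for SQL NULL, and returning `null` for an out-of-range group index, as well as treating the replacement string literally, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class for illustration; not part of the InLong code base.
final class RegexpSketch {

    // Compiles the regex; returns null for a null or invalid pattern.
    private static Pattern compile(String regex) {
        if (regex == null) return null;
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            return null;
        }
    }

    // REGEXP_REPLACE: replace every match of regex in str.
    static String regexpReplace(String str, String regex, String replacement) {
        Pattern p = compile(regex);
        if (str == null || p == null || replacement == null) return null;
        // Assumption: "$" and "\" in the replacement are taken literally.
        return p.matcher(str).replaceAll(Matcher.quoteReplacement(replacement));
    }

    // REGEXP_COUNT: number of non-overlapping matches.
    static Integer regexpCount(String str, String regex) {
        Pattern p = compile(regex);
        if (str == null || p == null) return null;
        Matcher m = p.matcher(str);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    // REGEXP_EXTRACT: the given group of the first match (0 = whole match).
    static String regexpExtract(String str, String regex, int group) {
        Pattern p = compile(regex);
        if (str == null || p == null) return null;
        Matcher m = p.matcher(str);
        // Assumption: an out-of-range group index yields null.
        if (group < 0 || group > m.groupCount() || !m.find()) return null;
        return m.group(group);
    }

    // REGEXP_EXTRACT_ALL: the given group of every match.
    static List<String> regexpExtractAll(String str, String regex, int group) {
        Pattern p = compile(regex);
        if (str == null || p == null) return null;
        Matcher m = p.matcher(str);
        if (group < 0 || group > m.groupCount()) return null;
        List<String> out = new ArrayList<>();
        while (m.find()) out.add(m.group(group));
        return out;
    }

    // REGEXP_INSTR: 1-based position of the first match, 0 if none.
    static Integer regexpInstr(String str, String regex) {
        Pattern p = compile(regex);
        if (str == null || p == null) return null;
        Matcher m = p.matcher(str);
        return m.find() ? m.start() + 1 : 0;
    }

    // REGEXP_SUBSTR: the first matching substring, null if none.
    static String regexpSubstr(String str, String regex) {
        Pattern p = compile(regex);
        if (str == null || p == null) return null;
        Matcher m = p.matcher(str);
        return m.find() ? m.group() : null;
    }
}
```

With these definitions, `regexpCount("a1b22c333", "\\d+")` returns 3, `regexpExtract("foothebar", "foo(.*?)(bar)", 2)` returns `"bar"`, and `regexpInstr("abc123", "\\d+")` returns 4. An invalid pattern such as `"("` yields `null` from every method.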

Use case

No response

emptyOVO commented 1 month ago

A transform function PR that supports regexps (#10986) predates this issue, so this issue will strengthen the related functions.