dmn-tck / tck

Decision Model and Notation Technology Compatibility Kit
https://dmn-tck.github.io/tck
51 stars 36 forks source link

[0067-feel-split-function] The regex usage differences from the DMN specification #676

Open saig0 opened 1 month ago

saig0 commented 1 month ago

Description

The test case 0067-feel-split-function: 001 verifies the usage of the split() function with a regex argument.

The DMN evaluates the following FEEL expression:

split("John Doe", "\s")

The test case is the same as in the DMN 1.5 specification (Chapter 10.3.4.3 String functions, page 142):

Screenshot from 2024-09-25 12-50-44

However, there is a difference in the regex. Instead of "\\s" from the DMN specification, the test case uses "\s" (one backslash vs. two backslash).

We should align the TCK test case with the DMN specification.

As far as I understand the DMN specification, the string literal "\s" is invalid because it is not a listed escape character (from grammar rule 64):

Screenshot from 2024-09-25 12-55-52

falko commented 1 month ago

This looks to me like a spec issue. There is no need for a double backslash since \s is not a recognized escape sequence in FEEL. Double backslash would only be needed for things that are in grammar rule 64, e.g. split("John\nDoe", "\\n"). On the other hand, the double backslash will just be translated into a single backslash by the FEEL parser leaving a \s for the regular expression parser. So not really an issue with the spec and just a slightly misleading example. A proposal for clarification could be to change the example with \s into using a single backslash and adding the above \n example in addition to show when escaping the backslash is needed.

saig0 commented 1 month ago

adding the above \n example in addition to show when escaping the backslash is needed.

:+1: For adding the example in the spec and the TCK.

There is no need for a double backslash since \s is not a recognized escape sequence in FEEL.

Are you sure? I'm a bit confused about how the escape sequences and regex characters should be handled in FEEL.

Let's look at the following examples:

// splits the string literal with a new line at the new line character
split("John\nDoe", "\\n")

// splits the string literal with a whitespace at the whitespace character
split("John Doe", "\s")

In JavaScript, the example looks a bit different. \s is not allowed because it is not an escape character.

"John\nDoe".split(new RegExp("\\n"));
"John Doe".split(new RegExp("\\s"));

// or, with literal notation
"John\nDoe".split(/\n/);
"John Doe".split(/\s/);

It looks similar in Java:

"John\nDoe".split("\\n")

// "\s" produces a compile failure
"John Doe".split("\\s")

cc: @nikku @barmac

barmac commented 1 month ago

I think we should stick to what the specification states as it conforms to both Java and JavaScript standards. I don't see a reason why FEEL should deviate from the de-facto standard of the programming languages. So I support it that we should fix the problem within TCK.

nikku commented 1 month ago

The question to discuss here is if \ is a valid character in FEEL, or not. According to the spec \\ is the escaped form of \, yielding a \ character; this may or may not imply that \ itself is illegal, as it is part of an "escape sequence".

Cf. JavaScript example, again, a raw \ does not exist, outside of valid escape sequences:

image