apache / incubator-kie-issues

`matches()` function wrongly behaves #1448

Closed gitgabrio closed 1 month ago

gitgabrio commented 2 months ago

According to the DMN specs, the matches() and replace() functions should behave according to the XQuery 1.0 specification. Our current implementation doesn't follow that; it uses the native Java regex implementation for both. Unfortunately, Java regex handling differs from the XQuery specs, and implementing the XQuery rules natively would be a large and complex effort. For that reason, we agreed to rely on an external library that already implements the XQuery specs in Java. After a quick analysis (https://en.wikipedia.org/wiki/XQuery#Implementations), we chose Saxon-HE as the best fit for this purpose.

The scope of the ticket is to integrate this new external dependency into our code base, so that both matches() and replace() can rely on that implementation to behave correctly.

=== Original description ===

TCK tests revealed some cases where matches() behaves incorrectly.

  1. matches("O", "[A-Z-[OI]]", "i") should return false. Before fixing that, double-check in the specs that this is the expected behavior
  2. matches("i", "[A-Z-[OI]]", "i") should return false. Before fixing that, double-check in the specs that this is the expected behavior

The subtraction syntax [A-Z-[OI]] is not supported by Java's Pattern; the equivalent Java expression is [A-Z&&[^OI]] (see "Character classes" in the Pattern docs).
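As a rough illustration of the mismatch, the following Java sketch runs both patterns through java.util.regex.Pattern. The class and method names are made up for this example, and Pattern.CASE_INSENSITIVE with find() is only assumed to approximate the FEEL "i" flag and the matches() semantics; it is not the engine's actual code path.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the code base.
public class SubtractionSyntaxDemo {

    // Returns true if any part of the input matches the pattern,
    // using Java's case-insensitive flag as a stand-in for the "i" flag.
    static boolean javaFind(String input, String pattern) {
        return Pattern.compile(pattern, Pattern.CASE_INSENSITIVE)
                .matcher(input)
                .find();
    }

    public static void main(String[] args) {
        String xsdStyle = "[A-Z-[OI]]";    // XSD/XQuery character class subtraction
        String javaStyle = "[A-Z&&[^OI]]"; // Java intersection with a negated class

        for (String input : new String[] {"O", "i", "A"}) {
            System.out.println(input + " with " + xsdStyle + " -> " + javaFind(input, xsdStyle));
            System.out.println(input + " with " + javaStyle + " -> " + javaFind(input, javaStyle));
        }
    }
}
```

With the XSD-style pattern, Java does not subtract anything: it reads the class as a union of A-Z, a literal -, and the nested [OI], so "O" and "i" are still accepted. The Java-style pattern rejects both.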

XML Schema Part 2: Datatypes Second Edition makes a distinction between negation (^) and subtraction (-). See "Negative Character Group" and "Character Class Subtraction" in the regex chapter.

A negative character group is a ·positive character group· preceded by the ^ character. For all ·positive character group·s P, ^P is a valid negative character group, and C(^P) contains all XML characters that are not in C(P).

A character class subtraction is a ·character class expression· subtracted from a ·positive character group· or ·negative character group·, using the - character.

A "translation" between the two syntaxes is therefore required.
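To make the idea of a "translation" concrete, here is a minimal sketch that rewrites the simplest form of XSD subtraction into Java's intersection syntax. The class and method names are hypothetical, and the rewrite is deliberately naive: it only handles a single, non-nested subtraction whose subtracted group contains no brackets, escapes, or leading ^. It is meant to show the shape of the problem, not a complete solution.

```java
// Hypothetical helper, shown only to illustrate the translation idea.
public class XsdSubtractionTranslator {

    // Rewrites "-[S]]" at the end of a character class into "&&[^S]]".
    // Patterns outside the supported simple form are returned unchanged.
    static String toJavaSyntax(String xsdPattern) {
        return xsdPattern.replaceAll("-\\[([^\\[\\]^\\\\]+)\\]\\]", "&&[^$1]]");
    }

    public static void main(String[] args) {
        // Prints the Java-style equivalent of the pattern from the TCK case.
        System.out.println(toJavaSyntax("[A-Z-[OI]]"));
    }
}
```

A real translation would need a proper parser for the XSD regex grammar, which is part of why delegating to a library that already implements the XQuery rules is attractive.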

jomarko commented 2 months ago

Thank you for reporting. Should we add such an example to https://kiegroup.github.io/dmn-feel-handbook/#matches-input-pattern-flags @gitgabrio ?

gitgabrio commented 2 months ago

> Thank you for reporting. Should we add such an example to https://kiegroup.github.io/dmn-feel-handbook/#matches-input-pattern-flags @gitgabrio ?

@jomarko TBH I do not know. As far as I can see, that page is only used to show the syntax of the different functions. Inside the TCK (and also in our unit tests) there are a lot of different examples/cases, and I'm not sure the dmn-feel-handbook is meant for that. @baldimir @yesamer wdyt?

yesamer commented 2 months ago

@gitgabrio I agree.