With the last fix, all previously successful tests are running again.
I found at least three more tests that return the wrong results (with both BaseX 10.7 and the current code base):
matches('x', '(a)|\1'),
matches('a', '(?:(b)?a)\1'),
matches('babadad', '^((.)?a\2)+$')
Taken from:
I assume that the patterns of the tests with the IDs p888-p891 should also be rejected:
for $p in ('([\d-z]+)', '([\d-z]+)', '([\d-\s]+)', '([\d-\s]+)')
let $success := try { matches('x', $p) } catch * { 'expected' }
return $p || ': ' || $success
…no hurries!
Hi @ChristianGruen,
today I had a look at the XQuery tests relating to fn:matches. The test run reports a total of 7 problems:
<results>
<fail>
<test id="p294" regex="(a)|\1" input="x" result="y"/>
</fail>
<fail>
<test id="p295" regex="(?:(b)?a)\1" input="a" result="y"/>
</fail>
<fail>
<test id="p303" regex="^((.)?a\2)+$" input="babadad" result="y"/>
</fail>
<fail>
<test id="p888" regex="([\d-z]+)" input="a0-za" result="Sy"/>
</fail>
<fail>
<test id="p889" regex="([\d-z]+)" input="-" result="sc"/>
</fail>
<fail>
<test id="p890" regex="([\d-\s]+)" input="a0- z" result="Sy"/>
</fail>
<fail>
<test id="p891" regex="([\d-\s]+)" input="-" result="sc"/>
</fail>
</results>
Tests p294, p295, and p303 fail because the JRE's regex engine does not match a backreference if the corresponding group has not been matched. The reason for not matching it is
or
I am proposing two changes for handling this:
or
(will fix p294)
I still have to work on the actual changes.
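The JRE behavior described above can be reproduced in isolation. The following is a minimal sketch, not BaseX code: the class name `BackrefDemo` and the helper `matches` are hypothetical, and it assumes that the unanchored semantics of fn:matches correspond to `Matcher.find()`.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // Hypothetical helper: unanchored match, similar in spirit to fn:matches.
    static boolean matches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // p294: group 1 belongs to the alternative that did not match,
        // so java.util.regex lets the backreference fail.
        System.out.println(matches("x", "(a)|\\1"));      // prints false

        // p295: group 1 is optional and was skipped, same outcome.
        System.out.println(matches("a", "(?:(b)?a)\\1")); // prints false
    }
}
```

Both calls report no match, whereas the test suite expects a match for these patterns.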
The patterns in tests p888-p891 should be rejected, because according to XML Schema Part 2,
The - character is a valid character range only at the beginning or end of a ·positive character group·.
In these cases, it follows a MultiCharEsc, so it is not the metacharacter of an seRange, but would have to be an XmlCharIncDash, forming a single-character charRange. But the rule quoted above says that this is only valid at the beginning or end of a charGroup.
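To make the rule concrete, here is a rough sketch of a check for this one case. It is an illustration only, not the BaseX implementation: `HyphenCheck` and `hasHyphenAfterMultiCharEsc` are hypothetical names, and the scan is simplified (it flags a `-` that directly follows a multi-character escape inside a character class unless the `-` is the last character of the class or starts a subtraction with `-[`).

```java
class HyphenCheck {
    // Returns true if a '-' directly follows a multi-character escape inside a
    // character class and is neither the last character of that class nor the
    // start of a character class subtraction ("-[").
    static boolean hasHyphenAfterMultiCharEsc(String regex) {
        int depth = 0; // nesting depth of '[' ... ']'
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char esc = regex.charAt(i + 1);
                int end = i + 2; // index just past the escape
                boolean multi = "sSiIcCdDwW".indexOf(esc) >= 0;
                if (esc == 'p' || esc == 'P') {
                    // Category escape such as \p{L}: skip to the closing brace.
                    multi = true;
                    int close = regex.indexOf('}', i);
                    if (close >= 0) end = close + 1;
                }
                if (multi && depth > 0 && end + 1 < regex.length()
                        && regex.charAt(end) == '-') {
                    char next = regex.charAt(end + 1);
                    if (next != ']' && next != '[') return true;
                }
                i = end - 1; // continue after the escape
            } else if (c == '[') {
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
            }
        }
        return false;
    }
}
```

With this sketch, `([\d-z]+)` and `([\d-\s]+)` are flagged, while `[\d-]` and `[a-z]` are not.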
I think that the pattern in p303 also is not valid with respect to the specification of backreferences, which says
The regular expression is invalid if a back-reference refers to a capturing sub-expression that does not exist or whose closing right parenthesis occurs after the back-reference.
Should we reject it? If yes, that would call for a change of qt4tests and qt3tests.
@GuntherRademacher Your analysis reads fine, as do your suggestions for fixing p294 and p295.
For the other patterns, it would be great if you could report this back to https://github.com/w3c/qt3tests (as suggested), either in an issue or via a pull request. Thanks!
PS: Commits to qt3tests are occasionally migrated to the qt4tests repository (see https://github.com/w3c/qt3tests/issues/54).
Sorry for the confusion: the pattern in p303 of course is valid, because group 2 is closed before the back-reference occurs. Also the code to check for this condition is in place and works as it should.
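For illustration, the condition from the specification (the group must exist and be closed before the back-reference) could be checked roughly as follows. This is a simplified sketch and not the check that is actually in place: `BackrefCheck` and `backrefsValid` are hypothetical names, only single-digit back-references are handled, and character classes are tracked without nesting.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BackrefCheck {
    // Returns false if a back-reference \N occurs before capturing group N
    // has been closed (or if group N does not exist at all).
    static boolean backrefsValid(String regex) {
        Deque<Integer> open = new ArrayDeque<>();   // numbers of open groups, 0 = non-capturing
        boolean[] closed = new boolean[regex.length() + 10];
        int groups = 0;
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char d = regex.charAt(i + 1);
                if (!inClass && d >= '1' && d <= '9' && !closed[d - '0']) {
                    return false;                   // group not closed yet, or missing
                }
                i++;                                // skip the escaped character
            } else if (inClass) {
                if (c == ']') inClass = false;
            } else if (c == '[') {
                inClass = true;
            } else if (c == '(') {
                boolean capturing = !regex.startsWith("(?", i);
                open.push(capturing ? ++groups : 0);
            } else if (c == ')' && !open.isEmpty()) {
                closed[open.pop()] = true;
            }
        }
        return true;
    }
}
```

Under these assumptions, the p303 pattern `^((.)?a\2)+$` passes, because group 2 is closed before `\2` occurs, while a pattern such as `(a\1)` is rejected.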
@GuntherRademacher I must take back my statement. I have indeed overlooked various tests that are now failing: