eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
429 stars 179 forks source link

not() function failing on boolean sequence #2308

Open xatapult opened 5 years ago

xatapult commented 5 years ago

What is the problem

The not() function used in a predicate on a boolean sequence fails... I'm trying to count the number of false or true entries in a sequence of booleans.

  1. Counting the number of false entries:
    1. count((true(), false(), false())[not(.)]) fails with the message: item of '(true, false, false)...' is not a node, and sequence length > 1
    2. count((true(), false(), false())[. eq false()]) works
  2. Counting the number of true entries:
    1. count((true(), false(), false())[.]) works

Now 1.i. looks like a bug. In XSLT (Saxon) it simply works.

What did you expect

That this expression count((true(), false(), false())[not(.)]) works as expected and as it does in Saxon.

Describe how to reproduce or add a test

Run the attached XQuery

Context information

joewiz commented 5 years ago

Similarly, these expressions return the same error when they should not:

(true(), false())[not(.)]
(1, 2)[not(.)]
("a", "b")[not(.)]

The full error message:

err:FORG0006 effectiveBooleanValue: first item of '(a, b)' is not a node, and sequence length > 1 [source: xquery version "3.1"; ("a", "b")[not(.)]]

This error message is incorrect in two respects:

  1. There is no requirement in the rules for determining effective boolean values that only nodes can be tested. See the 6 rules for determining effective boolean values at https://www.w3.org/TR/xpath-30/#id-ebv. The (true(), false()) case should trigger rule 3; ("a", "b") by rule 4; and (1, 2) by rule 5.
  2. The sequence length should have no bearing on this filter expression, since according to https://www.w3.org/TR/xpath-30/#id-filter-expression, the filter should "be applied to each item in the base expression in turn".
adamretter commented 5 years ago

So I took a quick look into this. Unfortunately the root of the problem is that eXist-db's parser and XQuery engine does not correctly understand . i.e. the Context Item, instead it treats it as self::node() with a bunch of hacks for when it might actually mean the context item.

There are two hacks that I could implement to fix this:

  1. Commenting out lines 281-296 of Predicate.java. This disables an optimisation, and gives us the correct result, but also will make other boolean expressions where the context sequence is not required slower as they will have to be evaluated for every item in the context sequence.

  2. LocationStep.java line 108: Removing the !this.inPredicate expression which forces every . or self::node() to have a dependency on the context item. Whilst this would fix our issue with the Context Item, unfortunately this will also slow down expressions which genuinely use self::node().

The correct fix is to correct the XQuery Parser so that it correctly differentiates between self::node() and the . (Context Item) when constructing the AST. Next we need to modify the XQuery engine so that it knows what a Context Item is.

This is not a small piece of work, and as the existing XQuery Parser is outdated (Antlr2) and contains many other lurking issue, I think it is unlikely I will fix this personally. My efforts are more focued on a new XQuery Parser and Engine. However, anyone else is welcome to take a stab at fixing it in eXist-db...

adamretter commented 5 years ago

@wolfgangmm any comments, perhaps you see an alternative path for fixing this?

wolfgangmm commented 5 years ago

eXist handles fn:not in a particular way by trying to evaluate it as a set operation. Obviously this approach is wrongly applied here. A naive fix would likely result in many expressions becoming slower by an order of magnitude, so we need to be careful here. We'll keep this issue open, but fixing will take considerable time.

adamretter commented 5 years ago

As Erik mentions, the temporary workaround is: count((true(), false(), false())[. eq false()])

line-o commented 5 years ago

I just found out that wrapping fn:not() in a function works, too.

declare function local:wrappedNot($a) { not($a) };

(true(), true(), true(), false(), true())[local:wrappedNot(.)]
line-o commented 5 years ago

I created some tests on my way to this discovery. If you think they - or some of them - are useful, I can prepare a PR.

duncdrum commented 5 years ago

yes please if you could create a pr with both pending test and some that are passing like the wrapped function thingy that would be great. If you could also add some xqdoc comments with a link to the issue number that would be nice, thx.