eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

Attribute range index produces incorrect results when used with ft:query. #3114

Open paulmer opened 4 years ago

paulmer commented 4 years ago

What is the problem

An XPath expression of the form

//record[ft:query(.//field[@attr="value"], <phrase>text</phrase>)]

ignores the [@attr="value"] predicate if a range index is defined on @attr. The result of the expression is equivalent to:

//record[ft:query(.//field, <phrase>text</phrase>)]

If the range index is removed, the XPath expression works as expected.

What did you expect

See above.

Describe how to reproduce or add a test

Describe how we can reproduce the problem.

The attached test demonstrates the difference in behavior when a range index is and is not present.

rft-test.zip

Context information

eXist 5.1.0 snapshot built from source. Confirmed on both Linux and Mac platforms, Java 1.8.

joewiz commented 4 years ago

@paulmer As a workaround, could you try this?

//field[@attr="value"][ft:query(., <phrase>text</phrase>)]/ancestor::record

paulmer commented 4 years ago

@joewiz That actually works, as does pretty much any variation I've found that doesn't include the '@attr' expression within the ft:query call, but alas, my real use cases are much more complex with the entire XPath expression being assembled from a number of pieces of configuration that can vary. I suppose I'm being stubborn, but I'd rather tackle the bug than try to code around it. I was kind of hoping someone would think "Aha! I know what's causing that.", but no such luck.

joewiz commented 4 years ago

@paulmer My guess is that the predicate is being blown away in the query optimization surrounding ft:query - somewhere here: https://github.com/eXist-db/exist/blob/develop/extensions/indexes/lucene/src/main/java/org/exist/xquery/modules/lucene/Query.java#L160-L177.

dizzzz commented 4 years ago

As far as I know, you could try switching off the optimization:

declare option exist:optimize "enable=yes|no";

(source: org.exist.xquery.Optimizer)
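
For illustration, a minimal sketch of how that option might be applied to the failing query from the original report. It assumes the `exist` and `ft` prefixes are predeclared, as they usually are in eXist-db, and uses the placeholder names `record`, `field`, and `@attr` from the report. Whether disabling the optimizer actually avoids the incorrect result is not confirmed anywhere in this thread.

```xquery
xquery version "3.1";

(: Ask eXist-db not to apply its query-rewriting optimizer to this query.
   Assumption: "enable=no" is the value that turns it off, as suggested above. :)
declare option exist:optimize "enable=no";

(: The expression from the original report, unchanged apart from the option. :)
//record[ft:query(.//field[@attr = "value"], <phrase>text</phrase>)]
```

The option sits in the query prolog, so it affects only the query that declares it. On a large database this is likely to cost performance, since index-based pre-selection may no longer be used.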

paulmer commented 4 years ago

@dizzzz @joewiz Thank you both for the suggestions. Optimization is fairly important to my queries given the size of my database.

I dug into this issue far enough to understand that there isn't going to be an easy fix in the Java code, at least not for me without a great deal of research. My impression, after only a couple of hours of studying the code, is that the result of the ft:query lookup is being used to filter not the set of fields passed to it, but the set of records in the outer expression.

I thought some more about Joe's suggestion and found that, for all my current use cases, it is possible to mechanically generate a different expression of the form //record[.//field[@name='name'][ft:query(., )]] that doesn't confuse the optimizer, so I'm going that route for now.

One oddity that someone looking at this problem might want to know about: when I added a logging statement to the beginning of org.exist.xquery.modules.lucene.Query.preSelect() that evaluated contextSequence.getItemCount(), the problem went away. I didn't track down which side effect was causing that behavior, though.
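
To make the three query shapes discussed in this thread easier to compare, here is a hedged sketch that runs each one and reports how many records it returns. It reuses the placeholder names from the original report (`record`, `field`, `@attr="value"`, and the phrase `text`). The `counts` element and its attribute names are made up for this example. It assumes the query runs against a collection that has both the Lucene index and the range index configured.

```xquery
xquery version "3.1";

(: Form from the original report: attribute predicate inside the ft:query argument. :)
let $inside := //record[ft:query(.//field[@attr = "value"], <phrase>text</phrase>)]

(: Workaround suggested earlier in the thread: filter the fields, then step up to the record. :)
let $ancestor := //field[@attr = "value"][ft:query(., <phrase>text</phrase>)]/ancestor::record

(: Nested form: ft:query is applied to each already-filtered field. :)
let $nested := //record[.//field[@attr = "value"][ft:query(., <phrase>text</phrase>)]]

return
    <counts inside="{count($inside)}"
            ancestor="{count($ancestor)}"
            nested="{count($nested)}"/>
```

If the behavior described in this issue is present, the `inside` count would be expected to exceed the other two whenever some records contain the phrase in a field whose `@attr` has a different value. Matching counts on a particular data set do not by themselves show that the problem is absent.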