eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

[BUG] `binary` Attribute in Lucene full text fields does not work any more. #5431

Status: Open. Opened by scheidelerl 2 months ago.

scheidelerl commented 2 months ago

Describe the bug

The `binary` attribute on fields in the Lucene full-text index no longer works.

Expected behavior

The `binary` attribute should work.

To Reproduce

Try to index a field as binary.

With binary:

xquery version "3.1";

module namespace t="http://exist-db.org/xquery/test";
(:  LIBRARIES  :)
declare namespace test="http://exist-db.org/xquery/xqsuite";
(:  NAMESPACES  :)
declare namespace array="http://www.w3.org/2005/xpath-functions/array";
declare namespace exist="http://exist.sourceforge.net/NS/exist";
declare namespace ft="http://exist-db.org/xquery/lucene";
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";

(:  VARIABLES  :)
declare variable $t:XML :=
<div>
    <test>Adm. 1,10</test>
    <test>Bdm. 1,11</test>
    <test>Cdm. 1,12</test>
    <test>Edm. 1,1</test>
    <test>Fdm. 1,2</test>
    <test>Gdm. 1,3</test>
    <test>Zdm. 1,4</test>
    <test>Wdm. 1,5</test>
    <test>Odm. 1,6</test>
    <test>Ydm. 1,7</test>
    <test>Cdm. 1,8</test>
    <test>Vdm. 1,9</test>
    <test>Pdm. 1,13</test>
    <test>Edm. 1,14</test>
</div>;

declare variable $t:xconf :=
<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Full-text indexing with Lucene -->
    <lucene>
      <!-- Elements upon which to build an index. -->
      <text qname="test">
        <field name="sortable" expression="./string()" type="xs:string" binary="yes"/>
      </text>
    </lucene>
  </index>
</collection>;

(:  FUNCTIONS  :)
declare
    %test:setUp
function t:setup() {
    let $testCol    := xmldb:create-collection("/db", "test")
    let $indexCol   := xmldb:create-collection("/db/system/config/db", "test")
    return (
        xmldb:store("/db/test", "test.xml", $t:XML),
        xmldb:store("/db/system/config/db/test", "collection.xconf", $t:xconf),
        xmldb:reindex("/db/test")
      )
};

declare
    %test:tearDown
function t:tearDown() {
    xmldb:remove("/db/test"),
    xmldb:remove("/db/system/config/db/test")
};

declare
    %test:name('Sorted result.')
    %test:assertExists
    %test:assertXPath('count(doc("/db/test/test.xml")//test) eq count($result)')
    %test:assertError("err:XPTY0004")
function t:sorted-result() as xs:string* {
    let $options := map {
        'fields': ('sortable')
    }
    let $index := doc("/db/test/test.xml")/div[ft:query(., (), $options)]
    return 
    (

        let $values := ft:binary-field($index, "sortable","xs:string")
            (: count() wraps $values only; comparing the strings with 0 would raise XPTY0004 :)
            where count($values) gt 0
        for $field in $values
            order by $field ascending
        return 
        (
            $field
        )
    )
};

Without binary:

xquery version "3.1";

module namespace t="http://exist-db.org/xquery/test";
(:  LIBRARIES  :)
declare namespace test="http://exist-db.org/xquery/xqsuite";
(:  NAMESPACES  :)
declare namespace array="http://www.w3.org/2005/xpath-functions/array";
declare namespace exist="http://exist.sourceforge.net/NS/exist";
declare namespace ft="http://exist-db.org/xquery/lucene";
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";

(:  VARIABLES  :)
declare variable $t:XML :=
<div>
    <test>Adm. 1,10</test>
    <test>Bdm. 1,11</test>
    <test>Cdm. 1,12</test>
    <test>Edm. 1,1</test>
    <test>Fdm. 1,2</test>
    <test>Gdm. 1,3</test>
    <test>Zdm. 1,4</test>
    <test>Wdm. 1,5</test>
    <test>Odm. 1,6</test>
    <test>Ydm. 1,7</test>
    <test>Cdm. 1,8</test>
    <test>Vdm. 1,9</test>
    <test>Pdm. 1,13</test>
    <test>Edm. 1,14</test>
</div>;

declare variable $t:xconf :=
<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Full-text indexing with Lucene -->
    <lucene>
      <!-- Elements upon which to build an index. -->
      <text qname="div">
        <field name="sortable" expression="./test/string()"/>
      </text>
    </lucene>
  </index>
</collection>;

(:  FUNCTIONS  :)
declare
    %test:setUp
function t:setup() {
    let $testCol    := xmldb:create-collection("/db", "test")
    let $indexCol   := xmldb:create-collection("/db/system/config/db", "test")
    return (
        xmldb:store("/db/test", "test.xml", $t:XML),
        xmldb:store("/db/system/config/db/test", "collection.xconf", $t:xconf),
        xmldb:reindex("/db/test")
      )
};

declare
    %test:tearDown
function t:tearDown() {
    xmldb:remove("/db/test"),
    xmldb:remove("/db/system/config/db/test")
};

declare
    %test:name('Sorted result.')
    %test:assertExists
    %test:assertXPath('count(doc("/db/test/test.xml")//test) eq count($result)')
    %test:assertError("err:XPTY0004")
function t:sorted-result() as xs:string* {
    let $options := map {
        'fields': ('sortable')
    }
    let $index := doc("/db/test/test.xml")/div[ft:query(., (), $options)]
    return 
    (

        let $values := ft:field($index, "sortable","xs:string")
            (: count() wraps $values only; comparing the strings with 0 would raise XPTY0004 :)
            where count($values) gt 0
        for $field in $values
            order by $field ascending
        return 
        (
            $field
        )
    )
};


line-o commented 2 months ago

@scheidelerl Thank you for this complete issue report. I would like to know with which version of eXist-db the above test suite passes.

line-o commented 2 months ago

To read the values of binary fields, a new function, `ft:binary-field`, was added, and I cannot see you using it. Maybe that is the issue?
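A minimal sketch of reading a binary field, assuming the `collection.xconf` from the first test (a `sortable` field with `binary="yes"` defined on each `test` element) and eXist-db 6.2.0 or later. Since `ft:binary-field` takes a single node in the signatures quoted in this thread, the sketch calls it once per hit inside the FLWOR expression instead of passing the whole result sequence. Passing the field name in the `fields` option mirrors the report; it may not be required for binary fields.

```xquery
xquery version "3.1";

declare namespace ft="http://exist-db.org/xquery/lucene";

(: Query the element the index is defined on (test), not its parent div.
   The empty query () selects all indexed nodes, as in the report. :)
let $hits := doc("/db/test/test.xml")//test[ft:query(., (), map { "fields": "sortable" })]
for $hit in $hits
(: Read the binary doc value stored for this hit and use it as the sort key :)
let $key := ft:binary-field($hit, "sortable", "xs:string")
order by $key ascending
return $key
```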

line-o commented 2 months ago

See also "Retrieving Field Content": https://exist-db.org/exist/apps/doc/lucene#retrieve-fields

scheidelerl commented 2 months ago

Hey, thank you for the reply. The eXist-db version is the current build, 6.2.0, installed with the JAR installer. If I use `ft:binary-field($index, 'sortable', 'xs:string')` instead, which I think is the intended usage, it doesn't work either. In eXide, the `binary` attribute shows this linter error: `[cvc-complex-type.3.2.2: Attribute 'binary' is not allowed to appear in element 'field']`. The eXist log file says nothing about it. If I use the field without `binary`, no problem occurs. If I try to apply the collection.xconf with eXide, this error occurs: `Failed to apply configuration: DocValuesField "sortable" appears more than once in this document (only one value is allowed per field)`.
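The Lucene error above suggests how the two configurations differ: each element matched by `<text qname="...">` becomes one Lucene document, and a binary (doc values) field accepts only one value per document. The plain XQuery 3.1 sketch below is an illustrative check, not an eXist API. Using a shortened copy of the test data, it computes the largest number of values the field expression yields per indexed element, first for `qname="div"` with `./test/string()` and then for `qname="test"` with `./string()`.

```xquery
xquery version "3.1";

(: Shortened copy of the test document from the report :)
let $doc :=
  <div>
    <test>Adm. 1,10</test>
    <test>Bdm. 1,11</test>
    <test>Cdm. 1,12</test>
  </div>
(: qname="div", expression="./test/string()": values per div element :)
let $per-div := max(for $d in $doc/self::div return count($d/test/string()))
(: qname="test", expression="./string()": values per test element :)
let $per-test := max(for $t in $doc/test return count($t/string()))
return ($per-div, $per-test)
```

The result is `(3, 1)`: the div-level expression produces three values for a single Lucene document, which is what the DocValuesField error rejects, while the test-level expression produces exactly one.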

scheidelerl commented 2 months ago

If I use only `doc("/db/test/test.xml")/div[ft:query(., ())]` with the `binary` attribute on the `field` child, the result is empty, and `ft:field($node as node(), $field as xs:string)` and `ft:binary-field($node as node(), $field as xs:string, $type as xs:string)` throw these errors:

I know what they mean; they are self-explanatory. But they show that the full indexing is not performed because an error occurs, an error that is not reported in the log or anywhere else and that is ultimately related to the attribute, because everything works without it.

line-o commented 2 months ago

@scheidelerl you need to have some hits in order to sort them using the binary field values. I suspect that your call to ft:query returns an empty sequence. Can you check that?
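One way to check this is to count the hits on the parent and on the indexed element separately. This sketch assumes the binary test's configuration, in which the index is defined on `test`, and uses the document path from the report. If `div-hits` is 0 while `test-hits` is not, there is nothing to sort at the div level.

```xquery
xquery version "3.1";

declare namespace ft="http://exist-db.org/xquery/lucene";

let $doc := doc("/db/test/test.xml")
return map {
  (: ft:query applied to div, which has no index definition of its own :)
  "div-hits": count($doc/div[ft:query(., ())]),
  (: ft:query applied to the element named in <text qname="test"> :)
  "test-hits": count($doc//test[ft:query(., ())])
}
```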

line-o commented 2 months ago

> In eXide the attribute binary show this linter error : [cvc-complex-type.3.2.2: Attribute 'binary' is not allowed to appear in element 'field']

It may well be that the schema was not updated to include the `binary` attribute.

scheidelerl commented 2 months ago

I updated the test above and added a second one, so there is now one with and one without `binary`. It works when I index the element directly, because a binary field seems to need a single value. The hint was the error shown when applying the index in eXide.

This brings me to the following questions:

  1. Why is this not the case for normal fields as well, so that the configuration stays interchangeable when I realize that I don't need to query certain values?
  2. Why is there no mention of this in the documentation? Please don't tell me it is sufficiently explained because the default type is given as xs:string.
  3. Why does the log not show this as an error when I apply the index?
  4. Why can I not declare `type="xs:string*"` to prevent this error?
  5. Why does this work in 5.4.0?
  6. (Most important!) What do I have to do if I only want to query at the parent level and have several values in one field, but still want faster access?
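One possible approach to question 6, offered as an untested sketch rather than a confirmed answer: keep the multi-valued field on `div` as a normal (non-binary) field for querying, and put the binary sort field on `test`, where each Lucene document has exactly one value. The field name `tests` is hypothetical; `xmldb:store` and `xmldb:reindex` are the same calls used in the report's setup function, and the configuration collection is assumed to exist already, as created by `t:setup`.

```xquery
xquery version "3.1";

declare namespace xmldb="http://exist-db.org/xquery/xmldb";

let $xconf :=
  <collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <lucene>
        <!-- Parent level: several values per div, so no binary="yes" -->
        <text qname="div">
          <field name="tests" expression="./test/string()"/>
        </text>
        <!-- Child level: one value per test, so binary doc values are allowed -->
        <text qname="test">
          <field name="sortable" expression="./string()" type="xs:string" binary="yes"/>
        </text>
      </lucene>
    </index>
  </collection>
return (
  xmldb:store("/db/system/config/db/test", "collection.xconf", $xconf),
  xmldb:reindex("/db/test")
)
```

With this layout, queries can still target `div` through the `tests` field, while sorting uses `ft:binary-field` on the `test` hits.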

line-o commented 2 months ago

> Why this works in 5.4.0?

As far as I know, binary fields were added in version 6.2.0, so they cannot work in version 5.4.0.
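Because support depends on the server version, a test suite that has to run on both 5.x and 6.x could skip the binary-field tests on older servers. The helper below is a hypothetical sketch in plain XQuery 3.1 that compares dotted version strings component by component as integers. It assumes purely numeric components (a suffix such as `-SNAPSHOT` would fail the cast). In eXist-db the running version could presumably be passed in from `system:get-version()`, which is an assumption and not taken from this thread.

```xquery
xquery version "3.1";

(: Hypothetical helper: true if $version is at least $minimum.
   Each dot-separated component is compared as an integer; missing ones count as 0. :)
declare function local:at-least($version as xs:string, $minimum as xs:string) as xs:boolean {
  let $v := for $p in tokenize($version, "\.") return xs:integer($p)
  let $m := for $p in tokenize($minimum, "\.") return xs:integer($p)
  let $diffs :=
    for $i in 1 to max((count($v), count($m)))
    let $a := ($v[$i], 0)[1]
    let $b := ($m[$i], 0)[1]
    where $a ne $b
    return $a gt $b
  (: The first differing component decides; equal versions count as "at least" :)
  return ($diffs[1], true())[1]
};

(local:at-least("6.2.0", "6.2.0"), local:at-least("5.4.0", "6.2.0"), local:at-least("6.10.1", "6.2.0"))
```

This returns `(true, false, true)`; comparing numerically rather than as strings keeps `6.10.1` ahead of `6.2.0`.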