BaseXdb / basex

BaseX Main Repository.
http://basex.org
BSD 3-Clause "New" or "Revised" License

optimized query and descendant-or-self #2130

Closed. nverwer closed this issue 2 years ago.

nverwer commented 2 years ago

Description of the Problem

The following query should return the matching <component> element, but it returns an empty sequence:

let $id as xs:string := 'bwbr0002221/2002-01-01/0/wet'
let $d-roots as element()* := collection('cwb')/work
let $d := $d-roots//*[@id = $id]/descendant-or-self::component
return $d

The cause is the way BaseX optimizes this query, rewriting it to:

db:attribute("cwb", "bwbr0002221/2002-01-01/0/wet")/self::attribute(id)/parent::*[ancestor::work/parent::document-node()]/descendant::component

The optimized version looks up an id attribute node with the right value, then steps to its parent, which is the element carrying that id attribute. The predicate in square brackets only checks that this element sits below a top-level work element, so it filters nothing out in this case. The query then selects descendant::component, whereas the original query had descendant-or-self::component. The difference matters here, because the element carrying the matching id attribute is itself the <component>: it has no component descendants, so the result is empty.
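The axis difference described above can be reproduced outside BaseX. The following is a minimal sketch in Python with the standard library's xml.etree.ElementTree, not BaseX code: the helper names descendant and descendant_or_self are made up for illustration, and the XML is a trimmed-down copy of the sample document from this report.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample document from this report.
XML = """
<work>
  <expression-component id="bwbr0002221/1956-06-08/wet">
    <component id="bwbr0002221/2002-01-01/0/wet"/>
    <component id="bwbr0002221/2002-07-01/0/wet"/>
  </expression-component>
</work>
"""


def descendant(context, name):
    """Model of XPath descendant::name (the context node is excluded)."""
    return [e for e in context.iter(name) if e is not context]


def descendant_or_self(context, name):
    """Model of XPath descendant-or-self::name (the context node is included)."""
    return list(context.iter(name))


root = ET.fromstring(XML)
target_id = "bwbr0002221/2002-01-01/0/wet"

# Context node: the element that carries the matching id attribute.
context = next(e for e in root.iter() if e.get("id") == target_id)

without_self = descendant(context, "component")
with_self = descendant_or_self(context, "component")

print(len(without_self), len(with_self))
```

With the matching <component> as context node, the descendant model selects nothing, while the descendant-or-self model selects the element itself.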

When I change the query slightly, moving the work name test from the path into the type declaration:

let $id as xs:string := 'bwbr0002221/2002-01-01/0/wet'
let $d-roots as element(work)* := collection('cwb')/*
let $d := $d-roots//*[@id = $id]/descendant-or-self::component
return $d

I do get the expected <component id="bwbr0002221/2002-01-01/0/wet"/>. Then the optimized query is:

((db:open-pre("cwb", 0), ...)/* treat as element(work)+)/descendant::*[(@id = "bwbr0002221/2002-01-01/0/wet")]/descendant-or-self::component

Expected Behavior

The first query

let $id as xs:string := 'bwbr0002221/2002-01-01/0/wet'
let $d-roots as element()* := collection('cwb')/work
let $d := $d-roots//*[@id = $id]/descendant-or-self::component
return $d

should be optimized to

db:attribute("cwb", "bwbr0002221/2002-01-01/0/wet")/self::attribute(id)/parent::*[ancestor::work/parent::document-node()]/descendant-or-self::component

which gives the correct result. The only change is in the last step: descendant-or-self::component must be kept instead of being rewritten to descendant::component.

Steps to Reproduce the Behavior

  1. Insert this XML in the cwb database:
    <work>
      <work-components id="bwbr0002221" type="wet">
        <work-component id="bwbr0002221/wet" type="wetgeving">
          <expression-component id="bwbr0002221/1956-06-08/wet" inwerking="1956-06-08" version-id="3836102">
            <metadata/>
            <component id="bwbr0002221/2002-01-01/0/wet"/>
            <component id="bwbr0002221/2002-07-01/0/wet"/>
            <component id="bwbr0002221/2003-01-01/0/wet"/>
          </expression-component>
        </work-component>
      </work-components>
    </work>
  2. Make sure that an attribute index is created.
  3. Run the queries given above:
    let $id as xs:string := 'bwbr0002221/2002-01-01/0/wet'
    let $d-roots as element()* := collection('cwb')/work
    let $d := $d-roots//*[@id = $id]/descendant-or-self::component
    return $d

    and

    let $id as xs:string := 'bwbr0002221/2002-01-01/0/wet'
    let $d-roots as element(work)* := collection('cwb')/*
    let $d := $d-roots//*[@id = $id]/descendant-or-self::component
    return $d

    and observe the difference.

Do you have an idea how to solve the issue?

The optimizer generates descendant::component as the final step, which should be descendant-or-self::component. This happens when the optimized query begins with db:attribute(), that is, when the attribute index is used. Note that db:attribute returns an attribute node. Perhaps the optimizer treats the context of the final step as that attribute node (which can never be a component itself) rather than as the element that contains it?
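To check which rewrite preserves the original semantics, the two plans can be modelled outside BaseX. This is a rough sketch in Python under stated assumptions, not a description of the BaseX optimizer: build_id_index, scan_plan and index_plan are hypothetical helpers, and because ElementTree has no attribute nodes, the index maps an id value directly to its owner element, which collapses the db:attribute(...)/parent::* steps into one lookup.

```python
import xml.etree.ElementTree as ET

XML = """<work>
  <work-components id="bwbr0002221" type="wet">
    <work-component id="bwbr0002221/wet" type="wetgeving">
      <expression-component id="bwbr0002221/1956-06-08/wet">
        <metadata/>
        <component id="bwbr0002221/2002-01-01/0/wet"/>
        <component id="bwbr0002221/2002-07-01/0/wet"/>
      </expression-component>
    </work-component>
  </work-components>
</work>"""


def build_id_index(root):
    """Hypothetical stand-in for an attribute index: id value -> owner elements."""
    index = {}
    for elem in root.iter():
        if "id" in elem.attrib:
            index.setdefault(elem.get("id"), []).append(elem)
    return index


def scan_plan(root, target_id):
    """Model of the original query: work//*[@id = $id]/descendant-or-self::component."""
    hits = [e for e in root.iter() if e is not root and e.get("id") == target_id]
    return [c for h in hits for c in h.iter("component")]


def index_plan(index, target_id, include_self):
    """Model of the index-based plan: look up the owner element, then apply the last step."""
    result = []
    for owner in index.get(target_id, []):
        for c in owner.iter("component"):
            # include_self=True models descendant-or-self, False models descendant.
            if include_self or c is not owner:
                result.append(c)
    return result


def ids(elems):
    return [e.get("id") for e in elems]


root = ET.fromstring(XML)
index = build_id_index(root)

COMPONENT_ID = "bwbr0002221/2002-01-01/0/wet"   # id sits on a <component>
PARENT_ID = "bwbr0002221/1956-06-08/wet"        # id sits on the parent element

for target in (COMPONENT_ID, PARENT_ID):
    print(target)
    print("  scan plan:            ", ids(scan_plan(root, target)))
    print("  index, desc-or-self:  ", ids(index_plan(index, target, True)))
    print("  index, descendant:    ", ids(index_plan(index, target, False)))
```

In this model the two index plans agree whenever the matching element is not itself a component, and differ only when it is. That is consistent with the bug showing up for this particular id and not necessarily for others.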

What is your configuration?

ChristianGruen commented 2 years ago

Thanks for the concise bug report. New maven artifacts have been uploaded. BaseX 10 will be released next week; BaseX 9.7.4 can be expected soon after.