eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
422 stars 179 forks source link

Context problem with map:get() in xpath #2205

Open ahenket opened 5 years ago

ahenket commented 5 years ago

eXist-db 4.4.0 (build Sept 21)

Code:

let $myXml      := doc(xmldb:store('/db/apps', 'ttt.xml', <x><y k1="a" k2="1"/><y k1="a" k2="2"/></x>))/x
let $myMap      := map:new((map:entry('a', '1'), map:entry('b', '2')))

return
    $myXml//y[@k1][@k2 = map:get($myMap, @k1)]

Yields:

err:XPTY0004 checking function parameter 2 in call map:get($myMap, untyped-value-check[xs:anyAtomicType, attribute::{}k1]): XPTY0004: The actual cardinality for parameter 2 does not match the cardinality declared in the function's signature: map:get($map as map(), $key as xs:anyAtomicType) item(). Expected cardinality: exactly one, got 2. [at line 6, column 42)

Expected: <y k1="a" k2="1"/>

There are two ways to work around this:

Change order of @k2 and map:get() like so: $myXml//y[@k1][map:get($myMap, @k1) = @k2]

or work from an in memory xml fragment.

So there seems to be a context problem with database nodes and the map function. The down side of both workarounds is that they skip using the index if you have any on @k2.

joewiz commented 5 years ago

I can confirm the bug. Here is a single query that demonstrates the difference between the in-memory and on-disk operation here:

xquery version "3.1";

let $in-memory := 
    <x>
        <y k1="a" k2="1"/>
        <y k1="a" k2="2"/>
    </x>
let $store := xmldb:store('/db', 'test.xml', $in-memory)
let $on-disk := doc("/db/test.xml")/x
let $map := 
    map { 
        "a": "1",
        "b": "2"
    }
for $source in ($in-memory, $on-disk)
return
    try {
        $source/y[@k2 = map:get($map, @k1)]
    } catch * {
        map {
            "code": $err:code,
            "description": $err:description,
            "value": $err:value,
            "module": $err:module,
            "line-number": $err:line-number,
            "column-number": $err:column-number,
            "additional": $err:additional
        }
    }

This returns two items:

(: result of in-memory operation; expected :)
<y k1="a" k2="1"/>

(: result of on-disk operation; unexpected error :)
map {
    "column-number": 39,
    "line-number": 18,
    "additional": (),
    "module": "String",
    "code": Q{http://www.w3.org/2005/xqt-errors}XPTY0004,
    "value": (),
    "description": "It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in 2.5.4 SequenceType Matching. checking function parameter 2 in call map:get($map, untyped-value-check[xs:anyAtomicType, attribute::{}k1]): XPTY0004: The actual cardinality for parameter 2 does not match the cardinality declared in the function's signature: map:get($map as map(*), $key as xs:anyAtomicType) item()*. Expected cardinality: exactly one, got 2."
}
adamretter commented 5 years ago

@joewiz interestingly if you modify your predicate to use atomic equality, you will get the correct result. e.g. change:

[@k2 = map:get($map, @k1)]

to:

[@k2 eq map:get($map, @k1)]
adamretter commented 5 years ago

I have managed to simplify this further by removing the need for the map:

xquery version "3.1";

declare function local:get($key as xs:anyAtomicType) as item()* {
  "1"
};

let $in-memory := 
    <x>
        <y k1="a" k2="1"/>
        <y k1="a" k2="2"/>
    </x>
let $stored := doc(xmldb:store('/db', 'test.xml', $in-memory))/x
return

    for $source in ($in-memory, $stored)
    return
        try {
            $source/y[@k2 = local:get(@k1)]
        } catch * {
            <error
                     code="{$err:code}"
                     description="{$err:description}"
                     value="{$err:value}"
                     module="{$err:module}"
                     line-number="{$err:line-number}"
                     column-number="{$err:column-number}"
                     additional="{$err:additional}"/>
        }
adamretter commented 5 years ago

This is a rather horrible Context Item one, in some ways not dissimilar to https://github.com/eXist-db/exist/issues/2308

What happens is that an optimisation is applied if both: 1) the expression inside the predicate operates on stored nodes 2) and does not depend on the context item.

You can see the optimisation here: https://github.com/eXist-db/exist/blob/eXist-4.5.0/src/org/exist/xquery/GeneralComparison.java#L456

The problem here is that obviously the expression does depend on the Context Item. Debugging the code there shows that eXist-db actually fails to recognise the dependency on the Context Item for both in-memory and stored nodes. However, as the optimisation is only applied for stored nodes, the in-memory version completes correctly.

The solution would be to either: 1) correctly detect the dependency on the Context Item for the expression inside the predicate. 2) disable the optimisation.

I suspect that (1) is a lot of difficult work in the depths of the XQuery Parser, I have already discussed that in #2308. The quickest solution is (2). Again I suspect that correctness is more important than speed. I will leave @wolfgangmm to comment on what he thinks we should do here...

ahenket commented 5 years ago

Both accuracy and speed are important. If pressed for a choice, accuracy wins. We are already in a bad place performance wise and have had numerous rewrite sessions to fix that. The maps actually came in as a performance booster. Would hate to loose that.

adamretter commented 5 years ago

@ahenket could I suggest you tabling this for discussion at a community call?

ahenket commented 5 years ago

Could do. I'm having trouble finding the schedule, or is this info still accurate?

adamretter commented 5 years ago

The last call details are here - https://docs.google.com/document/d/1I-3W6jfJ2ZdF8F6rG_JpnPt-Lh23DaNZfEtEch6nec0/edit?usp=sharing

There are two calls a month always on a Monday, usually two weeks apart. The timing alternates between an earlier time and a later time.

The first meeting in the month is at the earlier time of: 18:30-19:30 UTC 7:30-8:30pm Europe Central (CET) / 10:30am US Pacific (PST) / 1:30pm US Eastern (EST) / 6:30pm UK (GMT) / 02:30 Malaysia (MYT)

The second meeting of the month is at the later time of: 14:30-15:30 UTC 3:30-4:30pm Europe Central (CET) / 6:30am US Pacific (PST) / 9:30pm US Eastern (EST) / 2:30pm UK (GMT) / 22:30 Malaysia (MYT)

There will likely not be a call on Dec 31st due to Christmas/New Year holidays. So I think the next call will be 7th January 2019 at 18:30-19:30 UTC. @joewize do I have that correct? We should maybe try and get a schedule written up somewhere?

Unfortunately I personally cannot attend the earlier call times as I am currently in Malaysia, I can only attend the later call times.

On Tue, 18 Dec 2018 at 18:13, Alexander Henket notifications@github.com wrote:

Could do. I'm having trouble finding the schedule, or is this info still accurate?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

-- Adam Retter

skype: adam.retter tweet: adamretter http://www.adamretter.org.uk

adamretter commented 5 years ago

@ahenket there will be a call tomorrow - https://docs.google.com/document/d/1-5iWMTIdnev3eaiRwAJPm7e2OmfbDroi9e6pVhJIq6A/edit

adamretter commented 5 years ago

Likely related - https://github.com/eXist-db/exist/issues/1186