Open joewiz opened 1 year ago
At a guess, as this happens during the last step, I might think that there is possibly a bad optimisation at play.
There are a lot of different expressions at play in the reproducible example.
It could be simpler to debug this if it used for
expressions instead of every/in/satisfies
and except
expressions. Do you think it would be possible to rewrite this a little?
Re "this happens during the last step," I had the same idea, so I checked and confirmed that the results are identical if the "sort-3" test is moved to the final position; only "sort-3-wrapped" fails.
Yes, I think I could rewrite this to avoid every/in/satisfies/except.
only "sort-3-wrapped" fails.
Fascinating! :-)
Here's a version of the local:topological-sort function without the use of every/in/satisfies/except, and plus a variation of the simple query that lets you define how many nodes you want to see. Every test with more than 2 nodes (i.e., a $how-many > 2) fails in eXist. The same tests all pass in BaseX and Saxon.
xquery version "3.1";
declare function local:topological-sort($unordered, $ordered) {
if (empty($unordered)) then
$ordered
else
let $defined :=
for $node in $unordered
let $id-checks :=
for $id in $node/ref/@id
return
$id = $ordered/@id
return
if (count($id-checks[. eq false()]) ge 1) then
()
else
$node
return
local:topological-sort( $unordered[not(@id = $defined/@id)], ($ordered, $defined) )
};
let $alphabet := (string-to-codepoints("a") to string-to-codepoints("z")) ! codepoints-to-string(.)
for $how-many in 1 to 10
(:let $how-many := 3:)
let $letters := reverse(subsequence($alphabet, 1, $how-many))
let $nodes :=
for $size in 1 to $how-many
let $ids := reverse(subsequence($letters, 1, $size))
let $node-id := head($ids)
let $ref-ids := tail($ids)
return
<node id="{$node-id}">
{
for $ref-id in $ref-ids
return
<ref id="{$ref-id}"/>
}
</node>
let $original := <graph>{$nodes}</graph>
let $scrambled := <graph>{reverse($nodes)}</graph>
let $actual := <graph>{local:topological-sort($scrambled/node, ())}</graph>
let $passes := deep-equal($actual, $original)
let $result :=
<result how-many="{$how-many}" passes="{$passes}">
<scrambled>{$scrambled}</scrambled>
<expected>{$original}</expected>
<actual>{$actual}</actual>
</result>
return
if ($passes) then
()
else
$how-many
(: $result:)
For example, setting $how-many to 5 yields this result:
<result how-many="5" passes="false">
<scrambled>
<graph>
<node id="a">
<ref id="b"/>
<ref id="c"/>
<ref id="d"/>
<ref id="e"/>
</node>
<node id="b">
<ref id="c"/>
<ref id="d"/>
<ref id="e"/>
</node>
<node id="c">
<ref id="d"/>
<ref id="e"/>
</node>
<node id="d">
<ref id="e"/>
</node>
<node id="e"/>
</graph>
</scrambled>
<expected>
<graph>
<node id="e"/>
<node id="d">
<ref id="e"/>
</node>
<node id="c">
<ref id="d"/>
<ref id="e"/>
</node>
<node id="b">
<ref id="c"/>
<ref id="d"/>
<ref id="e"/>
</node>
<node id="a">
<ref id="b"/>
<ref id="c"/>
<ref id="d"/>
<ref id="e"/>
</node>
</graph>
</expected>
<actual>
<graph>
<node id="b">
<ref id="c"/>
<ref id="d"/>
<ref id="e"/>
</node>
<node id="c">
<ref id="d"/>
<ref id="e"/>
</node>
<node id="d">
<ref id="e"/>
</node>
<node id="e"/>
<node id="a">
<ref id="b"/>
<ref id="c"/>
<ref id="d"/>
<ref id="e"/>
</node>
</graph>
</actual>
</result>
I've left some commenting in that lets you switch to a single result or lets you see the full result for each hit.
Looking for a workaround I found this works - wrapping the sequence in an element
declare function local:topological-sort($unordered, $ordered-graph) {
if (empty($unordered))
then $ordered-graph/*
else
let $ordered := $ordered-graph/*
let $nodes := $unordered [ every $id in ref/@id satisfies $id = $ordered/@id]
return
if ($nodes)
then let $ordered-graph := element graph {($ordered , $nodes)}
return local:topological-sort($unordered except $nodes, $ordered-graph)
else () (: cycles so no order possible :)
};
@joewiz Excellent thanks. For testing and debugging purposes the less amount of code and variance in expressions the better, so let's go for this:
xquery version "3.1";
declare function local:topological-sort($unordered, $ordered) {
if (empty($unordered)) then
$ordered
else
let $defined :=
for $node in $unordered
let $id-checks :=
for $id in $node/ref/@id
return
$id = $ordered/@id
return
if (count($id-checks[. eq false()]) ge 1) then
()
else
$node
return
local:topological-sort( $unordered[not(@id = $defined/@id)], ($ordered, $defined) )
};
local:topological-sort(
<graph>
<node id="a">
<ref id="b"/>
<ref id="c"/>
</node>
<node id="b">
<ref id="c"/>
</node>
<node id="c"/>
</graph>/node
,
()
)
I may have made a mistake, I would certainly appreciate further sets of eyes, but I think that local:topological-sort
can be simplified further:
declare function local:topological-sort($unordered, $ordered) {
if (empty($unordered)) then
$ordered
else
let $defined :=
for $node in $unordered
return
if (exists($node/ref[not(@id = $ordered/@id)])) then
()
else
$node
return
local:topological-sort( $unordered[not(@id = $defined/@id)], ($ordered, $defined) )
};
declare function local:topological-sort($unordered, $ordered) {
if (empty($unordered)) then
$ordered
else
let $defined := $unordered[empty(ref[not(@id = $ordered/@id)])]
return
local:topological-sort( $unordered[not(@id = $defined/@id)], ($ordered, $defined) )
};
I can confirm that both "reduction" versions produce the same results as earlier variations, both in the "direct" form and the "xqsuite" form.
Also, with these "simple" variants of the test (i.e., not the xqsuite version above), for some reason it isn't necessary to wrap the nodes in a "graph" element to produce the incorrect results. Just passing the "node" elements to the sort function (instead of wrapping them in a "graph" element and selecting its child elements) produces the scrambled results.
xquery version "3.1";
declare function local:topological-sort($unordered, $ordered) {
if (empty($unordered)) then
$ordered
else
let $defined := $unordered[empty(ref[not(@id = $ordered/@id)])]
return
local:topological-sort( $unordered[not(@id = $defined/@id)], ($ordered, $defined) )
};
local:topological-sort(
(
<node id="a">
<ref id="b"/>
<ref id="c"/>
</node>,
<node id="b">
<ref id="c"/>
</node>,
<node id="c"/>
)
,
()
)
... returns:
<node id="b">
<ref id="c"/>
</node>
<node id="c"/>
<node id="a">
<ref id="b"/>
<ref id="c"/>
</node>
... whereas BaseX and Saxon return:
<node id="c"/>
<node id="b">
<ref id="c"/>
</node>
<node id="a">
<ref id="b"/>
<ref id="c"/>
</node>
I had a little time to look into this... The incorrect ordering of results are being returned because behind the scenes the directly constructed XML nodes (in-memory nodes) are placed into a ValueSequence
. The problem is that the ValueSequence code frequently tries to sort things into document order (for unknown reasons - https://github.com/eXist-db/exist/blob/develop-6.x.x/exist-core/src/main/java/org/exist/xquery/value/ValueSequence.java#L429.
It is this sorting that causes the described problem, if I comment out the sorting then you set the expected results from eXist-db that match BaseX and Saxon.
I am afraid that this is another instance of where someone did something in the past in the eXist-db code (probably for some sensible reason at the time), that doesn't make much sense when looked at today. There is also no documentation explaining why it was done that way, and it doesn't follow the XQuery standards. ValueSequece
is used in many places in eXist-db's XQuery Engine, and unfortunately the ValueSequence implementation and (its use) look like a complete minefield to me. Therefore it is hard to know what existing behaviour may be fixed and broken by removing this sorting.
On a side note - Hilariously the above code results in FastQSort.sort(...)
being executed some 54 times! When it need be executed 0 times. As sorting of data is a known slow operation in computer science, it is likely that fixing (or more likely implementing a replacement for ValueSequence) could yield serious performance gains across the board for all queries in eXist-db. This is no small undertaking though, as it would require both implementation of a comprehensive test suite, and careful analysis of W3C XQTS results to ensure no regressions.
Describe the bug
In certain circumstances, eXist loses the ordering of in-memory nodes.
Expected behavior
I expected eXist to retain the ordering of in-memory nodes.
To reproduce
As demonstrated in the simple example and full XQSuite test suite below, adapted from @KitWallace's report on exist-open, eXist's in-memory ordering of parameters (e.g., when constructing a sequence of items
($ordered, $defined)
) is scrambled. The same query in his email works as expected in BaseX and Saxon and fails only in eXist.)Simple query:
BaseX and Saxon return:
eXist returns:
XQSuite
The XQSuite results:
In other words, the problem only occurs when (a) we are operating on 3 items and (b) the input is constructed by wrapping the 3 items into an element and selecting its child nodes.
The logging demonstrates the point during the processing when the problem appears:
Notice how, with 1 node to go, the "ordered" nodes are correctly sorted:
<node id="c"/>
<node id="b"><ref id="c"/></node>
We'd expect the final run to sort the remaining node into the end - as the 3rd item. But in fact, the order is scrambled:
<node id="b"><ref id="c"/></node>
<node id="c"/>
<node id="a"><ref id="b"/><ref id="c"/></node>
I tried to trim this down to an even smaller reproducible example - sorting nodes into alphabetical order based on an attribute - but this variation didn't trigger the issue:
The results are returned in the correct order (
<x id="a"/>, <x id="b"/>, <x id="c"/>
), instead of a scrambled order as above.As a result, I think we have to conclude that something about the particular combination of factors shown in the XQSuite above - the particular sorting being done in local:topological-sort, together with the use of a path expression to supply parameters to the function - contributes to eXist's mangling of the order of the in-memory nodes.
Context (please always complete the following information)
Additional context
conf.xml
? none