eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

Sequences reordered on evaluation of XPath. #2097

Open paulmer opened 6 years ago

paulmer commented 6 years ago

Please note I posted this problem in July on the exist-open mailing list, and @adamretter has taken note of the issue.

What is the problem

A sequence constructed from the contents of multiple documents is ordered differently depending on which XPath expressions are evaluated against it. (The files necessary to reproduce this issue are attached after this description.) The code in question starts with a sequence of XML document names. Each document contains a list of XML elements, each with a key and a value attribute. A second sequence consisting of these elements is constructed with the "doc()" function by iterating over the document name sequence. Finally, a third sequence is constructed by selecting, for each key, the first item in the second sequence that matches that key. For example, if both document 1 and document 2 contain a value for the key "a", then the value from document 1 should be selected for the new list. If only document 2 contains a value for the key "b", then that value should be selected.

If an expression to gather the list of key values from the second sequence is evaluated before the third sequence is constructed, the order of items in the second sequence is changed (regardless of the presence of the order {...} directive) so that the correct values are no longer selected.
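
For readers without the attachment, the construction described above can be sketched roughly as follows. This is a minimal illustration, not the attached test.xql: the collection path '/db/tests', the variable names, and the use of distinct-values() to gather the keys are all assumptions. The selection step iterates with a for clause so that "first" means first in sequence order. A path expression such as $all/item would instead return nodes in document order, and the relative order of nodes from different documents is implementation-dependent.

```xquery
xquery version "3.1";

(: Assumed collection; adjust to wherever 1.xml and 2.xml are stored. :)
declare variable $dir := '/db/tests';

(: Sequence 1: document names, highest priority first. :)
let $names := ('1.xml', '2.xml')

(: Sequence 2: the <values> element of each document, in $names order. :)
let $all :=
    for $name in $names
    return doc(concat($dir, '/', $name))/values

(: The kind of key-gathering expression the report says triggers the reordering. :)
let $keys := distinct-values($all/item/@key)

(: Sequence 3: for each key, the first matching item in sequence order. :)
return
    <values>{
        for $key in $keys
        return (
            for $values in $all
            return $values/item[@key = $key]
        )[1]
    }</values>
```

Removing the $keys binding and hard-coding the keys ('a', 'b') gives the variant in which the key-gathering path is never evaluated, which is the comparison the report draws.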

What did you expect

Given file 1 contains

<values>
    <item key="a" value="1-A"/>
</values>

and file 2 contains

<values>
    <item key="a" value="2-A"/>
    <item key="b" value="2-B"/>
</values>

The result should be

<values>
    <item key="a" value="1-A"/>
    <item key="b" value="2-B"/>
</values>

If the expression ".../item/@key" is evaluated on the intermediate sequence first (even if the result of the evaluation is not used), the result is:

<values>
    <item key="a" value="2-A"/>
    <item key="b" value="2-B"/>
</values>

Further, the test code shows that the expression evaluation reorders the intermediate sequence from:

<allParams>
    <values>
        <item key="a" value="1-A"/>
    </values>
    <values>
        <item key="a" value="2-A"/>
        <item key="b" value="2-B"/>
    </values>
</allParams>

to

<allParams>
    <values>
        <item key="a" value="2-A"/>
        <item key="b" value="2-B"/>
    </values>
    <values>
        <item key="a" value="1-A"/>
    </values>
</allParams>

Describe how to reproduce or add a test

The three attached files are a self-contained demonstration.

test-order.gz

Unzip this archive to create a directory named test-order. Store this directory in eXist using the Java Admin client, then open the contained "test.xql" file and submit it to see the results. The output contains the list of files and the results of two tests. Each test shows the "first" items from the files, and the combined list of items from which the "first" items were identified. The second result consistently lists the contents of file 2 before those of file 1, which is incorrect.
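
If the archive is not accessible, the two input documents shown earlier can be recreated directly in the database. The following is a minimal sketch, assuming eXist's xmldb module and a user with write permission on /db. The collection name 'tests' is an assumption, and the sketch covers only the two data files, not test.xql itself.

```xquery
xquery version "3.1";

(: Assumed target collection /db/tests; requires write permission on /db. :)
let $collection := xmldb:create-collection('/db', 'tests')
return (
    (: File 1: defines only key "a". :)
    xmldb:store($collection, '1.xml',
        <values>
            <item key="a" value="1-A"/>
        </values>),
    (: File 2: defines keys "a" and "b". :)
    xmldb:store($collection, '2.xml',
        <values>
            <item key="a" value="2-A"/>
            <item key="b" value="2-B"/>
        </values>)
)
```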

paulmer commented 6 years ago

Here is the full output (slightly reformatted for readability) from the test script I provided:

    <result>
        <!-- The list of files. -->
        <file path="1.xml"/>
        <file path="2.xml"/>

        <!-- The values as read from the files. -->
        <all-content>
            <values>
                <item key="a" value="1-A"/>
            </values>
            <values>
                <item key="a" value="2-A"/>
                <item key="b" value="2-B"/>
            </values>
        </all-content>

        <!--
        The results when finding the first value for each key
        without evaluating item/@key.
        -->
        <merged-without-eval-item-key>
            <items>
                <item name="a" value="1-A"/>
                <item name="b" value="2-B"/>
            </items>
            <allParams>
                <values>
                    <item key="a" value="1-A"/>
                </values>
                <values>
                    <item key="a" value="2-A"/>
                    <item key="b" value="2-B"/>
                </values>
            </allParams>
        </merged-without-eval-item-key>

        <!--
        The results when finding the first value for each key
        after evaluating item/@key. These should be identical
        to the previous result.
        -->
        <merged-with-eval-item-key>
            <items>
                <item name="a" value="2-A"/>
                <item name="b" value="2-B"/>
            </items>
            <allParams>
                <values>
                    <item key="a" value="2-A"/>
                    <item key="b" value="2-B"/>
                </values>
                <values>
                    <item key="a" value="1-A"/>
                </values>
            </allParams>
        </merged-with-eval-item-key>
    </result>

joewiz commented 6 years ago

Updating this thread with some additional info posted to exist-open at https://exist-open.markmail.org/thread/4tc4eusdj6uol7qs.

@adamretter wrote:

Paul I have this on my TODO list and will get to it...

Before I do, can you tell me, have you tried this in Saxon? If not could you?

@paulmer replied:

I assume you mean trying this with Saxon from the command line against two files? I ran the test that way, using Saxon-EE 9.8.0.12J as supplied in the Oxygen editor. Both results (merged-without-eval-item-key, merged-with-eval-item-key) returned the expected items 1-A, 2-B.

Here's the key code difference, plus the actual results, if that's helpful:

(:declare variable $dir         := 'xmldb:///db/tests';:)
declare variable $dir         := '.';

declare function local:files() as element()* {
    <files>
        <file path="{concat($dir, '/1.xml')}"/>
        <file path="{concat($dir, '/2.xml')}"/>
    </files>
};

<?xml version="1.0" encoding="UTF-8"?>
<result>
   <files>
      <file path="./1.xml"/>
      <file path="./2.xml"/>
   </files>
   <all-content>
      <values>
         <item key="a" value="1-A"/>
      </values>
      <values>
         <item key="a" value="2-A"/>
         <item key="b" value="2-B"/>
      </values>
   </all-content>
   <merged-without-eval-item-key>
      <items>
         <item name="a" value="1-A"/>
         <item name="b" value="2-B"/>
      </items>
      <allParams>
         <values>
            <item key="a" value="1-A"/>
         </values>
         <values>
            <item key="a" value="2-A"/>
            <item key="b" value="2-B"/>
         </values>
      </allParams>
   </merged-without-eval-item-key>
   <merged-with-eval-item-key>
      <items>
         <item name="a" value="1-A"/>
         <item name="b" value="2-B"/>
      </items>
      <allParams>
         <values>
            <item key="a" value="1-A"/>
         </values>
         <values>
            <item key="a" value="2-A"/>
            <item key="b" value="2-B"/>
         </values>
      </allParams>
   </merged-with-eval-item-key>
</result>

paulmer commented 6 years ago

Just noticed the tests zip I attached was empty for some reason, sorry. It seems GitHub isn't accepting my zip file. Please contact me directly for the source code, or refer to the referenced email which contains the sources.

duncdrum commented 5 years ago

I'm a bit confused about how to reproduce this properly. Can I ask everyone to gather the necessary materials into a self-contained test here on this ticket? Thanks.

adamretter commented 5 years ago

@duncdrum the tests are attached to the email.

duncdrum commented 5 years ago

@adamretter yes, but it really helps us reproduce an issue when we don't have to go through the comments, follow multiple email links, find the right one, copy the relevant material, and only then see how it goes.

paulmer commented 5 years ago

@duncdrum Did you see my note asking that you contact me directly for the test files, since something is corrupting the zip file of my tests in GitHub?

duncdrum commented 5 years ago

@paulmer indeed, which suggests to me that the zip linked earlier doesn't really show what you need it to show. With the large number of open issues and this all being volunteer time, it would help you get a speedier resolution if the zip file were accessible from this interface, either via something like a Dropbox link or by posting the relevant code here. Otherwise, I'm afraid many of our devs will simply skip this issue and move on to the next.

paulmer commented 5 years ago

@duncdrum Looks like GitHub will accept the files packaged in a .gz file now, so I've replaced the zip file in the original report.