Closed gioele closed 5 years ago
@gioele, try removing @xml:lang on the TEI element. To me, what eXist-db returns looks fine. It should only return four elements if four elements between the text nodes containing 'aMSa' and the entry elements have an @xml:lang. Zorba and BaseX return one element as well, no matter whether [1] is included or not, if the @xml:lang on the TEI element is omitted.
@jensopetersen: well, changing the data hides the bug but does not fix it. :)
In my case the @xml:lang attribute is there for an important reason and cannot be removed. Similarly, I cannot remove the [1] because it is a key part of a longer XPath.
Regardless of that, given a piece of data and a query, all compliant XPath implementations should return the same data. Either eXist is wrong or Saxon and libxml are. As I said, this is just the shortest test case that demonstrates the problem. The content of the test case does not really matter, the fact that implementations return different results does.
@gioele, I tried the following query in Saxon-PE 9.5.1.5 (in oXygen), in eXist-db, in BaseX, and in Zorba, and they all return 4 entries:
xquery version "3.0";
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $doc :=
<TEI xmlns="http://www.tei-c.org/ns/1.0" version="5" xml:id="monier" xml:lang="en">
<text>
<entry xml:id="lemma-anaMSin" ana="H1">
<sense>an-aMSa</sense>
</entry>
<entry xml:id="lemma-apaBraMSa" ana="H1">
<sense ana="H1A">ungrammatical language</sense>
</entry>
<entry xml:id="lemma-apaSabda" ana="H1">
<sense> bad or vulgar speech apa-BraMSa</sense>
</entry>
<entry xml:id="lemma-AMhaspatya" ana="H1">
<sense>belonging to the dominion of</sense>
</entry>
<entry xml:id="lemma-AnuvaMSya" ana="H1">
<sense>(fr. <w xml:lang="san-Latn-x-SLP1">anu-vaMSa</w>), belonging
to a race</sense>
</entry>
<entry xml:id="lemma-AnuvaMSya-no-elem" ana="H1">
<sense>(fr. anu-vaMSa), belonging to a race</sense>
</entry>
</text>
</TEI>
return
$doc//tei:entry
[./tei:sense//text()
[contains(., 'aMSa')]
[ancestor::*[@xml:lang][1]]
]
The second entry does not have a text node that contains 'aMSa', only an attribute value. So the in-memory execution of this query in eXist-db is OK.
Whether [1] is there or not makes no difference: if there is an @xml:lang, there of course is a first @xml:lang.
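To illustrate the point about [1], here is a minimal sketch on a made-up three-level document (the element names a, b and c are hypothetical and not part of the test case). As a path step, [1] narrows the ancestors down to the nearest one; as a filter predicate, the expression is only tested for emptiness, so the outcome should be the same with or without it.

```xquery
xquery version "3.0";

(: Hypothetical document for illustration only; not taken from the test case. :)
let $doc := <a xml:lang="en"><b><c xml:lang="fr">x</c></b></a>
let $t := $doc//text()
return (
    (: both <c> and <a> carry @xml:lang :)
    count($t/ancestor::*[@xml:lang]),
    (: on the reverse ancestor axis, [1] keeps only the nearest match, <c> :)
    count($t/ancestor::*[@xml:lang][1]),
    (: used as a filter, a non-empty sequence is simply true :)
    count($t[ancestor::*[@xml:lang]]),
    count($t[ancestor::*[@xml:lang][1]])
)
```

A conforming processor should return 2, 1, 1, 1 here: the two filter forms select the same text node.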
What happens if the document is stored? In eXist-db, the strange thing happens that the query does what I think you want it to do. It only picks out the fifth entry. This is why I wanted to clarify your query by paraphrasing it. I of course agree that the result is deterministic and that eXist-db may be wrong in the way it gets the "right" answer.
If you remove the @xml:lang on TEI you can see (by adding entries like the fifth after it) what eXist-db does: it takes the last of the entries it finds, that is, it somehow applies the ancestor axis to entry.
I think it does something which amounts to
$doc//tei:entry
[./tei:sense//text()
[contains(., 'aMSa')]
[ancestor::*[@xml:lang]]
][last()]
but perhaps @wolfgangmm has a clearer idea what goes on.
I think you are right: this is a bug.
Just to make things clear: regardless of the meaning of the query, I expect the original query to return 4 entry elements out of 5. The bug is in the fact that it returns only 1.
Also, it is true that for this very case the presence or absence of [1] should not make a difference, but for some strange reason it does make a difference in the current eXist implementation, leading to two different results. Similarly, whether a document is stored in a variable or read via doc() should not make a difference in this case but, again, it does.
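One way to check both claims against a stored copy is to run the two variants side by side and compare the IDs they select. This is only a sketch: it assumes the test document has been stored at /db/dict/test.tei, and the variable names are made up for the example.

```xquery
xquery version "3.0";
declare namespace tei="http://www.tei-c.org/ns/1.0";

(: Assumption: the test document is stored at this path. :)
let $stored := doc('/db/dict/test.tei')

(: variant with the positional predicate :)
let $with :=
    $stored//tei:entry
        [./tei:sense//text()
            [contains(., 'aMSa')]
            [ancestor::*[@xml:lang][1]]
        ]/@xml:id/string()

(: same query without [1] :)
let $without :=
    $stored//tei:entry
        [./tei:sense//text()
            [contains(., 'aMSa')]
            [ancestor::*[@xml:lang]]
        ]/@xml:id/string()

return
    <comparison same="{deep-equal($with, $without)}">
        <with-1>{string-join($with, ' ')}</with-1>
        <without-1>{string-join($without, ' ')}</without-1>
    </comparison>
```

If the two lists differ, or if replacing doc() with the in-memory $doc changes either list, that reproduces the discrepancy described above.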
Exactly what I attempted to write, @gioele.
I am having a similar issue. In my case a query returns results only if I drop the indexes; adding the indexes makes the query return 0 results. However, I noticed one odd behaviour. This works:
for $bydatesell in $coll//trade[./scrip/transType['S' = .]]
...
but this fails :
for $bydatesell in $coll//trade[./scrip/transType[. = 'S']]
...
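For reference, a general comparison with = should give the same answer whichever side the literal is on. The sketch below uses made-up in-memory data (only the element names trade, scrip and transType come from the queries above); since no index is involved, it shows the expected behaviour rather than reproducing the failure.

```xquery
xquery version "3.0";

(: Hypothetical sample data; the real collection is not shown in this thread. :)
let $coll :=
    <trades>
        <trade><scrip><transType>S</transType></scrip></trade>
        <trade><scrip><transType>B</transType></scrip></trade>
    </trades>
return (
    (: literal on the left :)
    count($coll//trade[./scrip/transType['S' = .]]),
    (: literal on the right :)
    count($coll//trade[./scrip/transType[. = 'S']])
)
```

Both counts should be 1. If the two forms disagree only once an index is defined on the stored data, the index-based evaluation path is the likely place to look.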
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $test := document {
<TEI xmlns="http://www.tei-c.org/ns/1.0" version="5" xml:id="monier" xml:lang="en">
<text>
<entry xml:id="lemma-anaMSin" ana="H1">
<sense>an-aMSa</sense>
</entry>
<entry xml:id="lemma-apaBraMSa" ana="H1">
<sense ana="H1A">ungrammatical language</sense>
</entry>
<entry xml:id="lemma-apaSabda" ana="H1">
<sense> bad or vulgar speech apa-BraMSa</sense>
</entry>
<entry xml:id="lemma-AMhaspatya" ana="H1">
<sense>belonging to the dominion of</sense>
</entry>
<entry xml:id="lemma-AnuvaMSya" ana="H1">
<sense>(fr. <w xml:lang="san-Latn-x-SLP1">anu-vaMSa</w>), belonging
to a race</sense>
</entry>
<entry xml:id="lemma-AnuvaMSya-no-elem" ana="H1">
<sense>(fr. anu-vaMSa), belonging to a race</sense>
</entry>
</text>
</TEI>
}
return
$test//tei:entry
[./tei:sense//text()
[contains(., 'aMSa')]
[ancestor::*[@xml:lang][1]]
]
returns
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="lemma-anaMSin" ana="H1">
<sense>an-aMSa</sense>
</entry>
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="lemma-apaSabda" ana="H1">
<sense> bad or vulgar speech apa-BraMSa</sense>
</entry>
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="lemma-AnuvaMSya" ana="H1">
<sense>(fr. <w xml:lang="san-Latn-x-SLP1">anu-vaMSa</w>), belonging
to a race</sense>
</entry>
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="lemma-AnuvaMSya-no-elem" ana="H1">
<sense>(fr. anu-vaMSa), belonging to a race</sense>
</entry>
XPath queries that contain a [1] predicate will return only some of the nodes that they should return.
For example, the following query will return only 1 item, the one with ID lemma-AnuvaMSya. Instead it should have returned 4 results, as confirmed by libxml and oXygen.
A note on the query. This is a minimal test case reduction; in this particular case removing the [1] will lead to the expected behaviour, but removing the [1] is not possible in the original query/environment from which this test case has been derived.
text.xql
/db/dict/test.tei