eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

[BUG] Wrong text parsing inside XML content starting with the xqDoc prefix #5239

Open gmella opened 6 months ago

gmella commented 6 months ago

Text content that looks like an xqDoc comment is ignored when it appears inside an XML fragment:

<xml>(:~....:)</xml>

and an error is thrown if the xqDoc-like string is not closed:

<xml>(:~....</xml>

I expect to get only one len element in the output of the following code:

let $es := (
    <e>AAA(:~</e>,
    <e>(:~A:)</e>,
    <e>(:~:)A</e>,
    <e>(:~AAA</e>
    )

let $xml := <xml>{
    for $e in $es
        group by $len:=string-length($e)
        return <len size="{$len}">{for $s in $e return $s}</len>
    }</xml>

return 
    if(count($xml//len)>1) then
        error( QName('exist', 'test'), "all e elements do not have the same text length" )
    else 
        $xml

Tested on eXist-db 6.2.0 and 7.0.0-SNAPSHOT.

joewiz commented 6 months ago

Confirmed with 6.2.0 and 7.0.0-SNAPSHOT. There's no output in exist.log. A further simplified query:

xquery version "3.1";

<foo>(:~</foo>

... returns the following error in the Java Admin Client:

An exception occurred during query execution: exerr:ERROR exerr:ERROR expecting ':', found 'o' [at line 3, column 13] [at line 3, column 13]

Changing any aspect of the string (:~ (e.g., deleting one of the characters or inserting a character between them) makes the error go away. So it is this specific sequence of characters that triggers the error.

Using BaseX and Saxon, the same query returns the expected result, <foo>(:~</foo>, with no error.
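
For anyone narrowing this down, the near-miss variants mentioned above (deleting one character or inserting one) can be collected into a single query. This is only a sketch based on that description: the element name foo is reused from the simplified example, the particular set of variants is illustrative, and the results have not been re-verified against eXist-db here.

```xquery
xquery version "3.1";

(: Near-miss variants of the failing text: each one drops a character
   from the three-character sequence or inserts a space into it.
   Per the description above, none of these should raise the error. :)
(
    <foo>(:</foo>,   (: tilde removed :)
    <foo>(~</foo>,   (: colon removed :)
    <foo>:~</foo>,   (: opening parenthesis removed :)
    <foo>( :~</foo>, (: space inserted after the parenthesis :)
    <foo>(: ~</foo>  (: space inserted before the tilde :)
)
```

A conformant processor should return five foo elements whose text is exactly the characters shown, since comments are not recognized inside direct element content. Comparing this output with the failing query may help confirm that only the exact three-character sequence is affected.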

@gmella Your FLWOR returns more than 1 <len> element in BaseX and Saxon, so if you were expecting only one, you might further review that query. But I think your unexpected result there is not related to the core issue that I've focused on above.

gmella commented 6 months ago

Thank you for the confirmation and the label. I will review the query, since I recognize that the inputs are wrong... It seems that the markdown renderer also considers your foo closing tag to be a comment ;)