POETSII / Orchestrator

The Orchestrator is the configuration and run-time management system for POETS platforms.

XML parser can't handle multiline attributes #223

Closed m8pple closed 2 years ago

m8pple commented 3 years ago

This is maybe a bit obscure, but it was in one of the test cases developed for v4 parsing. It's also related to the constraint that Metadata must be a key-value pair, with both the key and the value held in attributes (an alternative proposal, that the value could go in the element content, was rejected as too complicated).

The test case anticipates someone putting a large value in a metadata value, which is fairly likely (some metadata values are quite big in existing v3 files):

https://github.com/POETSII/poets_improvement_proposals/blob/2a355282f3aa3912bdacf4ec50bf25199bdf310d/proposed/PIP-0020/xml/ic/tests/valid/L3-compilation/tiniest-plus-metadata3.xml#L3-L9

This is valid XML, though the parser is allowed to normalise the whitespace to spaces. However, it fails in the orchestrator parser:


#Validation of /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml Failed.
# plog is in ../Output/Microlog/Microlog_2021_05_27T08_47_33p5.plog

# > ========================================================================================================================
# > 27/05/2021 08:47:33.09 file ../Output/Microlog/Microlog_2021_05_27T08_47_33p5.plog
# > command [load /app = "/mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml"]
# > from console
# > ========================================================================================================================
# > 
# > ------------------------------------------------------------------------------------------------------------------------
# > 27/05/2021 08:47:33.09
# > 
# > Checking client file /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml
# > 
# > (lin,col) refers to element closure location in client file
# > 
# > Syntax analysis...
# > 
# > 
# > ==========================================================
# > XML syntax error
# > Error code             : (2) Unexpected token encountered
# > Line, col              : 6,9
# > (Last) symbol token    : &
# > Lexer token history....
# > (Line,Col)
# > (   6,  9) &      <- Token out of sequence
# > (   5,  9) n
# > (   4, 40) n
# > (   4, 39) gr
# > (   4, 36) =
# > (   4, 35) value
# > (   4, 29) grarr
# > (   4, 22) =
# > (   4, 21) key
# > (   4, 17) Metadata
# > 
# > ==========================================================
# > 
# > ==========================================================
# > XML syntax error
# > Error code             : (2) Unexpected token encountered
# > Line, col              : 6,11
# > (Last) symbol token    : lt
# > Lexer token history....
# > (Line,Col)
# > (   6, 11) lt      <- Token out of sequence
# > (   6,  9) &
# > (   5,  9) n
# > (   4, 40) n
# > (   4, 39) gr
# > (   4, 36) =
# > (   4, 35) value
# > (   4, 29) grarr
# > (   4, 22) =
# > (   4, 21) key
# > 
# > ==========================================================
# > 
# > ==========================================================
# > XML syntax error
# > Error code             : (2) Unexpected token encountered
# > Line, col              : 6,12
# > (Last) symbol token    : ;
# > Lexer token history....
# > (Line,Col)
# > (   6, 12) ;      <- Token out of sequence
# > (   6, 11) lt
# > (   6,  9) &
# > (   5,  9) n
# > (   4, 40) n
# > (   4, 39) gr
# > (   4, 36) =
# > (   4, 35) value
# > (   4, 29) grarr
# > (   4, 22) =
# > 
# > ==========================================================
# > 
# > 
# > Errors: 3. Best effort client tree after recovery:
# > 
# > XML file /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml
# > (   2,  1)[E] /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml
# > 
# > (   2, 97)+-[E] /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml.Graphs
# > 
# > (   2, 97)  |== [A] xmlns = "https://poets-project.org/schemas/virtual-graph-schema-v4"
# > (   2, 97)  |== [A] formatMinorVersion = "0"
# > (   6, 12)  +-[E] /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml.Graphs.GraphType
# > 
# > (   6, 12)     == [A] id = "tiniest"
# > 
# > XML file /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml
# > 6 source lines
# > 2 XML elements
# > 0 comments
# > 0 C fragments
# > 
# > ...Client file /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml
# > exhibits 3 syntax errors in 93 msecs
# > 
# > ........................................................................................................................
# > 27/05/2021 08:47:33.09
# > 
# > Checking client file /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml
# > Semantic checking suppressed
# > 
# > ...Client file /mnt/e/dt10_all/POETS/Orchestrator/Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml
# > exhibits 3 accumulated (syntax/structure) errors in 93 msecs
# > 
# > ------------------------------------------------------------------------------------------------------------------------
# > 
not ok 14 - Parse Tests/ReferenceXML/v4/PEP20/tests/valid/L3-compilation/tiniest-plus-metadata3.xml

I can't post the exact version of the orchestrator right now, as I have pending changes to fix #222, #221, and #220 which I don't want to push yet. However, the problem with multi-line attributes must exist in b5615c33a5476c4ed1d1170cbc1abaa4201ad007, as that is the last version I merged, and I haven't touched the XML parser code (only the grammar).

This is probably not that high priority, as it can be worked around, but it means that some of the existing test cases won't pass.

DrongoTheDog commented 2 years ago

This is a deceptively big deal. At the heart of all this is Lex, which has been in Generics for probably 30 years. Lex accepts c/r as a valid alternative to '"' for closing a quoted string, and I am loath to change stuff in Generics because the repercussions will ripple out in strange ways. What we would do is put (yet another) switch into the Lex interface that leaves the behaviour as is by default and allows a special POETS behaviour that insists on double quotes on either side. This will let us include '\n' as a character inside a string, which I think is the behaviour you want. (But why?)
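
To make the proposed switch concrete, here is a rough Python sketch of the two behaviours. It is not the Lex code from Generics: the function `read_quoted` and the flag `strict_quotes` are hypothetical names made up for illustration, and the real lexer will differ in many details.

```python
def read_quoted(text, start, strict_quotes=False):
    """Read a quoted string whose opening '"' sits at text[start].

    Hypothetical sketch, not the Generics Lex implementation.
    Returns (value, index just past the terminating character).
    """
    if text[start] != '"':
        raise ValueError("expected an opening double quote")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # A closing double quote always ends the string.
            return text[start + 1:i], i + 1
        if ch in "\r\n" and not strict_quotes:
            # Default (legacy) behaviour: a line end also closes the string.
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")


source = '"a\nb" rest'
legacy_value, _ = read_quoted(source, 0)                      # stops at the line end
strict_value, _ = read_quoted(source, 0, strict_quotes=True)  # keeps the newline
```

With the default setting the string ends at the line break, so the remainder of a multi-line attribute value is lexed as loose tokens, which would be consistent with the "Token out of sequence" errors in the log. With the switch on, the newline stays inside the string.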

m8pple commented 2 years ago

This issue doesn't really matter at a practical level. XML compliance is not a goal, and XML generated from other XML languages and generators can be normalised (e.g. collapsing to spaces) if needed.
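
As a sketch of what such a normalisation pass could look like, the following uses Python's standard `xml.etree.ElementTree`. The helper name `normalise_attributes` and the sample element are assumptions for illustration, not part of the Orchestrator or the PIP-0020 tooling. Note that ElementTree may rewrite namespace prefixes when it serialises, so a real pre-processing step would need to handle the `xmlns` declaration on `Graphs` with more care.

```python
import xml.etree.ElementTree as ET


def normalise_attributes(xml_text):
    """Collapse every run of whitespace in every attribute value to one space.

    Hypothetical helper: returns the re-serialised document as a string.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Copy the items first so the attribute dict is not mutated mid-iteration.
        for key, value in list(elem.attrib.items()):
            elem.set(key, " ".join(value.split()))
    return ET.tostring(root, encoding="unicode")


# Made-up sample with a value spread over two lines.
sample = '<Metadata key="k" value="line one\n     line two"/>'
single_line = normalise_attributes(sample)
```

The output keeps each attribute value on one line, which should avoid the lexer behaviour described above, at the cost of losing the original line structure of the value.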