reader ignores index in ordered groups

AFAICS, the existing implementations for all versions of PAGE-XML ignore (OrderedGroup|OrderedGroupIndexed)/@index when parsing the XML.

This is how it looks:

https://github.com/PRImA-Research-Lab/prima-core-libs/blob/1f087a4378f58a34c83176ab0ffb620dd8b78f2d/java/PrimaDla/src/org/primaresearch/dla/page/io/xml/sax/SaxPageHandler_2019_07_15.java#L335-L342

References to ATTR_index are nowhere to be found.

The group's model class, in turn, does nothing to check incoming indices; it simply appends the members in the order they arrive:

https://github.com/PRImA-Research-Lab/prima-core-libs/blob/1f087a4378f58a34c83176ab0ffb620dd8b78f2d/java/PrimaDla/src/org/primaresearch/dla/page/layout/logical/Group.java#L193-L199
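For comparison, here is a minimal sketch (in Python, for brevity) of what an index-aware group could look like. The class `IndexedGroup` and its methods are hypothetical names for illustration only; this is not the PRImA `Group` API, and rejecting duplicate indices is my assumption about sensible behaviour, not something the schema spells out.

```python
import bisect


class IndexedGroup:
    """Hypothetical stand-in for an ordered group (not the PRImA Group class)."""

    def __init__(self):
        self._indices = []  # @index values, kept sorted
        self._members = []  # members, parallel to _indices

    def add(self, index, member):
        # Find the slot that keeps _indices sorted, instead of appending.
        pos = bisect.bisect_left(self._indices, index)
        if pos < len(self._indices) and self._indices[pos] == index:
            # Assumption: a repeated index is treated as an error.
            raise ValueError(f"duplicate index {index}")
        self._indices.insert(pos, index)
        self._members.insert(pos, member)

    def members(self):
        return list(self._members)


group = IndexedGroup()
# Members arrive in XML order, which differs from their index order.
for index, region_ref in [(2, "r3"), (0, "r1"), (1, "r2")]:
    group.add(index, region_ref)
print(group.members())  # ['r1', 'r2', 'r3']
```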

This means that applications like PageViewer or PageConverter will use the XML order instead of the actual order laid out by the schema semantics. This in turn creates a problem for applications like OCR-D: which is the correct representation, the one shown by PageViewer or my strict implementation?
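To make the two readings concrete, here is a small Python sketch that parses a made-up reading order and returns the region references once in XML order and once sorted by `@index`. The sample document, the region ids and the helper `region_refs` are invented for illustration, and the sketch only handles flat `RegionRefIndexed` children (nested indexed groups would need the same treatment recursively).

```python
import xml.etree.ElementTree as ET

# Namespace of the 2019-07-15 PAGE schema version.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Made-up sample: XML order (r2, r1, r3) differs from @index order (r1, r2, r3).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.png" imageWidth="100" imageHeight="100">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="1" regionRef="r2"/>
        <RegionRefIndexed index="0" regionRef="r1"/>
        <RegionRefIndexed index="2" regionRef="r3"/>
      </OrderedGroup>
    </ReadingOrder>
  </Page>
</PcGts>"""


def region_refs(xml_text, by_index):
    """Return regionRef ids of the first OrderedGroup (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    group = root.find(".//pc:OrderedGroup", NS)
    refs = group.findall("pc:RegionRefIndexed", NS)  # document (XML) order
    if by_index:
        # Strict reading: order members by their @index attribute.
        refs = sorted(refs, key=lambda ref: int(ref.get("index")))
    return [ref.get("regionRef") for ref in refs]


print(region_refs(SAMPLE, by_index=False))  # ['r2', 'r1', 'r3']
print(region_refs(SAMPLE, by_index=True))   # ['r1', 'r2', 'r3']
```

The first call corresponds to what a reader that ignores the index would produce, the second to the strict interpretation.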

Here's an example of the difference this can make:

PAGE-XML and original image: debug-readingorder.zip
rendered by PageViewer:
rendered by ocrd-segment-extract-pages:

Contrary to what one might suspect at first glance, it is PageViewer that gets the order wrong here, along with the producing tool eynollah (which follows the same model of just looking at the XML order). The two errors compensate for each other, so the PageViewer rendering looks plausible.

If my interpretation is wrong, please let me know soon. (I care less about the fix than about clarity on the correct meaning of the standard, both for implementation in software and for adoption in derived specifications like OCR-D.)

If the better place is the PAGE-XML repo, please transfer.
