altoxml / schema

ALTO XML schema - latest and all former versions

Clarify implicit reading order #68

Closed: mittagessen closed this issue 2 years ago

mittagessen commented 3 years ago

The standard currently does not mention reading order anywhere, and most people treat the sequence of elements as the order in which these elements should be read, e.g. the n-th <String> in a <TextLine> is the n-th word a human reader would read in that line.

Apparently this isn't evident to everyone out there. These tweets document that Transkribus's ALTO output sorts <String> elements from left to right, which inverts the word order for RTL text. We should probably clarify that <TextLine>/<String>/<Glyph> elements are to be ordered in a way that corresponds to the text flow.
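A minimal Python sketch of the inversion, for illustration only. The ALTO fragment is hypothetical: the tokens `word1` to `word3` are placeholders standing in for RTL words, and the coordinates are made up. It assumes the ALTO v4 namespace and compares reading `<String>` elements in document order with re-sorting them by `HPOS`.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace; adjust for other schema versions.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical RTL line serialized in reading order. In RTL text the first
# word read is the rightmost one, so it carries the largest HPOS value.
ALTO_SAMPLE = """\
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="word1" HPOS="300" VPOS="10" WIDTH="80" HEIGHT="20"/>
      <SP HPOS="290" VPOS="10" WIDTH="10"/>
      <String CONTENT="word2" HPOS="200" VPOS="10" WIDTH="90" HEIGHT="20"/>
      <SP HPOS="190" VPOS="10" WIDTH="10"/>
      <String CONTENT="word3" HPOS="100" VPOS="10" WIDTH="90" HEIGHT="20"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>
"""


def words_in_document_order(line):
    """Return the words in the order the <String> elements are serialized."""
    return [s.get("CONTENT") for s in line.findall("alto:String", NS)]


def words_sorted_left_to_right(line):
    """Return the words after sorting <String> elements by ascending HPOS."""
    strings = line.findall("alto:String", NS)
    ordered = sorted(strings, key=lambda s: float(s.get("HPOS")))
    return [s.get("CONTENT") for s in ordered]


root = ET.fromstring(ALTO_SAMPLE)
line = root.find(".//alto:TextLine", NS)

reading_order = words_in_document_order(line)
geometric_order = words_sorted_left_to_right(line)

print("document order:", reading_order)
print("sorted by HPOS:", geometric_order)
```

For this RTL line the document order matches the text flow, while the left-to-right sort reverses it. For an LTR line the two orders would coincide, which is likely why the difference goes unnoticed.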

cneud commented 3 years ago

Thank you Ben for chiming in and raising the issue here! I saw the tweets but you beat me to it ;)

There was some previous discussion on making this more explicit, the outcomes of which are captured here: https://github.com/altoxml/schema/issues/12#issuecomment-113184844. In the end, no changes to the schema were made. Maybe now is a good time to revisit this.

mittagessen commented 3 years ago

Yes, I'm aware of the previous discussions regarding reading order, but these require a larger overhaul of the object ordering notation, as quite a few documents would need multiple reading orders and other 'advanced' features.

This is mostly about putting an explanatory note in some supplementary material stating that textual elements are not some amorphous cloud but that their sequence represents a valid text flow, i.e. clarifying how all software but one currently serializes into ALTO. No actual changes to the schema are needed.

artunit commented 3 years ago

Adding a link to issue 69 (Confidence value for Layout detection of elements) here.

cipriandinu commented 2 years ago

ACCEPT

cneud commented 2 years ago

ACCEPT

artunit commented 2 years ago

ACCEPT

cowboyMontana commented 2 years ago

ACCEPT

callylaw commented 2 years ago

ACCEPT

Haighton commented 2 years ago

ACCEPT

cowboyMontana commented 2 years ago

ACCEPT

Ra1phM commented 2 years ago

ACCEPT

ntra00 commented 2 years ago

ACCEPT

JLoitzenbauer-CRKN commented 2 years ago

ACCEPT

c-sebastien commented 2 years ago

ACCEPT

bkgeig commented 2 years ago

ACCEPT

rajubln commented 2 years ago

ACCEPT

acpopat commented 2 years ago

ACCEPT

cipriandinu commented 2 years ago

Published in v4.3