OCR-D / page-to-alto

Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)
Apache License 2.0

support ALTO 4.3 #30

Open bertsky opened 2 years ago

bertsky commented 2 years ago

New features:

  1. Add BASEDIRECTION attribute defining base direction and line orientation to TextLine and BlockType.
  2. Add support for explicit reading order definitions with "ReadingOrder" element containing "UnorderedGroup"s, "OrderedGroup"s, and "ElementRef"s.

Regarding @BASEDIRECTION the docs state:

Describes the inline base direction and line orientation of a line or of all lines inside a text block. The meaning of these terms is defined by the W3C writing modes document. These values should correspond to the base direction set in the BiDi algorithm to the respective elements during Unicode encoding. A value of "ttb" (top-to-bottom) implies a base direction of left-to-right, a value of "btt" (bottom-to-top) a base direction of right-to-left.

  • ltr
  • rtl
  • ttb
  • btt

It sounds a lot like @readingDirection in PAGE, but there is no mention of bidirectionality here. @chris1010010, can you help?
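
For illustration only, a naive one-to-one conversion could look like the sketch below. It assumes that the four simple PAGE @readingDirection values correspond directly to the four @BASEDIRECTION values, which is exactly the open question here. The names `PAGE_TO_ALTO_BASEDIRECTION` and `page_reading_direction_to_alto` are hypothetical and not part of page-to-alto.

```python
from typing import Optional

# Assumed 1:1 mapping from PAGE @readingDirection to ALTO 4.3 @BASEDIRECTION.
# Whether the semantics really match (bidirectionality) is unconfirmed.
PAGE_TO_ALTO_BASEDIRECTION = {
    "left-to-right": "ltr",
    "right-to-left": "rtl",
    "top-to-bottom": "ttb",
    "bottom-to-top": "btt",
}


def page_reading_direction_to_alto(reading_direction: Optional[str]) -> Optional[str]:
    """Return the assumed BASEDIRECTION value, or None if there is no mapping.

    None means the attribute should simply be omitted in the ALTO output.
    """
    if reading_direction is None:
        return None
    return PAGE_TO_ALTO_BASEDIRECTION.get(reading_direction)
```

Returning `None` for missing or unknown values keeps the attribute optional, so output for ALTO versions below 4.3 would stay unchanged.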

As to ReadingOrder, that has been directly adopted from PAGE, with subtle differences though: