altoxml / schema

ALTO XML schema - latest and all former versions
Confidence value for Layout detection of elements #69

Open jukervin opened 3 years ago

jukervin commented 3 years ago

Would a confidence value for the layout detection of elements such as textblocks have any use? Most of the discussion about confidence values centers on text and characters. Is it assumed that textblocks, composedblocks, lines, etc. are detected correctly or corrected manually?

artunit commented 3 years ago

We could add this as a discussion item for the upcoming Board meeting (2021-04-29). Layout has been a tricky area to navigate in ALTO vis-à-vis textblocks, but it would seem to fit well into the notion of encoding the work of an OCR engine.