QA spec: reading order evaluation

Thanks for pointing me to that! Very interesting.

We should definitely mention this, if only for the sake of completeness / commensurability.

I am surprised to find the actual formula slightly arbitrary, though. In trying to come up with a rate, the authors say:

> it is favourable to calculate a relative error or success measure in the form of a percentage. This can be achieved by relating the error value to the highest possible error value. Due to the unconstrained nature of layout analysis results a definitive maximum cannot be determined. There is for instance no limit to the number of overlapping/stacked regions. Instead, a non-linear success function is used which has a parameter ($e_{50}$) representing an error value that corresponds to a success rate of 50%.

IMO it would be natural to use the share/number of pixels of each overlap area as its weight. Then no such non-linear term would be necessary (the denominator would be the overall size of the page, times the sum of possible penalties)...
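To make the comparison concrete, here is a minimal Python sketch of both ideas. Everything in it is an assumption for illustration: the function names, the `(pixel_count, penalty)` input shape, and in particular the form `1 / (1 + e / e50)`, which is just one function that yields 50% at `e50` and is not necessarily the formula the authors use. The pixel-weighted rate only stays within [0, 1] if each pixel incurs each penalty type at most once.

```python
from typing import Iterable, Sequence, Tuple


def nonlinear_success(error: float, e50: float) -> float:
    """Assumed example of a non-linear success function.

    Returns 1.0 for error == 0 and 0.5 for error == e50; the exact
    form used in the quoted paper may differ.
    """
    return 1.0 / (1.0 + error / e50)


def pixel_weighted_error_rate(
    overlaps: Iterable[Tuple[int, float]],
    page_pixels: int,
    possible_penalties: Sequence[float],
) -> float:
    """Hypothetical linear alternative weighted by overlap area.

    overlaps: one (pixel_count, penalty) pair per overlap area.
    The denominator is the page size times the sum of all possible
    penalties, so no non-linear normalisation is needed.
    """
    weighted_error = sum(pixels * penalty for pixels, penalty in overlaps)
    return weighted_error / (page_pixels * sum(possible_penalties))


# Two overlap areas on a 10,000-pixel page, two possible penalty types.
rate = pixel_weighted_error_rate(
    overlaps=[(2000, 1.0), (500, 0.5)],
    page_pixels=10_000,
    possible_penalties=[1.0, 0.5],
)
print(f"error rate: {rate:.2%}, success rate: {1 - rate:.2%}")
```

In this toy case the weighted error is 2000 * 1.0 + 500 * 0.5 = 2250 and the denominator is 10,000 * 1.5 = 15,000, so the error rate is a plain ratio with a well-defined maximum, which is the point of the suggestion above.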

_Originally posted by @bertsky in https://github.com/OCR-D/spec/pull/225#discussion_r1106390674_

QA spec: reading order evaluation #238