wrznr closed this issue 5 years ago
Indeed. Also applies to later processing steps, i.e. image manipulation steps operating on regions and lines. The only way to reference these additional image files is via `AlternativeImage` from some PAGE. So all later steps must query these before (and instead of) they `SetRectangle` on the element's coords: not just Crop/SegmentRegion on `PageType`, but also SegmentLine on `TextRegionType`, SegmentWord on `TextLineType`, and Recognize on its `textequiv_level` element.
If there is ambiguity (multiple alternative images available), maybe we should define rules to choose? We already have rules for `comments`. Now we have to specify which comments are preferable/expected at which step.
@wrznr and I have given this some thought:
There are preprocessing steps that must create new image data (because there is no other way to represent their result), like despeckling, dewarping and binarization. There are also steps that can, but could also just annotate the PAGE with enough information for later steps to apply them, e.g. deskewing (via `@orientation`) and cropping (via `Coords/@points`). And sometimes that depends on the hierarchy level: e.g. the deskewing angle can only be annotated on `TextRegion` (as `TextLine` and `Page` have no `@orientation`).
But whatever the level, when descending to a lower level, all the annotated image preprocessing should be applied, because otherwise it would have to be repeated in all the constituent elements during the next step.
Therefore, while generally it is for the processor to decide whether or not to create new image data, at the last step per level (typically binarization) it must be configured to do so. And every processor must be programmed to respect image data (`AlternativeImage`) for its respective level (or higher in the hierarchy) if referenced in the input PAGE. Since each step produces a new PAGE from the old one, there is no (valid use case of) ambiguity – one can always take the last `AlternativeImage` (and the `@comments` are purely cosmetic).
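The "take the last `AlternativeImage`" rule can be sketched as follows (a minimal illustration with made-up data classes, not the actual OCR-D API):

```python
# Illustrative sketch only: the class names mimic PAGE concepts but are
# not the real OCR-D/PAGE Python bindings.
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlternativeImage:
    filename: str
    comments: str = ""  # e.g. "binarized,deskewed" – purely cosmetic here


@dataclass
class Element:
    image_filename: str  # the original PageType/@imageFilename
    alternatives: List[AlternativeImage] = field(default_factory=list)


def effective_image(elem: Element) -> str:
    """Resolve the image a processor should operate on: since each step
    appends to the list, the last AlternativeImage is the most derived one;
    fall back to the original image if none is annotated."""
    if elem.alternatives:
        return elem.alternatives[-1].filename
    return elem.image_filename
```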
So PAGE+METS allows a very flexible generic workflow design. However, there is a subtlety in the coordinate calculations involved here: since `PointsType` (anywhere from `BorderType` down to any segment's `CoordsType`) is required to be relative to the root `PageType/@imageFilename` image, but `AlternativeImage` generally does not retain coordinates, one cannot simply derive lower-level image data for an element by cutting the parent image in the hierarchy at its coordinates. Instead, each implicit coordinate transform must be explicitly passed down along with `AlternativeImage` so it can be compensated in lower-level coordinate calculations.
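For the simple translation case, passing the transform down can be sketched like this (assuming the accumulated transform is just an x/y offset; function names are illustrative, not the OCR-D API):

```python
# Sketch: accumulate the offset of each derived image relative to the
# original page image, so absolute PAGE coordinates can be compensated.
def compose_offset(parent_offset, crop_box):
    """Offset of a child image cropped out of a parent image.

    parent_offset: (dx, dy) accumulated for the parent image
    crop_box: (x0, y0, x1, y1) of the child element, in absolute coordinates
    """
    dx, dy = parent_offset
    x0, y0, _, _ = crop_box
    return (dx + x0, dy + y0)


def to_local(point, offset):
    """Map an absolute page coordinate into the derived image."""
    return (point[0] - offset[0], point[1] - offset[1])


def to_absolute(point, offset):
    """Map a coordinate in the derived image back to the page."""
    return (point[0] + offset[0], point[1] + offset[1])
```

Each processing step would compose its own crop onto the offset it received, and any coordinates it writes back into PAGE go through `to_absolute`.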
And obviously, this would be difficult to do (and even more difficult to annotate) with non-linear transforms like dewarping. It is easier to live with that if dewarping is done on the line level (when only vertical coordinates will be off for words and glyphs) than on the page level.
But for linear transforms this can be done easily: if the `AlternativeImage` is larger than annotated (as happens during deskewing/rotation, because the image has to be expanded/reshaped), then decrease the offset x / y by half the difference in width / height. I have implemented this for ocropy first. Functions in `ocrd_cis.ocropy.common` like `image_from_page`, `image_from_region`, `image_from_line` and `save_image_file` should probably be moved into `ocrd.workspace.Workspace` and recommended for all processors. But before that I want to re-integrate this architecture here and see if the solution is general enough...
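The offset correction for the rotation/expansion case can be sketched as follows (illustrative helper functions under the assumption above, not the `ocrd_cis.ocropy.common` implementation):

```python
import math


def rotated_size(w, h, angle_degrees):
    """Bounding-box size of a w×h rectangle after rotation: the canvas has
    to be expanded/reshaped to hold the rotated content."""
    a = math.radians(abs(angle_degrees))
    return (w * math.cos(a) + h * math.sin(a),
            w * math.sin(a) + h * math.cos(a))


def corrected_offset(offset, annotated_size, image_size):
    """If the derived image is larger than the annotated rectangle,
    decrease the offset x / y by half the difference in width / height,
    since the expansion is symmetric around the centre."""
    dx, dy = offset
    w_ann, h_ann = annotated_size
    w_img, h_img = image_size
    return (dx - (w_img - w_ann) / 2.0,
            dy - (h_img - h_ann) / 2.0)
```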
@kba @chreul What do you think?
With permission from @wrznr I add this general workflow diagram for illustration of preprocessing options.
So, to rephrase the "subtlety": we have a principle at work here which states that coordinates within any `AlternativeImage` (on whatever level) must be reproducible, i.e. the annotation present in an element that contains `AlternativeImage` and upwards the hierarchy must always be sufficient to calculate the pixel position in the image from the pixel position in `PageType/@imageFilename` (e.g. when cropping components further down the hierarchy) or vice versa (e.g. when adding elements further down the hierarchy).
This reproducibility principle is currently jeopardized (in concept) by two problems:
1. non-linear transforms like dewarping cannot be expressed as an explicit coordinate transform in PAGE at all;
2. a size difference between the actual `AlternativeImage` and the element's `Coords/@points` rectangle is ambiguous, since it can stem from rotation or from rescaling.
Now, as for 1, we could try to define a parametric field equivalent (within reasonable accuracy) to any conceivable binary dewarping transform. For example, let's assume the Leptonica approach has sufficient generality. It defines the transform as a vertical and a horizontal disparity field, each basically a (quadratic) parametric function of points interpolated at equidistant intervals. Each field can thus be described as a vector of sample points.
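Applying such a field to a coordinate could then look roughly like this (a sketch only: Leptonica fits a quadratic, whereas linear interpolation is used here to keep it short, and all names are made up):

```python
# Sketch: a disparity field stored as a vector of samples taken at
# equidistant intervals, evaluated at an arbitrary position.
def sample_disparity(samples, interval, x):
    """Interpolate the disparity at position x from equidistant samples
    (linear here; a quadratic fit would match Leptonica more closely)."""
    i = min(int(x // interval), len(samples) - 2)
    t = (x - i * interval) / interval
    return samples[i] * (1 - t) + samples[i + 1] * t


def dewarp_point(x, y, vdisp, hdisp, interval):
    """Compensate the dewarping transform for one point: the vertical
    field shifts y as a function of x, the horizontal field shifts x as
    a function of y."""
    return (x + sample_disparity(hdisp, interval, y),
            y + sample_disparity(vdisp, interval, x))
```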
So all we need is an attribute in PAGE for this, and consumers willing to perform the compensatory calculations on all coordinates after and below dewarping. We could of course use `@custom` again. Or could we perhaps use `GridType` for this?
Regarding problem 2, we face the difficulty that a difference between the actual binary size of the `AlternativeImage` and the size of the element's `Coords/@points` rectangle can be caused by either rotation or rescaling, which calls for either offset correction or scaling back. So we need a way to disambiguate this. Using `@comments` to check which is the case is not sufficient though: it could also be both! So again, we either resort to `@custom`, or we need a new attribute in PAGE, let's say `AlternativeImage/@scale` with `xsl:attribute/@default="1.0"`. A third option would be to prohibit rescaling as a valid use case altogether.
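With such a hypothetical `AlternativeImage/@scale` attribute (not part of PAGE today, merely the proposal above), the coordinate mapping would disambiguate cleanly:

```python
# Sketch assuming a proposed AlternativeImage/@scale (default 1.0): the
# offset accounts for cropping/rotation padding, the scale for rescaling.
def to_image_coords(point, offset, scale=1.0):
    """Map absolute page coordinates into a scaled, offset derived image."""
    return ((point[0] - offset[0]) * scale,
            (point[1] - offset[1]) * scale)


def to_page_coords(point, offset, scale=1.0):
    """Inverse mapping, back to PageType/@imageFilename coordinates."""
    return (point[0] / scale + offset[0],
            point[1] / scale + offset[1])
```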
@chris1010010 what do you think? (BTW, introducing `@orientation` on the page level can also be seen as a means to ensure the `AlternativeImage` reproducibility principle.)
BTW, @cneud how does ALTO deal with this? Is `ComposedBlockType` the equivalent of `AlternativeImage`, or how do you specify binary image data (interim results) below the page level?
Oh, while we are at it: there are two more points which might need disambiguation:
A. If a region (or page) has non-zero `@orientation` and an `AlternativeImage` is present, do we expect the image to be deskewed already, or do processors always produce a `@comments` string with the correct classification, i.e. including `deskewed`?
B. If a page has `Border` and an `AlternativeImage` is present, do we expect the image to be cropped already, or do processors always produce a `@comments` string with the correct classification, i.e. including `cropped`?
According to the OCR-D functional model, binarization can take place prior to block and line segmentation. Both processing steps should use the alternative image (if present).