support scale attribute for down/upsampled images

bertsky commented 4 years ago

Since AlternativeImage has been introduced on every level of the structural hierarchy, these image files can be used to represent results from image preprocessing (normalization, denoising, binarization, non-text suppression, despeckling, deskewing, dewarping). Some of these operations can and some cannot be represented descriptively – but referencing derived images always helps avoid repeated computations.

However, there's a difficulty/penalty involved: All coordinates in the PAGE hierarchy refer to the original image (under /PcGts/Page/@imageFilename), whereas derived images (AlternativeImage/@filename under Page or Region or TextLine or Word) necessarily have a different, local/relative coordinate system. That system is connected to the global/absolute coordinate system only implicitly.

So if you want to process via derived images, e.g. to crop segments further down the hierarchy (translating from their absolute coordinates to the images' relative coordinates) or to add further segmentation (translating from new relative coordinates in the images to new absolute coordinates), then you must know the transformation between them.

This could merely be an offset (which could be unambiguously defined as the top left of the bounding box of the element's polygon), which happens after cropping (on the page level or any segmentation below that). But there are certain operations which change coordinates non-trivially:

Deskewing will shift to the center of the element's bounding box, then rotate around that center, increasing the size of the bounding box (to avoid losing content at the corners), and shift back to the (new) top left of the bounding box. Alternatively, larger angles (e.g. multiples of 90°) could be applied by reflection instead of rotation.
Dewarping may change coordinates in any number of ways (3d shear or cubic spline projection, or interpolated raster grid, including as a special case centerline projection).
Rescaling or aspect correction will multiply coordinates by a constant factor.
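To make the cropping and deskewing cases concrete, here is a minimal sketch of how they could be written as 3x3 affine matrices acting on (x, y, 1). The helper names (`crop_transform`, `deskew_transform`, etc.) are made up for illustration, and the sign convention of the rotation angle is an assumption; it would have to be matched against how @orientation is actually interpreted by the tool that produced the image.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotate(angle_deg):
    # Plain mathematical rotation matrix; which visual direction this
    # corresponds to depends on the axis convention (assumption).
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(m, point):
    """Map a single (x, y) point through a 3x3 affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def crop_transform(bbox):
    """Absolute -> relative coordinates for an image cropped to
    bbox = (x0, y0, x1, y1): a pure offset by the top left corner."""
    x0, y0, _, _ = bbox
    return translate(-x0, -y0)

def deskew_transform(width, height, angle_deg):
    """Coordinates in a width x height image -> coordinates in the
    rotated image whose canvas was enlarged to keep the corners."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    new_w = width * c + height * s
    new_h = width * s + height * c
    m = translate(-width / 2, -height / 2)           # center to origin
    m = matmul(rotate(angle_deg), m)                 # rotate around center
    m = matmul(translate(new_w / 2, new_h / 2), m)   # back to new top left
    return m, (new_w, new_h)
```

Rescaling would simply be a diagonal matrix with the x and y factors; dewarping is generally not affine and does not fit this representation.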

All those effects are cumulative, i.e. they will compose into a new coordinate transform at each step, and in the order of the operations applied to the image (and its predecessors). This is not always trivial, e.g. cropping before/after deskewing, deskewing on page and then again on region level. It's certainly not rocket science, but (believe me) there are many ways you can get this wrong when you have to implement it.
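A tiny illustration of why the order matters, using only cropping and rescaling (the numbers and the helper names `compose`, `crop`, `rescale` are made up for this sketch):

```python
from functools import reduce

def compose(*steps):
    """Chain coordinate transforms in the order the operations were applied."""
    return lambda point: reduce(lambda p, step: step(p), steps, point)

def crop(x0, y0):
    """Offset by the top left corner of the cropped area."""
    return lambda p: (p[0] - x0, p[1] - y0)

def rescale(factor):
    """Multiply both coordinates by a constant factor."""
    return lambda p: (p[0] * factor, p[1] * factor)

crop_then_scale = compose(crop(100, 200), rescale(0.5))
scale_then_crop = compose(rescale(0.5), crop(100, 200))

point = (300, 400)
print(crop_then_scale(point))  # (100.0, 100.0)
print(scale_then_crop(point))  # (50.0, 0.0)
```

The same absolute point ends up at different relative positions, so a consumer has to know not just which operations were applied but also their sequence.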

Now, for cropping and deskewing, we are in the fortunate situation that – provided the operations applied on the derived image have been carried out in the "correct" way and documented in its @comments – their respective coordinate transform can be reconstructed from the descriptive information (Coords/@points and @orientation).

But for dewarping and rescaling we don't even have any descriptive annotation yet.

For dewarping, maybe the dewarping schema with its /DwGts/Grid/Row/@points is sufficient (although it is unfortunate that this schema is external to the content schema).

But for rescaling, there's nothing at all.

You could ask:

1. shouldn't we then allow annotating the coordinate transform explicitly?
2. why do you want to rescale?

1: I'd be happy to see PAGE adopt some representation of affine transformations (basically a 3x3 float array) under AlternativeImage/@coordinate-system. But I would still consider this only a redundant convenience feature.
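As a sketch of what a consumer could do with such a matrix: given a 3x3 affine array (row-major, last row 0 0 1, which is an assumption here since nothing is specified yet), the relative-to-absolute direction is just its inverse. The function names are hypothetical.

```python
def apply(m, point):
    """Map an (x, y) point through a 3x3 affine matrix (nested lists)."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def invert_affine(m):
    """Invert a 2D affine transform given as a 3x3 matrix whose last
    row is [0, 0, 1]. Raises ValueError if it is not invertible."""
    (a, b, c), (d, e, f) = m[0], m[1]
    det = a * e - b * d
    if det == 0:
        raise ValueError("transform is not invertible")
    return [[e / det, -b / det, (b * f - c * e) / det],
            [-d / det, a / det, (c * d - a * f) / det],
            [0.0, 0.0, 1.0]]

# Example: crop at (100, 200), then downscale by 0.5,
# i.e. p -> 0.5 * p - (50, 100).
absolute_to_relative = [[0.5, 0.0, -50.0],
                        [0.0, 0.5, -100.0],
                        [0.0, 0.0, 1.0]]
relative_to_absolute = invert_affine(absolute_to_relative)

rel = apply(absolute_to_relative, (300, 400))
print(rel)                               # (100.0, 100.0)
print(apply(relative_to_absolute, rel))  # (300.0, 400.0)
```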

2: Rescaling is useful under various scenarios:

avoiding wasted computation on images with excessive pixel density by downsampling them during processing
ensuring a fixed pixel density for operations that expect certain component sizes or distances (e.g. rule-based segmentation tools always assuming 300 DPI)
ensuring a fixed pixel resolution for operations that expect a certain image size (e.g. neural segmentation tools)
ensuring a fixed width/height aspect ratio during processing
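For the fixed-density case, the scale factor is just the ratio of target to source density. A minimal sketch (function names are hypothetical; the 300 DPI default only mirrors the example above):

```python
def zoom_for_density(source_dpi, target_dpi=300):
    """Factor by which an image must be resized to reach target_dpi."""
    if source_dpi <= 0:
        raise ValueError("source density unknown or invalid")
    return target_dpi / source_dpi

def scaled_size(width, height, zoom):
    """Pixel size of the derived image after resizing by zoom."""
    return (round(width * zoom), round(height * zoom))

zoom = zoom_for_density(600)          # a 600 DPI scan
print(zoom)                           # 0.5
print(scaled_size(4960, 7016, zoom))  # (2480, 3508)
```

Whatever factor is used here is exactly what a later processor needs in order to map coordinates found on the derived image back to the original.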

Thus, I propose to at least introduce a descriptive annotation for derived images' scale factors:

AlternativeImage/@imageWidth (as in Page/@imageWidth)
AlternativeImage/@imageHeight (as in Page/@imageHeight)
AlternativeImage/@imageXResolution (as in Page/@imageXResolution)
AlternativeImage/@imageYResolution (as in Page/@imageYResolution)
AlternativeImage/@imageResolutionUnit (as in Page/@imageResolutionUnit)
AlternativeImage/@imageXScale (how much is AlternativeImage/@imageXResolution zoomed over Page/@imageXResolution?)
AlternativeImage/@imageYScale (how much is AlternativeImage/@imageYResolution zoomed over Page/@imageYResolution?)

(Of course, the latter 2 are redundant, but pixel density might not be known exactly/reliably and thus omitted / set to zero. In that case, the scale can still describe precisely the factor between the unknown density of the original image and the unknown density of the derived image.)
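To illustrate how a consumer might use these attributes, here is a sketch that prefers the explicit scale and falls back to the ratio of pixel densities. The XML fragment is simplified (no namespace, no PcGts wrapper), the attributes on AlternativeImage are the ones proposed above and do not exist in the schema yet, and the final fallback to 1.0 is purely an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

PAGE_XML = """
<Page imageFilename="orig.tif" imageWidth="4000" imageHeight="6000"
      imageXResolution="600" imageYResolution="600" imageResolutionUnit="PPI">
  <AlternativeImage filename="orig_small.png" imageWidth="2000" imageHeight="3000"
                    imageXResolution="300" imageYResolution="300"
                    imageResolutionUnit="PPI" imageXScale="0.5" imageYScale="0.5"/>
</Page>
"""

def axis_scale(page, alt, axis):
    """Scale of the derived image relative to the original along 'X' or 'Y'."""
    explicit = alt.get(f"image{axis}Scale")
    if explicit is not None:
        return float(explicit)
    # Fall back to the density ratio if both densities are known (non-zero).
    page_res = float(page.get(f"image{axis}Resolution", 0))
    alt_res = float(alt.get(f"image{axis}Resolution", 0))
    if page_res > 0 and alt_res > 0:
        return alt_res / page_res
    return 1.0  # assumption: nothing annotated means no rescaling

def to_derived(points, page, alt):
    """Map absolute (x, y) points into the derived image's coordinates
    (rescaling only; cropping or deskewing would have to be composed in)."""
    sx, sy = axis_scale(page, alt, "X"), axis_scale(page, alt, "Y")
    return [(x * sx, y * sy) for x, y in points]

page = ET.fromstring(PAGE_XML)
alt = page.find("AlternativeImage")
print(to_derived([(100, 200), (4000, 6000)], page, alt))
# [(50.0, 100.0), (2000.0, 3000.0)]
```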

wrznr commented 4 years ago

It should be noted that this feature request is not academic at all. It is based on concrete necessities which arose in the context of “real” OCR processing. I.e., we need this one!

chris1010010 commented 4 years ago

Sorry, don't have the time to dive into this, but it seems to diverge quite a bit from what the idea of the PageContent format is. Maybe that could be covered by anyAttribute (see other issue). I haven't looked into anyAttribute much either, but it seems to be a fairly safe extension of the schema.

bertsky commented 4 years ago

> Maybe that could be covered by anyAttribute (see other issue)

That would be inadvisable for many reasons:

this proposal merely adds attributes, whereas #19 deals with elements
this proposal completes previous amendments (Page/@image*) by applying them more consistently (also AlternativeImage), whereas #19 brings new ideas
this proposal helps integrate applications/implementations via PAGE with well-defined semantics, whereas #19 can only mandate individual (module-local) extensions

> it seems to diverge quite a bit from what the idea of the PageContent format is.

I couldn't disagree more :wink:

kba commented 4 years ago

I second adding attributes to AlternativeImage for scale and image-specific dimensions and pixel density. More complete support for expressing general affine transformations or a dewarping grid would also be useful, but I can understand @chris1010010's desire to limit the scope of the pagecontent model, and this will need some more discussion. The proposal here is pretty concise though, only adding attributes with clear semantics. So please consider it for the upcoming release, it would really benefit @OCR-D processors.

PRImA-Research-Lab / PAGE-XML

support scale attribute for down/upsampled images #25