Open bertsky opened 4 years ago
It should be noted that this feature request is not academic at all. It is based on concrete necessities which arose in the context of “real“ OCR processing. I.e., we need this one!
Sorry, don't have the time to dive into this, but it seems to diverge quite a bit from what the idea of the PageContent format is. Maybe that could be covered by anyAttribute (see other issue). I haven't looked into anyAttribute much either, but it seems to be a fairly safe extension of the schema.
Maybe that could be covered by anyAttribute (see other issue)
That would be unadvisable for many reasons:
Page/@image*) by applying them more consistently (also AlternativeImage), whereas #19 brings new ideas
it seems to diverge quite a bit from what the idea of the PageContent format is.
I couldn't disagree more :wink:
I second adding attributes to AlternativeImage for scale and image-specific dimensions and pixel density. More complete support for expressing general affine transformations or a dewarping grid would also be useful, but I can understand @chris1010010's desire to limit the scope of the pagecontent model, and this will need some more discussion. The proposal here is pretty concise though, only adding attributes with clear semantics. So please consider it for the upcoming release; it would really benefit @OCR-D processors.
Since AlternativeImage has been introduced on every level of the structural hierarchy, these image files can be used to represent results from image preprocessing (normalization, denoising, binarization, non-text suppression, despeckling, deskewing, dewarping). Some of these operations can and some cannot be represented descriptively – but referencing derived images always helps avoid repeated computations.
However, there's a difficulty/penalty involved: All coordinates in the PAGE hierarchy refer to the original image (under /PcGts/Page/@imageFilename), whereas derived images (AlternativeImage/@filename under Page or Region or TextLine or Word) necessarily have a different, local/relative coordinate system. It is connected to the global/absolute coordinate system only implicitly.
So if you want to process via derived images, such as cropping segments further down the hierarchy (translating from their absolute coordinates to the images' relative coordinates) or adding further segmentation (translating from new relative coordinates in the images to new absolute coordinates), then you must know the transformation between them.
This could merely be an offset (which could be unambiguously defined as the top left of the bounding box of the element's polygon), which happens after cropping (on the page level or any segmentation below that). But there are certain operations which change coordinates non-trivially:
All those effects are cumulative, i.e. they will compose into a new coordinate transform at each step, and in the order of the operations applied to the image (and its predecessors). This is not always trivial, e.g. cropping before/after deskewing, deskewing on page and then again on region level. It's certainly not rocket science, but (believe me) there are many ways you can get this wrong when you have to implement it.
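To illustrate why the order of operations matters, here is a minimal sketch in plain Python that composes coordinate transforms as 3x3 affine matrices. All function names are hypothetical and not part of PAGE or any OCR-D API. The rotation helper rotates about the origin with an arbitrary sign convention; a real deskewing step typically rotates about the image center and may enlarge the canvas, which this sketch deliberately leaves out.

```python
import math

def translation(dx, dy):
    """Shift by (dx, dy), e.g. the negative top-left corner after cropping."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    """Rescale by independent horizontal and vertical factors."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def rotation(degrees):
    """Rotate about the origin (sign convention is an assumption)."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def compose(*steps):
    """Combine transforms listed in the order they were applied to the image."""
    result = translation(0.0, 0.0)  # identity
    for step in steps:
        result = matmul(step, result)  # later steps multiply from the left
    return result

def invert(m):
    """Invert an affine 3x3 matrix (last row must be 0, 0, 1)."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]

def apply(m, x, y):
    """Map a single point through the transform."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Region whose bounding box starts at (100, 200) in page coordinates:
# crop first, then downscale the cropped image by one half.
crop_then_scale = compose(translation(-100, -200), scaling(0.5, 0.5))
# The same two operations in the opposite order give a different mapping.
scale_then_crop = compose(scaling(0.5, 0.5), translation(-100, -200))

absolute = (300, 400)
relative = apply(crop_then_scale, *absolute)               # page -> derived image
back_to_absolute = apply(invert(crop_then_scale), *relative)  # derived image -> page
```

The forward matrix translates existing absolute coordinates into the derived image, and its inverse turns newly detected relative coordinates back into absolute ones. Comparing `crop_then_scale` with `scale_then_crop` on the same point shows that swapping two steps already changes the result.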
Now, for cropping and deskewing, we are in the fortunate situation that – provided the operations applied on the derived image have been carried out in the "correct" way and documented in its @comments – their respective coordinate transform can be reconstructed from the descriptive information (Coords/@points and @orientation).
But for dewarping and rescaling we don't even have any descriptive annotation yet.
For dewarping, maybe the dewarping schema with its /DwGts/Grid/Row/@points is sufficient (although it is unfortunate that this schema is external to the content schema).
But for rescaling, there's nothing at all.
You could ask:
1: I'd be happy to see PAGE adopt some representation of affine transformations (basically a 3x3 float array) under AlternativeImage/@coordinate-system. But I would still consider this only a redundant convenience feature.
2: Rescaling is useful under various scenarios:
Thus, I propose to at least introduce a descriptive annotation for derived images' scale factors:
AlternativeImage/@imageWidth (as in Page/@imageWidth)
AlternativeImage/@imageHeight (as in Page/@imageHeight)
AlternativeImage/@imageXResolution (as in Page/@imageXResolution)
AlternativeImage/@imageYResolution (as in Page/@imageYResolution)
AlternativeImage/@imageResolutionUnit (as in Page/@imageResolutionUnit)
AlternativeImage/@imageXScale (how much is AlternativeImage/@imageXResolution zoomed over Page/@imageXResolution?)
AlternativeImage/@imageYScale (how much is AlternativeImage/@imageYResolution zoomed over Page/@imageYResolution?)
(Of course, the latter 2 are redundant, but pixel density might not be known exactly/reliably and thus omitted / set to zero. In that case, the scale can still describe precisely the factor between the unknown density of the original image and the unknown density of the derived image.)
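As a rough sketch of how a processor might consume these attributes, the following Python reads the proposed (not yet existing) scale and resolution attributes and picks a scale factor per axis. The sample document, the helper name `scale_factors`, and the fallback to 1.0 when nothing is annotated are assumptions made for illustration; the XML namespace of real PAGE files is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the attributes proposed above.
SAMPLE = """
<Page imageFilename="page.tif" imageWidth="2000" imageHeight="3000"
      imageXResolution="300" imageYResolution="300">
  <TextRegion id="r1">
    <AlternativeImage filename="r1_scaled.png"
                      imageWidth="400" imageHeight="300"
                      imageXResolution="150" imageYResolution="150"
                      imageXScale="0.5" imageYScale="0.5"/>
  </TextRegion>
</Page>
"""

def _number(element, name):
    """Read a numeric attribute; missing or empty counts as 0 (unknown)."""
    value = element.get(name)
    return float(value) if value else 0.0

def scale_factors(alt_image, page):
    """Return (x_scale, y_scale) of a derived image relative to the page image.

    Prefers the explicit scale attribute, then falls back to the ratio of
    pixel densities, and finally assumes 1.0 (an assumption of this sketch).
    """
    factors = []
    for axis in ("X", "Y"):
        explicit = _number(alt_image, f"image{axis}Scale")
        derived_density = _number(alt_image, f"image{axis}Resolution")
        page_density = _number(page, f"image{axis}Resolution")
        if explicit > 0:
            factors.append(explicit)
        elif derived_density > 0 and page_density > 0:
            factors.append(derived_density / page_density)
        else:
            factors.append(1.0)
    return tuple(factors)

page = ET.fromstring(SAMPLE)
alt_image = page.find("./TextRegion/AlternativeImage")
x_scale, y_scale = scale_factors(alt_image, page)
```

Preferring the explicit scale over the density ratio follows the reasoning given above: pixel density may be missing or unreliable, while the scale factor between original and derived image is known exactly by whoever produced the derived image.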