Open kba opened 2 years ago
Yes, but how? As already formulated in OCR-D/core#264:
What we do need here is an API that allows getting a (polygon-masked, via alpha channels and/or bg fill) segment image via the same ad-hoc creation or AlternativeImage retrieval algorithm in the Python API, including filters and selectors. And more than that, output all information necessary for coordinate transformations, too.
Just to make this more clear: the difficulty here is in making the API calls (usually cascaded like image_from_page → image_from_segment → image_from_segment) re-entrant on the shell. For large memory objects like images, we can probably use (temporary) files. So, wrapping the file ID and segment ID to read and the image file name to write is no problem, as are the extra parameters. Even the parent image (file name) would be doable, but (returning and passing) the parent coords is hard, because it would have to be a single (re-useable) string that serializes all the information (transform array, angle float, features string).
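One way to get such a single string would be to flatten the coords into one token. The following is only a minimal sketch, not existing OCR-D API: it assumes the coords are a dict with the three entries named above (transform as a nested list of numbers, angle as a float, features as a string), and the helper names dump_coords and load_coords are hypothetical. The dict is encoded as JSON and then as URL-safe base64, so the token contains no whitespace or quote characters.

```python
import base64
import json


def dump_coords(coords):
    """Serialise a coords dict into one shell-safe token (hypothetical helper)."""
    payload = {
        # assumption: the affine matrix is available as a nested list of numbers
        "transform": [[float(v) for v in row] for row in coords["transform"]],
        "angle": float(coords["angle"]),
        "features": str(coords["features"]),
    }
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def load_coords(token):
    """Inverse of dump_coords: rebuild the coords dict from the token."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw.decode("utf-8"))
```

On the shell, a token like this could be kept in a variable and handed to the next call as one argument, next to the parent image file name.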
Current situation
When implementing processors using the bashlib shell library provided by OCR-D/core, developers have to write their own routines, based on tools like ImageMagick, to extract sections of images if the processor operates below page level or needs to take PrintSpace/Border into account.
This is inefficient and places the burden of implementing coordinate arithmetic in shell on the developers. It also breaks DRY, because the Python API already has many utility methods for coordinates, as well as code to re-apply the operations that produced an AlternativeImage.
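To illustrate the kind of coordinate arithmetic meant here, the sketch below maps a region polygon from page coordinates into the coordinate system of an image cropped to that region's bounding box. It is a simplified stand-in under stated assumptions: only cropping is covered (no rotation or deskewing), and the function names are hypothetical rather than the utilities that exist in the Python API.

```python
def bbox(points):
    """Return (x_min, y_min, x_max, y_max) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def crop_transform(polygon):
    """3x3 affine matrix shifting page coordinates to the cropped image."""
    x_min, y_min, _, _ = bbox(polygon)
    return [[1.0, 0.0, -x_min],
            [0.0, 1.0, -y_min],
            [0.0, 0.0, 1.0]]


def transform_points(points, transform):
    """Apply a 3x3 affine matrix to a list of (x, y) points."""
    result = []
    for x, y in points:
        tx = transform[0][0] * x + transform[0][1] * y + transform[0][2]
        ty = transform[1][0] * x + transform[1][1] * y + transform[1][2]
        result.append((tx, ty))
    return result
```

Each further step (a nested segment, a rotation) would mean composing another matrix onto this one, which is exactly the bookkeeping that is painful to redo in shell.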
How it should be
We should provide command line tools to extract regions of images, based on coordinates and/or region/line ID.
Besides implementing the CLI this might also involve some refactoring to make methods available for easy re-use in a CLI, which would also benefit the Python API because it (potentially) decouples components for image processing from workspace handling.
CLI
Testing
MVP would be commands:
ocrd workspace extract --page-id "page123" --element-id="region123" --output region123.tiff
(to extract by line/region ID
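As a rough sketch of how the argument handling for the proposed command might look, the snippet below parses the three options from the example above. It is hypothetical throughout: the command does not exist yet, the function names are made up, and argparse from the standard library is used only to keep the example self-contained, whatever CLI framework the real implementation would use. The actual extraction step is left out.

```python
import argparse


def build_parser():
    """Argument parser for the proposed (not yet existing) extract command."""
    parser = argparse.ArgumentParser(
        prog="ocrd workspace extract",
        description="Extract the image of a page segment (sketch only).",
    )
    parser.add_argument("--page-id", required=True,
                        help="ID of the page to read")
    parser.add_argument("--element-id", required=True,
                        help="ID of the region or line to extract")
    parser.add_argument("--output", required=True,
                        help="file name of the image to write")
    return parser


def parse_extract_args(argv):
    """Parse argv and return the options as a plain dict."""
    args = build_parser().parse_args(argv)
    return {
        "page_id": args.page_id,
        "element_id": args.element_id,
        "output": args.output,
    }
```

Both spellings used in the example command, a space-separated value and the `=` form, are accepted by this parser.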