OCR-D / zenhub

Repo for developing zenhub integration
Apache License 2.0

Expose image extraction code to bashlib #34

Open kba opened 2 years ago

kba commented 2 years ago

Current situation

When implementing processors using the bashlib shell library provided by OCR-D/core, developers have to write their own routines, based on tools like ImageMagick, to extract sections of images if the processor operates below page level or needs to take PrintSpace/Border into account.

This is inefficient and places the burden of implementing coordinate arithmetic in shell on the developers. It also breaks DRY, because the Python API already has many utility methods for coordinates, as well as code to re-apply the operations that produced an AlternativeImage.
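To illustrate the coordinate arithmetic that currently has to be redone in shell, here is a minimal Python sketch. The helper names `parse_points`, `bounding_box` and `crop_geometry` are hypothetical and not part of OCR-D/core. It parses a PAGE-XML `points` string, computes the bounding box, shifts the polygon to crop-relative coordinates and builds an ImageMagick `-crop WxH+X+Y` geometry, assuming the maximum coordinates are treated as exclusive. It ignores rotation and AlternativeImage transforms, which is precisely the part that is hard to replicate outside the Python API.

```python
# Hypothetical helpers illustrating the coordinate arithmetic a bashlib
# processor currently has to re-implement itself (e.g. with awk + ImageMagick).


def parse_points(points: str) -> list[tuple[int, int]]:
    """Parse a PAGE-XML points string like "x1,y1 x2,y2 ..." into (x, y) tuples."""
    result = []
    for pair in points.split():
        x, y = pair.split(",")
        result.append((int(x), int(y)))
    return result


def bounding_box(polygon):
    """Return (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def crop_geometry(points: str) -> dict:
    polygon = parse_points(points)
    xmin, ymin, xmax, ymax = bounding_box(polygon)
    # Shift every vertex so it is relative to the top-left corner of the crop.
    relative = [(x - xmin, y - ymin) for x, y in polygon]
    # ImageMagick "-crop WxH+X+Y" geometry; assumes xmax/ymax are exclusive.
    geometry = f"{xmax - xmin}x{ymax - ymin}+{xmin}+{ymin}"
    return {"bbox": (xmin, ymin, xmax, ymax), "polygon": relative, "geometry": geometry}


print(crop_geometry("100,50 300,50 300,120 100,120")["geometry"])  # 200x70+100+50
```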

How it should be

We should provide command line tools to extract regions of images, based on coordinates and/or region/line ID.
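Resolving a region or line ID to its polygon is the first step such a tool would take. The following is a minimal sketch using the standard library's ElementTree, assuming the PAGE-XML 2019 namespace and a hypothetical `points_for_id` helper. A real implementation would rather go through the OCR-D Python API, which also knows about AlternativeImages and their features; this only shows the lookup itself.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE-XML 2019 schema namespace; other schema versions use a different URI.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def points_for_id(page_xml: str, element_id: str) -> str:
    """Return Coords/@points of the element (region, line, ...) with the given ID."""
    root = ET.fromstring(page_xml)
    for elem in root.iter():
        if elem.get("id") == element_id:
            # find() only looks at direct children, so a region does not
            # accidentally pick up the Coords of one of its lines.
            coords = elem.find(f"{{{PAGE_NS}}}Coords")
            if coords is None:
                raise ValueError(f"element {element_id!r} has no Coords")
            return coords.get("points")
    raise KeyError(element_id)


# Minimal (not schema-complete) PAGE-XML sample for illustration.
SAMPLE = f"""<PcGts xmlns="{PAGE_NS}">
  <Page imageFilename="page.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,50 300,50 300,120 100,120"/>
      <TextLine id="r1_l1">
        <Coords points="110,60 290,60 290,90 110,90"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(points_for_id(SAMPLE, "r1_l1"))  # 110,60 290,60 290,90 110,90
```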

Besides implementing the CLI, this might also involve some refactoring to make methods available for easy re-use in a CLI. That would also benefit the Python API, because it would (potentially) decouple the image-processing components from workspace handling.
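As an illustration of what a workspace-independent building block could look like, the hypothetical `masked_crop` function below takes only pixel data and a polygon. It crops to the bounding box, fills everything outside the polygon with a background value, and returns the offset needed to map crop coordinates back to the parent image. It is a dependency-free sketch on nested lists of gray values (the polygon is assumed to lie fully inside the image), using an even-odd point-in-polygon test on pixel centres; the Python API itself operates on PIL images and is also meant to support masking via an alpha channel.

```python
# Hypothetical sketch: no workspace, METS or file group involved,
# only pixel data and a polygon.


def point_in_polygon(x: float, y: float, polygon) -> bool:
    """Even-odd (ray casting) point-in-polygon test."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line at height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def masked_crop(image, polygon, bg=255):
    """Crop image (list of rows of gray values) to the polygon's bounding box,
    fill pixels outside the polygon with bg, and return the crop together
    with its (x, y) offset in the parent image."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
    crop = []
    for y in range(ymin, ymax):
        row = []
        for x in range(xmin, xmax):
            # Test the pixel centre against the polygon.
            inside = point_in_polygon(x + 0.5, y + 0.5, polygon)
            row.append(image[y][x] if inside else bg)
        crop.append(row)
    return crop, (xmin, ymin)


image = [[0] * 4 for _ in range(4)]  # 4x4 all-black image
crop, offset = masked_crop(image, [(0, 0), (4, 0), (0, 4)])  # triangle
for row in crop:
    print(row)
```

Returning the offset alongside the pixels is the simplest form of the "information necessary for coordinate transformations"; rotation and scaling would need a full transform.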

CLI

ocrd workspace extract [OPTIONS]

Options:
  --output             filename of output image, "-" for STDOUT [default: "-"]
  --page-id            ID of the PAGE to operate on [required]
  --element-id         ID of the region or line to extract
  --coords             Coordinates of the polygon to cut from the image
  --features-select    Which features an AlternativeImage must include
  --features-filter    Which features an AlternativeImage must NOT include
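The option list above could be prototyped with `argparse` as follows. This sketches only the interface, not the extraction logic: `build_parser` and `split_features` are hypothetical names, and treating the feature options as comma-separated lists (e.g. `binarized,deskewed`) is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argument parser mirroring the proposed options."""
    parser = argparse.ArgumentParser(
        prog="ocrd workspace extract",
        description="Extract (part of) a page image",
    )
    parser.add_argument("--output", default="-",
                        help='filename of output image, "-" for STDOUT')
    parser.add_argument("--page-id", required=True,
                        help="ID of the PAGE to operate on")
    parser.add_argument("--element-id",
                        help="ID of the region or line to extract")
    parser.add_argument("--coords",
                        help='polygon to cut from the image, e.g. "x1,y1 x2,y2 ..."')
    parser.add_argument("--features-select", default="",
                        help="features an AlternativeImage must include")
    parser.add_argument("--features-filter", default="",
                        help="features an AlternativeImage must NOT include")
    return parser


def split_features(value: str) -> list[str]:
    """Turn "binarized,deskewed" into ["binarized", "deskewed"] and "" into []."""
    return [feature for feature in value.split(",") if feature]


args = build_parser().parse_args(["--page-id", "PHYS_0001", "--element-id", "r1"])
print(args.output, args.page_id, args.element_id)
```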

Testing

MVP would be commands:

Related

bertsky commented 2 years ago

Yes, but how? As already formulated in OCR-D/core#264:

What we do need here is an API that allows getting a (polygon-masked, via alpha channels and/or bg fill) segment image via the same ad-hoc creation or AlternativeImage retrieval algorithm in the Python API, including filters and selectors. And more than that, output all information necessary for coordinate transformations, too.

To make this clearer: the difficulty here is in making the API calls (usually cascaded like image_from_page → image_from_segment → image_from_segment) re-entrant on the shell. For large memory objects like images, we can probably use (temporary) files. So, wrapping the file ID and segment ID to read and the image file name to write is no problem, and neither are the extra parameters. Even the parent image (file name) would be doable, but returning and passing the parent coords is hard, because it would have to be a single (reusable) string that serializes all the information (transform array, angle float, features string).
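One possible approach to the serialization problem is to pack the parent coords into a single JSON string that one invocation prints and the next one receives as an argument. The sketch assumes the coords are a dict with a 3x3 affine `transform` (a nested list here; a NumPy array would need `.tolist()` first), an `angle` float and a `features` string; `serialize_coords`, `deserialize_coords` and `apply_transform` are hypothetical names.

```python
import json

# Assumption: coords = {"transform": 3x3 affine matrix as nested lists,
#                       "angle": float, "features": str}


def serialize_coords(coords: dict) -> str:
    """Pack transform, angle and features into one compact JSON string."""
    return json.dumps(
        {
            "transform": [[float(v) for v in row] for row in coords["transform"]],
            "angle": float(coords["angle"]),
            "features": coords["features"],
        },
        separators=(",", ":"),
    )


def deserialize_coords(text: str) -> dict:
    """Inverse of serialize_coords."""
    data = json.loads(text)
    return {"transform": data["transform"], "angle": data["angle"], "features": data["features"]}


def apply_transform(transform, point):
    """Map (x, y) through the affine matrix, i.e. T @ (x, y, 1)."""
    x, y = point
    return (
        transform[0][0] * x + transform[0][1] * y + transform[0][2],
        transform[1][0] * x + transform[1][1] * y + transform[1][2],
    )


parent = {"transform": [[1, 0, -100], [0, 1, -50], [0, 0, 1]], "angle": 0.0, "features": "binarized"}
encoded = serialize_coords(parent)
print(encoded)
restored = deserialize_coords(encoded)
print(apply_transform(restored["transform"], (150, 70)))  # (50.0, 20.0)
```

Because the string contains double quotes, it has to be single-quoted (or stored in a variable) when passed between shell invocations.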