OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

AlternativeImage selection algorithm can't find the image #554

Status: Closed (closed by Shanksum 4 years ago)

Shanksum commented 4 years ago

AlternativeImage selection fails during a workflow-configuration call when a binarised image is needed. Error message:

Exception: Found no AlternativeImage that satisfies all requirements selector="binarized" in page "OCR-D-IMG-BIN_catalog46muse_0023-BIN_sauvola-ms-split"
Makefile:320: recipe for target 'OCR-D-IMG-BIN-DENOISE' failed
make[1]: *** [OCR-D-IMG-BIN-DENOISE] Error 1
make[1]: Leaving directory '/data/w1'
Makefile:205: recipe for target 'w1' failed
make: *** [w1] Error 2
make: Leaving directory '/data'

But the PAGE XML file reads:

<pc:AlternativeImage filename="OCR-D-IMG-BIN/OCR-D-IMG-BIN_catalog46muse_0023-BIN_sauvola-ms-split.png" comments="binarized"/>
<pc:AlternativeImage filename="OCR-D-IMG-DESKEW/OCR-D-IMG-DESKEW_catalog46muse_0023.png" comments="binarized,deskewed"/>
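
To make the mismatch concrete, here is a minimal sketch of a comments-based feature match over the two entries above. It is not OCR-D core's implementation: the function `select_alternative_images`, its parameters, the wrapping `pc:Page` element and the namespace URI are assumptions made for illustration. It only shows that a plain comparison of the `comments` attribute against the selector `binarized` would accept both entries, so the attribute values alone do not explain the error.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2019-07-15 namespace; the issue does not show the real one.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# The two AlternativeImage entries from the report, wrapped in a
# hypothetical pc:Page element so the fragment parses on its own.
FRAGMENT = f"""
<pc:Page xmlns:pc="{PAGE_NS}">
  <pc:AlternativeImage filename="OCR-D-IMG-BIN/OCR-D-IMG-BIN_catalog46muse_0023-BIN_sauvola-ms-split.png" comments="binarized"/>
  <pc:AlternativeImage filename="OCR-D-IMG-DESKEW/OCR-D-IMG-DESKEW_catalog46muse_0023.png" comments="binarized,deskewed"/>
</pc:Page>
""".strip()


def select_alternative_images(page, selector="", feature_filter=""):
    """Return filenames whose comments contain every feature in `selector`
    and none of the features in `feature_filter` (both comma-separated).

    Hypothetical helper, not part of OCR-D core.
    """
    required = {f for f in selector.split(",") if f}
    forbidden = {f for f in feature_filter.split(",") if f}
    matches = []
    for alt in page.iterfind("pc:AlternativeImage", {"pc": PAGE_NS}):
        features = set(alt.get("comments", "").split(","))
        if required <= features and not (forbidden & features):
            matches.append(alt.get("filename"))
    return matches


page = ET.fromstring(FRAGMENT)
for filename in select_alternative_images(page, selector="binarized"):
    print(filename)
```

Under these assumptions both filenames are printed, which suggests looking beyond the `comments` values, for example at how the referenced files are stored and resolved.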

OCR-D Version: 2.12.4

The makefile used for reproduction:

INPUT = OCR-D-IMG

$(INPUT):
    ocrd workspace find -G $@ --download
    ocrd workspace find -G OCR-D-IMG --download

BIN = $(INPUT)-BIN

$(BIN): $(INPUT)
$(BIN): TOOL = ocrd-olena-binarize
$(BIN): PARAMS = "impl": "sauvola-ms-split"

DEN = $(BIN)-DENOISE

$(DEN): $(BIN)
$(DEN): TOOL = ocrd-cis-ocropy-denoise
$(DEN): PARAMS = "level-of-operation": "page"

DES = $(DEN)-DESKEW

$(DES): $(DEN)
$(DES): TOOL = ocrd-cis-ocropy-deskew
$(DES): PARAMS = "level-of-operation": "page"

SEG = $(DES)-SEGMENT-REGION

$(SEG): $(DES)
$(SEG): TOOL = ocrd-tesserocr-segment-region

SER = $(SEG)-SEGMENT-REPAIR

$(SER): $(SEG)
$(SER): TOOL = ocrd-segment-repair
$(SER): PARAMS = "plausibilize": true

CBI = $(SER)-BIN-REGION

$(CBI): $(SER)
$(CBI): TOOL = ocrd-cis-ocropy-binarize
$(CBI): PARAMS = "level-of-operation": "region"

TED = $(CBI)-TESSER-DESKEW

$(TED): $(CBI)
$(TED): TOOL = ocrd-tesserocr-deskew

TES = $(TED)-TESSER-SEGMENT

$(TES): $(TED)
$(TES): TOOL = ocrd-tesserocr-segment-line

RES = $(TES)-RESEGMENT

$(RES): $(TES)
$(RES): TOOL = ocrd-cis-ocropy-resegment

DEW = $(RES)-RESEGMENT

$(DEW): $(RES)
$(DEW): TOOL = ocrd-cis-ocropy-dewarp

# OUTPUT = $(DEW)-OUT

.DEFAULT_GOAL = $(DEW)

# Down here, custom configuration ends.
###

include Makefile

hnesk commented 4 years ago

The problem disappears when I change the line:

BIN = $(INPUT)-BIN

to anything else, like

BIN = $(INPUT)-BIN-olena

I suppose the problem is that the PAGE XML files and the images get stored in the same folder (OCR-D-IMG-BIN) because of olena's default file group for images:

14:02:20.015 INFO ocrd-olena-binarize - No output file group for images specified, falling back to 'OCR-D-IMG-BIN'
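
The suspected collision can be sketched as follows. This is only an illustration of the naming logic, not code from ocrd_olena: the helper `image_group_collides` and its parameters are hypothetical, and the fallback name is taken from the log message above.

```python
# Fallback image file group, as quoted in the log message above.
FALLBACK_IMAGE_GRP = "OCR-D-IMG-BIN"


def image_group_collides(page_grp, image_grp=None):
    """Return True if PAGE XML output and derived images would share a file group.

    Hypothetical helper: `page_grp` is the output file group for PAGE XML,
    `image_grp` an optional, explicitly chosen file group for images.
    """
    effective_image_grp = image_grp or FALLBACK_IMAGE_GRP
    return page_grp == effective_image_grp


# BIN = $(INPUT)-BIN expands to OCR-D-IMG-BIN, the same name as the fallback.
print(image_group_collides("OCR-D-IMG-BIN"))
# The workaround name differs from the fallback.
print(image_group_collides("OCR-D-IMG-BIN-olena"))
```

With the original makefile the two groups coincide, while the renamed group keeps them apart, which matches the observation that the workaround makes the error go away.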

kba commented 4 years ago

14:02:20.015 INFO ocrd-olena-binarize - No output file group for images specified, falling back to 'OCR-D-IMG-BIN'

Yes, that has been fixed in version 1.2.0, which I released earlier today. Can you upgrade ocrd_olena and check whether the problem persists (with the original, non-workaround syntax)?

kba commented 4 years ago

I can still reproduce the issue. Sleuthing continues.

bertsky commented 4 years ago

@Shanksum I cannot reproduce this. Can you please retry with the current version, or close the issue?

Shanksum commented 4 years ago

Because the issue was reproducible before, I consider this issue solved.