chained workflows

It would help if workflows could be chained at runtime, e.g. `ocrd-make -f pre3.mk -f seg1.mk -f ocr4.mk -f post.mk`, where each makefile would consume the last fileGrp of the previous one – so each stage could be replaced by an alternative configuration independently of the others. This in turn would allow writing very small, concise (sub-)configurations without repetition.

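As for implementation: make accepts multiple makefiles (`-f` may be given several times) and reads them sequentially, in the given order. In this first phase, everything in immediate context is expanded as soon as it is read (the right-hand side of `:=`, target names and prerequisite lists), so the prerequisites of a rule are fixed by the variable values in effect at that point in the sequence of files. Only afterwards, in the second phase, does make work on the combined rule set: it computes the dependencies of the goal, decides which targets are out of date, and expands and runs their recipes.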
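To illustrate the two phases, here is a minimal shell sketch, assuming GNU make on the `PATH`; the files `a.mk`/`b.mk` and the targets are made up for this demo. The prerequisite list picks up the value of `X` at the moment `a.mk` is read, whereas the recipe, expanded only in the second phase, sees the value that `b.mk` assigned later.

```sh
#!/bin/sh
# Demo (hypothetical files a.mk, b.mk): prerequisites are expanded while
# the makefiles are read, recipes only after all makefiles have been read.
set -e
cd "$(mktemp -d)"

cat > a.mk <<'EOF'
X := first
# prerequisite list: expanded immediately, i.e. with X = first
T: $(X)
# recipe: expanded later, in the second phase
T: ; @echo "prerequisite: $^, recipe sees X=$(X)"
first second: ; @:
EOF

cat > b.mk <<'EOF'
X := second
EOF

phases=$(make -f a.mk -f b.mk)
echo "$phases"
```

This prints `prerequisite: first, recipe sees X=second`, which is exactly the property the chaining relies on: a later file can no longer change which fileGrp an earlier rule depends on.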

So, by convention (for chainable configurations), we could allow defining a simply expanded variable, say `OUTPUT`, for the (phase's) output fileGrp name, and allow using `INPUT` for the (phase's) dynamic input fileGrp name. Internally (i.e. in our Makefile, which always needs to be included), we then set `override INPUT := $(or $(OUTPUT),$(INPUT))` and `.DEFAULT_GOAL := $(OUTPUT)`. The `override` is needed because an `INPUT` given on the command line would otherwise take precedence over the reassignment. For the very first phase (entry point), we then just have to pass `INPUT`, either in a separate (phase zero) non-rule config file or as an additional command-line argument.

For example:

* pre3.mk
```make
BIN: $(INPUT)
BIN: TOOL = ocrd-doxa-binarize

DESK: BIN
DESK: TOOL = ocrd-cis-ocropy-deskew
DESK: PARAMS = "level-of-operation": "page"

CROP: DESK
CROP: TOOL = ocrd-anybaseocr-crop
CROP: PARAMS = "rulerAreaMax": 0

OUTPUT := CROP
```

* seg1.mk
```make
SEG: $(INPUT)
SEG: TOOL = ocrd-kraken-segment
SEG: PARAMS = "model": "blla.mlmodel"

RESEG: SEG
RESEG: TOOL = ocrd-cis-ocropy-resegment
RESEG: PARAMS = "method": "baseline"

OUTPUT := RESEG
```

* ocr4.mk
```make
OCR1: $(INPUT)
OCR2: $(INPUT)
OCR3: $(INPUT)
OCR1 OCR2 OCR3: OPTIONS = -P textequiv_level glyph

OCR1: TOOL = ocrd-tesserocr-recognize
OCR1: OPTIONS += -P model frak2021+deu

OCR2: TOOL = ocrd-calamari-recognize
OCR2: OPTIONS += -P checkpoint_dir qurator-gt4histocr-1.0

OCR3: TOOL = ocrd-kraken-recognize
OCR3: OPTIONS += -P model austriannewspapers.mlmodel

MULTI: OCR1 OCR2 OCR3
MULTI: TOOL = ocrd-cor-asv-ann-align
MULTI: PARAMS = "method": "combined"

OUTPUT := MULTI
```
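The `OPTIONS` lines in ocr4.mk rely on target-specific appending: `+=` on a target that already carries a target-specific value extends that value, so each recognizer gets the shared options plus its own. A reduced sketch, assuming GNU make; the file `ocr.mk` is hypothetical, keeps only two targets, and its recipe merely echoes the options instead of calling a processor:

```sh
#!/bin/sh
# Target-specific OPTIONS: a shared part plus a per-target += part.
set -e
cd "$(mktemp -d)"

cat > ocr.mk <<'EOF'
OCR1 OCR2: OPTIONS = -P textequiv_level glyph
OCR1: OPTIONS += -P model frak2021+deu
# print the options each target would pass to its processor
OCR1 OCR2: ; @echo "$@: $(OPTIONS)"
EOF

ocr_opts=$(make -f ocr.mk OCR1 OCR2)
echo "$ocr_opts"
```

Here `OCR1` ends up with `-P textequiv_level glyph -P model frak2021+deu`, while `OCR2`, which has no append of its own in this reduced example, keeps just `-P textequiv_level glyph`.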

* post.mk
```make
ALTO: $(INPUT)
ALTO: TOOL = ocrd-fileformat-transform
ALTO: OPTIONS = -P from-to "page alto" -P script-args "--no-check-border --dummy-word"

OUTPUT := ALTO
```

* in the preinstalled Makefile
```make
override INPUT := $(or $(OUTPUT),$(INPUT))
.DEFAULT_GOAL := $(OUTPUT)
# ...
```
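To see why the reassignment carries `override`, the following sketch (GNU make assumed; `plain.mk` and `forced.mk` are hypothetical files) evaluates the same line with and without it while `INPUT=ORIGINAL` is passed on the command line. `$(info …)` prints the value as the file is read.

```sh
#!/bin/sh
# Compare the INPUT reassignment with and without `override`.
set -e
cd "$(mktemp -d)"

cat > plain.mk <<'EOF'
OUTPUT := CROP
INPUT := $(or $(OUTPUT),$(INPUT))
$(info INPUT=$(INPUT))
all: ; @:
EOF

cat > forced.mk <<'EOF'
OUTPUT := CROP
override INPUT := $(or $(OUTPUT),$(INPUT))
$(info INPUT=$(INPUT))
all: ; @:
EOF

# INPUT=ORIGINAL on the command line, as in the entry-point invocation
plain=$(make -f plain.mk INPUT=ORIGINAL)
forced=$(make -f forced.mk INPUT=ORIGINAL)
echo "without override: $plain"
echo "with override:    $forced"
```

Without `override`, the command-line value wins and `INPUT` stays `ORIGINAL`, so a following stage would still depend on the entry point. With it, `INPUT` becomes `CROP` as intended.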

* running
```sh
make -f pre3.mk -f seg1.mk -f ocr4.mk -f post.mk INPUT=ORIGINAL
```
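A self-contained sketch of the whole hand-over with two toy stages, runnable with GNU make in a scratch directory. Assumptions, not taken from workflow-configuration: the two proposed lines live in a hypothetical `chain.mk` that every stage includes at its end, so that the reassignment happens after each stage has been read; `stage1.mk`, `stage2.mk` and `rules.mk` are hypothetical too, and the catch-all recipe only prints which tool would turn which input fileGrp into which output fileGrp instead of running an OCR-D processor.

```sh
#!/bin/sh
# Sketch of the chaining convention with two toy stages.
# ASSUMPTIONS: chain.mk, stage1.mk, stage2.mk and rules.mk are hypothetical;
# every stage includes chain.mk at its end so the INPUT/OUTPUT hand-over
# happens after each stage is read; the recipe only echoes.
set -e
cd "$(mktemp -d)"
mkdir ORIGINAL   # the entry-point fileGrp must already exist

# the two proposed lines
cat > chain.mk <<'EOF'
override INPUT := $(or $(OUTPUT),$(INPUT))
.DEFAULT_GOAL := $(OUTPUT)
EOF

cat > stage1.mk <<'EOF'
BIN: $(INPUT)
BIN: TOOL = ocrd-doxa-binarize
CROP: BIN
CROP: TOOL = ocrd-anybaseocr-crop
OUTPUT := CROP
include chain.mk
EOF

cat > stage2.mk <<'EOF'
SEG: $(INPUT)
SEG: TOOL = ocrd-kraken-segment
OUTPUT := SEG
include chain.mk
EOF

# stand-in for the real recipe: print tool, input fileGrp(s), output fileGrp
cat > rules.mk <<'EOF'
%: ; @echo "$(TOOL): $^ -> $@"
EOF

# -r disables make's built-in implicit rules, leaving only the one above
chain_log=$(make -r -f stage1.mk -f stage2.mk -f rules.mk INPUT=ORIGINAL)
echo "$chain_log"
```

This should print `ocrd-doxa-binarize: ORIGINAL -> BIN`, `ocrd-anybaseocr-crop: BIN -> CROP` and `ocrd-kraken-segment: CROP -> SEG`. The second stage's prerequisite resolved to the first stage's `OUTPUT`, and, with no goal given, make built the last stage's `OUTPUT`. Where exactly the real Makefile performs the hand-over is not settled by this sketch.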

Since this requires only these two additional lines and does not break existing makefiles, it is actually more of a documentation issue. (The old makefiles should probably be removed, updated, or split into multi-stage configurations anyway.)

@mikegerber would that fit your need as well?

bertsky / workflow-configuration

chained workflows #25