It would help if workflows could be chained at runtime, e.g. ocrd-make -f pre3.mk -f seg1.mk -f ocr4.mk -f post.mk, where each makefile consumes the last fileGrp of the previous one, so that each stage can be replaced by an alternative configuration independently of the others. This in turn would allow writing very concise (sub-)configurations without repetition.
As for implementation, make allows passing multiple makefiles and reads them sequentially (w.r.t. first phase, i.e. expansion of immediate variables etc.), then combines them (second phase) and finally computes dependencies.
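To illustrate the sequential reading, here is a minimal sketch assuming GNU make. The file names a.mk and b.mk and the variable PREVIOUS are hypothetical; both files are shown as one block, since make effectively concatenates all makefiles given via -f in command-line order.

```makefile
# Sketch only: a.mk and b.mk are hypothetical file names.
# Shown as one file, because make effectively concatenates
# all makefiles passed via -f in command-line order.

# --- a.mk ---
OUTPUT := A

# --- b.mk ---
# ':=' is expanded immediately (first phase), so this captures
# the value assigned in a.mk before b.mk overwrites it.
PREVIOUS := $(OUTPUT)
OUTPUT := B

# Recipes are expanded later (second phase), so $(OUTPUT) is B here.
# The 'target: ; recipe' form avoids the need for a TAB character.
show: ; @echo previous=$(PREVIOUS) output=$(OUTPUT)
```

Running make -f a.mk -f b.mk show should print previous=A output=B, which is the property the chaining convention relies on.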
So by convention (for chainable configurations) we could allow defining a simply expanded variable, say OUTPUT, for the (phase's) output fileGrp name, and allow using INPUT for the (phase's) dynamic input fileGrp name. Internally (i.e. in our Makefile, which always needs to be included), we would then predefine INPUT := $(or $(OUTPUT),$(INPUT)) and .DEFAULT_GOAL := $(OUTPUT). For the very first phase (the entry point), we then just have to pass INPUT, either in a separate (phase zero) non-rule config file or with an additional command-line argument:
make -f pre3.mk -f seg1.mk -f ocr4.mk -f post.mk INPUT=ORIGINAL
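The hand-over between stages could be sketched as follows. This is an illustration under assumptions, not the actual implementation: the file name chain-sketch.mk is hypothetical, the echo recipes stand in for real processors, and the two proposed lines are repeated inline where the shared Makefile would be included at the end of each stage. One caveat worth noting: a variable set on the make command line takes precedence over ordinary assignments inside makefiles, so the sketch uses the override directive. Passing INPUT from a phase-zero config file instead would not need it.

```makefile
# chain-sketch.mk: illustration only, all names are hypothetical.
# Run with: make -f chain-sketch.mk INPUT=ORIGINAL

# Treat the fileGrp names as phony so that stray files do not interfere.
.PHONY: ORIGINAL DESK CROP OCR1

# Stand-in for the entry-point fileGrp.
ORIGINAL: ; @echo "$@ (given)"

# --- stage 1 (think pre3.mk): consumes $(INPUT), produces CROP ---
DESK: $(INPUT) ; @echo "$@ <- $^"
CROP: DESK ; @echo "$@ <- $^"
OUTPUT := CROP

# The two proposed lines, as the shared Makefile would evaluate them here.
# 'override' is an assumption of this sketch: without it, INPUT=ORIGINAL
# from the command line would win over this assignment.
override INPUT := $(or $(OUTPUT),$(INPUT))
.DEFAULT_GOAL := $(OUTPUT)

# --- stage 2 (think ocr4.mk): $(INPUT) now expands to CROP ---
OCR1: $(INPUT) ; @echo "$@ <- $^"
OUTPUT := OCR1

# Same two lines again after the last stage: the default goal
# becomes the final OUTPUT.
override INPUT := $(or $(OUTPUT),$(INPUT))
.DEFAULT_GOAL := $(OUTPUT)
```

With INPUT=ORIGINAL on the command line, make should build ORIGINAL, DESK, CROP and OCR1 in that order, because the prerequisite of OCR1 was expanded to CROP when stage 2 was read.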
For example, two chainable configurations, each ending with its own OUTPUT assignment:
DESK: BIN
DESK: TOOL = ocrd-cis-ocropy-deskew
DESK: PARAMS = "level-of-operation": "page"

CROP: DESK
CROP: TOOL = ocrd-anybaseocr-crop
CROP: PARAMS = "rulerAreaMax": 0

OUTPUT := CROP

OCR1: TOOL = ocrd-tesserocr-recognize
OCR1: OPTIONS += -P model frak2021+deu

OCR2: TOOL = ocrd-calamari-recognize
OCR2: OPTIONS += -P checkpoint_dir qurator-gt4histocr-1.0

OCR3: TOOL = ocrd-kraken-recognize
OCR3: OPTIONS += -P model austriannewspapers.mlmodel

MULTI: OCR1 OCR2 OCR3
MULTI: TOOL = ocrd-cor-asv-ann-align
MULTI: PARAMS = "method": "combined"

OUTPUT := MULTI
Since this only requires these 2 additional lines and does not break existing makefiles, this is more of a documentation issue actually. (And probably, the old makefiles should be removed or updated or split into multi-stage configurations anyway.)
@mikegerber would that fit your need as well?