bertsky / workflow-configuration

a makefilization for OCR-D workflows, with configuration examples
Apache License 2.0
9 stars 4 forks source link

Multiple input groups #2

Closed mikegerber closed 4 years ago

mikegerber commented 4 years ago

Hi Robert,

I wasn't sure if/how to submit this, so I thought to discuss it first in an issue.

$(OUTPUT): $(OCR1) $(OCR2) $(OCR3) $(OCR4) $(OCR5) $(OCR6) $(OCR7) $(OCR8)
# must be last to become first:
$(OUTPUT): $(INPUT)
        ocrd-cor-asv-ann-evaluate -I $(call concatcomma,$^)

comma = ,
concatcomma = $(subst $() $(),$(comma),$(1))

Could the concatcomma call be included in the toolrecipe instead of including it in every *.mk? I use almost the same solution as yours ($(subst $(space),$(comma),$^) in the toolrecipe definition) and in my eval.mk just this:

OCR-FOO-EVAL: OCR-D-GT-PAGE OCR-FOO
OCR-FOO-EVAL: TOOL = ocrd-dinglehopper
# (I had to simplify because I'm actually generating this using a template,
# so this exact rule is untested)

The subst is a NOP if $^ is just one group.

bertsky commented 4 years ago

@mikegerber You are absolutely right. This simplifies configurations and thus makes them more comprehensible and maintainable. It should also be general enough: If we do get in/out file group validation relative to the tool json, then accidentally adding or forgetting prerequisites should give an error right away.

The one thing about this solution which does worry me is that make cannot guarantee the ordering of prerequisites, and tools like evaluation depend on at least the first being GT. In my solution, this is enforced, because the INPUT prereq is in the rule where the recipe is. But how do you ensure that in your solution (especially if running in parallel)?

bertsky commented 4 years ago

I guess if we don't want to rely on certain naming conventions (like *-GT-*) then all we can do is artificially ordering by dependency depth. Thus file groups which require more steps will end up later in the list. And GT (whatever it is called) should always come first...

bertsky commented 4 years ago

...something like:

define nprereqs
$(shell echo -n $(1); LC_MESSAGES=C $(MAKE) -R -nd $(1) 2>&1 | \
        fgrep -e 'Considering target file' -e 'Trying rule prerequisite' | \
        cut -d\' -f2 | tr ' ' '\n' | wc -l)
endef
# get a list (1) of targets, compute their dependency depth,
# return the list sorted by increasing depth
define sortnprereqs
$(shell $(foreach target,$(call nprereqs,$(target)),$1) | sort -n -k2)
endef
bertsky commented 4 years ago

On the other hand, writing the GT group first ... just works (for now). Even when running in parallel (-j).

@mikegerber What do you think: Do we accept the risk of some day under some circumstances facing a make implementation that garbles the ordering (crossing fingers), or do we make this much more complicated (and still based on implicit assumptions)?

bertsky commented 4 years ago

I have decided in favour of your original proposal (see https://github.com/bertsky/workflow-configuration/commit/5bedd9003bd905cc0371c3427561e5d4356c5e9d). Feel free to re-open if you see anything wrong...