flux-framework / flux-coral2

Plugins and services for Flux on CORAL2 systems
GNU Lesser General Public License v3.0

dws: add marker for prolog to enable the nnf-dm daemon #179

Closed: jameshcorbett closed this 2 months ago

jameshcorbett commented 2 months ago

Problem: There is no marker indicating that a job needs the nnf data movement daemon (nnf-dm) to be running for the job's duration. Without such a marker, an administrative prolog cannot tell whether to start the daemon.

Add a boolean to the context of the dws_environment event that indicates whether the nnf-dm daemon is needed. The administrative prolog can watch for this event and inspect its context.

Fixes #167.

@grondo, what would be the best way for a shell script (the administrative prolog) to watch for the boolean? It is already watching for the dws_environment event, but now it also needs to know whether the copy-offload entry in the context is true or false.

grondo commented 2 months ago

what would be the best way for a shell script (the administrative prolog) to watch for the boolean?

How does the prolog wait for the dws_environment event now?

One way would be to dump the eventlog as json and process with jq:

flux job eventlog -f json ƒ9zRvw7KVq | jq 'select(.name == "dws_environment")| .context["copy-offload"]'

This would print true or false, I think, if the dws_environment event is found in the eventlog (and copy-offload exists in the context). Otherwise jq would print nothing (no matching event) or null (no copy-offload key) rather than exiting with an error.
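
For a prolog that needs to branch on the value, here is a minimal sketch, assuming jq is installed and the eventlog arrives as one JSON object per line (the form produced by -f json). The function name copy_offload_value is hypothetical. It relies on jq's -e flag, which sets the exit status from the last output value.

```shell
#!/bin/sh
# Hypothetical helper: read a JSON eventlog (one event per line) on stdin and
# print the copy-offload value from the dws_environment event's context.
#
# Exit status (from jq -e):
#   0  last value printed was true
#   1  last value printed was false or null (null means the key was absent)
#   4  nothing was printed, i.e. no dws_environment event was seen
copy_offload_value() {
    jq -e 'select(.name == "dws_environment") | .context["copy-offload"]'
}

# Possible use in a prolog (JOBID is assumed to be set by the caller):
#   if flux job eventlog -f json "$JOBID" | copy_offload_value >/dev/null; then
#       : # start the nnf-dm daemon here
#   fi
```

With this shape the prolog can tell "false" (status 1) apart from "event not posted yet" (status 4) without parsing the printed text.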

You could also do this in two stages by saving the eventlog or the dws_environment entry to a file, then process with jq as a separate step.

There may be other ways as well (e.g. flux job wait-event has a --match-context option).

jameshcorbett commented 2 months ago

Perfect, thank you!

jameshcorbett commented 2 months ago

Thanks! I just force-pushed to add an extra test to t1002-dws-workflow-obj.t. Setting MWP.