flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

housekeeping only drains nodes if systemd unit can be run #6118

Open grondo opened 1 month ago

grondo commented 1 month ago

The housekeeping service relies on the systemd unit to drain ranks that fail housekeeping. However, if the housekeeping systemd service isn't configured or fails to start, then the node is not drained. Instead the node is put back into service without housekeeping being run, which could cause any number of failures.

garlick commented 1 month ago

Perhaps it was a misstep to put drain logic in the systemd unit file at all. If we have to do it in the job manager also, then we may as well just move it there, I guess.
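
To make the idea concrete, here is a rough sketch of what "drain from the caller" could look like. This is not flux-core code: `run_housekeeping` and the `drain` callback are hypothetical names made up for illustration, and the real job manager is written in C and does not launch housekeeping this way. The point is only that the component that starts housekeeping sees both failure modes (could not start, exited non-zero) and can drain in either case.

```python
import subprocess


def run_housekeeping(rank, command, drain):
    """Run housekeeping for one rank and drain the rank on any failure.

    `drain` is a hypothetical callback taking (rank, reason); it stands in
    for whatever mechanism the caller would use to drain a node.
    Returns True only if housekeeping started and exited with status 0.
    """
    try:
        # check=False: we inspect the exit status ourselves below.
        result = subprocess.run(command, check=False)
    except OSError as exc:
        # Covers the case where the command is missing or cannot be executed,
        # i.e. housekeeping never ran at all.
        drain(rank, f"housekeeping could not be started: {exc}")
        return False
    if result.returncode != 0:
        # Housekeeping ran but reported failure.
        drain(rank, f"housekeeping exited with status {result.returncode}")
        return False
    return True
```

With this shape, a node is only returned to service when the function returns True, so a missing or unstartable housekeeping command can no longer slip through undrained.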