flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

housekeeping only drains nodes if systemd unit can be run #6118

Open grondo opened 4 months ago

grondo commented 4 months ago

The housekeeping service relies on the systemd unit to drain ranks that fail housekeeping. However, if the housekeeping systemd service isn't configured or fails to start, then the node is not drained. Instead, the node is put back into service without housekeeping having been run, which could cause any number of failures.

garlick commented 4 months ago

Perhaps it was a misstep to put drain logic in the systemd unit file at all. If we have to do it in the job manager as well, then we may as well just move it there, I guess.
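
To make the idea concrete, here is a minimal sketch of what a drain decision made on the job manager side might look like. It is not flux-core code: `Outcome`, `HousekeepingResult`, `drain_reason`, and `ranks_to_drain` are hypothetical names invented for illustration. The point is only that "could not start" is treated the same as "ran and failed", so the rank is drained in both cases.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Iterable, Optional


class Outcome(Enum):
    """Hypothetical housekeeping outcomes for one rank."""
    SUCCESS = auto()      # housekeeping ran and exited cleanly
    FAILED = auto()       # housekeeping ran but reported failure
    NOT_STARTED = auto()  # unit not configured, or it failed to start


@dataclass
class HousekeepingResult:
    """Hypothetical per-rank result as seen by the job manager."""
    rank: int
    outcome: Outcome
    detail: str = ""


def drain_reason(result: HousekeepingResult) -> Optional[str]:
    """Return a drain reason, or None if the rank may return to service."""
    if result.outcome is Outcome.SUCCESS:
        return None
    if result.outcome is Outcome.NOT_STARTED:
        # The case from this issue: the unit never ran, so it could not
        # have drained the node itself.
        return f"housekeeping could not be started: {result.detail}"
    return f"housekeeping failed: {result.detail}"


def ranks_to_drain(results: Iterable[HousekeepingResult]) -> Dict[int, str]:
    """Map each rank that must be drained to its drain reason."""
    drained: Dict[int, str] = {}
    for result in results:
        reason = drain_reason(result)
        if reason is not None:
            drained[result.rank] = reason
    return drained
```

With the decision in one place, the unit file would no longer need its own drain logic, and a missing or broken unit could not silently put a node back into service.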