flux-framework / flux-coral2

Plugins and services for Flux on CORAL2 systems
GNU Lesser General Public License v3.0

dws: add support for draining Offline nodes #140

Closed: jameshcorbett closed this 7 months ago

jameshcorbett commented 7 months ago

Problem: a compute node can lose its link to its local rabbit. When that happens, the node should be drained with a descriptive reason so admins can investigate.

Kubernetes Storage resources report the status of each link between a rabbit and its compute nodes. Watch those statuses and drain any node whose link status changes to 'Offline'.
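A minimal sketch of the check described above, not the code in this PR. It assumes the Storage resource exposes per-compute link status under `status.access.computes` as entries with `name` and `status` keys; that layout, the helper names `newly_offline` and `drain_command`, and the reason text are all illustrative assumptions. The drain itself is only built as a `flux resource drain` command line and is not executed.

```python
from typing import Dict, List

OFFLINE = "Offline"


def newly_offline(previous: Dict[str, str], storage: dict) -> List[str]:
    """Return compute nodes whose link status just changed to Offline.

    `previous` maps node name to the last status seen and is updated in
    place, so a node that stays Offline is reported only once.
    ASSUMPTION: link statuses live under status.access.computes.
    """
    computes = storage.get("status", {}).get("access", {}).get("computes", [])
    changed = []
    for compute in computes:
        name = compute["name"]
        status = compute.get("status")
        if status == OFFLINE and previous.get(name) != OFFLINE:
            changed.append(name)
        previous[name] = status
    return changed


def drain_command(node: str, rabbit: str) -> List[str]:
    """Build (but do not run) a drain command with a reason for admins."""
    reason = f"rabbit {rabbit} reports link to {node} as {OFFLINE}"
    return ["flux", "resource", "drain", node, reason]


if __name__ == "__main__":
    # Hypothetical Storage resource as it might arrive from a watch event.
    storage = {
        "metadata": {"name": "rabbit-1"},
        "status": {
            "access": {
                "computes": [
                    {"name": "compute-01", "status": "Ready"},
                    {"name": "compute-02", "status": "Offline"},
                ]
            }
        },
    }
    seen: Dict[str, str] = {}
    for node in newly_offline(seen, storage):
        print(" ".join(drain_command(node, storage["metadata"]["name"])))
```

Tracking the last status seen for each node keeps repeated watch events for an already-Offline node from issuing duplicate drain requests.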

Fixes #139.

jameshcorbett commented 7 months ago

Thanks! Setting MWP.