@everyone :boom: Providers! Especially GPU providers (but not limited to them) :warning:
Make sure you have disabled the unattended upgrades!
Unattended upgrades can cause all sorts of trouble, such as upgrading your NVIDIA drivers and "locking up" your K8s cluster (nvidia-smi will hang on the host/pod; the nvdp plugin will get stuck, so the K8s cluster will run in an undesired state where closed deployments are stuck in Terminating status).
This impacts me, what do I do now?
Check your provider. If you experience any of these issues (nvidia-smi hangs, pods stuck in Terminating state), just reboot your impacted K8s nodes, preferably after disabling the unattended upgrades (see next step).
How to disable the unattended upgrades?
To disable the unattended upgrades, execute these two commands on your worker & control plane Ubuntu/Debian-based nodes:
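A minimal sketch of one common approach on Ubuntu/Debian, assuming the standard APT periodic settings file `/etc/apt/apt.conf.d/20auto-upgrades`. This is not necessarily the exact pair of commands referenced above; the `CONF` variable and both function names are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Sketch only: the file path and both function names are assumptions,
# not taken from the original announcement. Run as root on each node.

# Usual location of the APT periodic settings on Ubuntu/Debian (assumed).
CONF="${CONF:-/etc/apt/apt.conf.d/20auto-upgrades}"

# Write both APT::Periodic switches as "0" (disabled) into the given file.
disable_unattended_upgrades() {
    printf '%s\n' \
        'APT::Periodic::Update-Package-Lists "0";' \
        'APT::Periodic::Unattended-Upgrade "0";' > "$1"
}

# Print the value of each APT::Periodic switch found in the given file,
# one value per line.
show_periodic_settings() {
    sed -n 's/^APT::Periodic::[A-Za-z-]*[[:space:]]*"\([^"]*\)";$/\1/p' "$1"
}
```

Calling `disable_unattended_upgrades "$CONF"` followed by `show_periodic_settings "$CONF"` should print `0` on two lines. Point `CONF` at a scratch file first if you want to try it without touching the node.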
Ref. https://discord.com/channels/747885925232672829/1111749248527114322/1157754908624298074
Verify
These commands should output 0, like in this example:
cc @ScottCarruthers#8207
cc @chainzero