Closed JOUNAIDSoufiane closed 2 months ago
Requires code from:
- Zun: https://github.com/ChameleonCloud/zun/pull/20
- Doni: https://github.com/ChameleonCloud/doni/pull/138
Order to apply is:
Note: add `worker_node_taint` with a `| default(something)` to `kolla/defaults.yml`.
Added k8s worker taint configuration options
The options to add to site-config are:
Added taint tolerations to core deployments in k3s
The k3s `defaults.yaml` takes the value of `k3s_worker_taint` from `worker_taint`, which is defined in `kolla/defaults.yml`. That file sets the default for the option and picks up the site value from `k8s_worker_taint`, which can be specified in the site config.
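A rough sketch of how that chain of defaults might look, using the variable names from the paragraph above. The taint string and its `key=value:effect` format are placeholders chosen for illustration, not the values used in this PR:

```yaml
# kolla/defaults.yml (sketch)
# The site config may set k8s_worker_taint; if it does not, fall back to a default.
# The fallback string below is a hypothetical placeholder.
worker_taint: "{{ k8s_worker_taint | default('example.org/worker=true:NoExecute') }}"

# k3s defaults.yaml (sketch)
# The k3s role reads the value resolved above.
k3s_worker_taint: "{{ worker_taint }}"
```

With this layout a site overrides the taint in one place (`k8s_worker_taint`) and every consumer sees the same resolved value.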
Added worker node taint toleration to smarter devices manager daemonsets.
Furthermore, templated the nvidia device plugin daemonset and added the toleration there as well.
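For illustration, a toleration on a daemonset's pod template generally takes the following shape. The taint key is a hypothetical placeholder; the actual key and any templating used in this PR are not shown here:

```yaml
# Fragment of a DaemonSet manifest (pod template only).
spec:
  template:
    spec:
      tolerations:
        # Hypothetical taint key; replace with the key of the worker taint.
        - key: "example.org/worker"
          # "Exists" matches the key regardless of the taint's value.
          operator: "Exists"
          # No effect is given, so the toleration matches every effect for this key.
```

Leaving out `effect` keeps the daemonset pods schedulable on the worker nodes whichever effect the taint carries.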
Taint deployment strategy on a running testbed:
1. Set `zun_tolerate_worker_taint` to `True` and redeploy Zun.
2. Set `doni_enable_worker_taint` to `True` and redeploy.

The above sequence ensures that no existing or concurrently launched user pods get evicted, and it keeps downtime for the core daemonsets to a minimum.
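The two-step rollout described above could be expressed as follows, assuming both flags are set in the site config. The comments on what each flag does are inferred from the flag names and are not confirmed by this PR:

```yaml
# Step 1: set this first and redeploy Zun.
# Assumed effect: pods launched through Zun carry a toleration for the worker taint.
zun_tolerate_worker_taint: True

# Step 2: set this only after step 1 has been rolled out, then redeploy.
# Assumed effect: the taint is applied to worker nodes.
doni_enable_worker_taint: True
```

Putting the toleration in place before the taint is what keeps user pods from being evicted: by the time a node is tainted, the pods that land on it already tolerate it.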