coreos / container-linux-update-operator

A Kubernetes operator to manage updates of Container Linux by CoreOS
Apache License 2.0

Distro support #187

Open kfox1111 opened 5 years ago

kfox1111 commented 5 years ago

Are there any plans to update this operator to work with RHEL/CentOS? Conceptually, there doesn't seem to be much that is CoreOS-specific about it. Perhaps it works already?

sdemos commented 5 years ago

CLUO does end up being pretty Container Linux specific, particularly in the way that it ties into update_engine to poll for updates. Since CLUO doesn't have any real control over the underlying update process, it's really just locksmith running as a DaemonSet in Kubernetes. In general, we prefer newer tools that have much more direct control over the update process, such as the machine config daemon (MCD), which ties into rpm-ostree directly to update the operating system. That one is specifically for Red Hat CoreOS right now.

There was some early exploratory work that integrated this codebase directly with rpm-ostree (https://github.com/ashcrow/container-linux-update-operator/tree/spike), but the focus has been on the MCD system. As far as I know, there is no equivalent tool that integrates with dnf or any other package management system.

kfox1111 commented 5 years ago

What about a yum plugin that calls 'locksmithctl send-need-reboot' on any change? It may reboot more than needed, but could it work? Alternatively, could you just bypass locksmith and label the node directly? Would the rest of the reboot logic work in that case?

sdemos commented 5 years ago

Sorry for the confusion. I meant that it is architecturally and behaviorally like locksmith, not that it is literally locksmith. The CLUO agent hooks directly into update_engine through its exposed D-Bus API (https://github.com/coreos/container-linux-update-operator/blob/4bb1486f482bc9c365c71e126129e806b5a0fc97/pkg/updateengine/client.go#L61), and whenever update_engine applies a new update (entirely out of band, as on any Container Linux instance), the reboot coordinator component ensures that only one node gets rebooted at a time. The reboot logic might work, but again, there is nothing in CLUO that actually triggers an update, and it's not architected to do that.
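
To illustrate the "only one node gets rebooted at a time" behavior, here is a minimal in-memory sketch in Python. It is not CLUO's actual code (CLUO is written in Go and works against the Kubernetes API); the `Node` type, its fields, and the function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node's reboot state."""
    name: str
    reboot_needed: bool = False        # would be set by a per-node agent
    reboot_in_progress: bool = False   # set once the coordinator grants a reboot


def next_node_to_reboot(nodes: List[Node], max_rebooting: int = 1) -> Optional[Node]:
    """Return the node that may reboot next, or None if the budget is used up."""
    rebooting = sum(1 for n in nodes if n.reboot_in_progress)
    if rebooting >= max_rebooting:
        return None
    # Sort by name so the choice is deterministic in this sketch.
    candidates = sorted(
        (n for n in nodes if n.reboot_needed and not n.reboot_in_progress),
        key=lambda n: n.name,
    )
    return candidates[0] if candidates else None


def grant_reboot(nodes: List[Node]) -> Optional[Node]:
    """Mark at most one waiting node as rebooting and return it."""
    node = next_node_to_reboot(nodes)
    if node is not None:
        node.reboot_in_progress = True
    return node
```

Calling `grant_reboot` repeatedly returns `None` until the previously granted node clears its `reboot_in_progress` flag, which is the serialization property described above. Nothing in this sketch triggers an update; it only decides who may reboot.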

dghubble commented 5 years ago

I think we always intended Fedora/RHEL would be designed quite differently, as a different reboot coordinator app.

kfox1111 commented 5 years ago

Do you see the logic around picking nodes, draining, rebooting, and uncordoning as being distro-specific? I could see the node agent being specific. Does the reboot manager pay attention to any state other than "needs upgrading"?

I was thinking of trying to set up Ansible to point yum at the new version repo (we version mirror snapshots), run yum upgrade, trigger locksmith, and let the operator reboot things safely. CI/CD would trigger Ansible to upgrade the nodes, and the operator would reboot them safely as needed? Alternatively, it could maybe skip locksmith entirely and just set node labels directly?
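
As a rough sketch of what "set node labels directly" could look like, the Python snippet below builds a JSON merge patch that marks a node as needing a reboot. The key `example.com/reboot-needed` is a placeholder, not a key confirmed anywhere in this thread, and whether the operator's reboot logic would act on a label set this way is exactly the open question here.

```python
import json

# Placeholder key for illustration only; the key the operator actually
# watches is not stated in this thread.
REBOOT_NEEDED_KEY = "example.com/reboot-needed"


def reboot_needed_patch(needed: bool, key: str = REBOOT_NEEDED_KEY) -> str:
    """Build a JSON merge patch that sets a reboot-needed label on a node."""
    value = "true" if needed else "false"
    patch = {"metadata": {"labels": {key: value}}}
    return json.dumps(patch, sort_keys=True)


if __name__ == "__main__":
    # Print the patch so a wrapper (for example an Ansible task) can use it.
    print(reboot_needed_patch(True))
```

The resulting string could then be applied after `yum upgrade` with something like `kubectl patch node <node-name> --type merge -p '<patch>'`.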