Open kfox1111 opened 6 years ago
CLUO does end up being pretty Container Linux specific, particularly in the way that it ties into update_engine to poll for updates. Since CLUO doesn't have any real control over the underlying update process, it's really just locksmith running as a DaemonSet in Kubernetes. In general, we prefer newer tools that have much more direct control over the update process, such as the machine config daemon (MCD), which ties into rpm-ostree directly to update the operating system. That one is specifically for Red Hat CoreOS right now.
There was some early exploratory work that integrated this codebase directly with rpm-ostree (https://github.com/ashcrow/container-linux-update-operator/tree/spike), but the focus has been on the MCD system. As far as I know, there is no equivalent tool that integrates with dnf or any other package management system.
What about a yum plugin that calls 'locksmithctl send-need-reboot' on any change? It may reboot more than needed, but could it work? Alternatively, could you just bypass locksmith and label the node directly? Would the rest of the reboot logic work in that case?
Sorry for the confusion. I meant that it is architecturally and behaviorally like locksmith, not that it is literally locksmith. The CLUO agent hooks directly into update_engine through its exposed D-Bus API (https://github.com/coreos/container-linux-update-operator/blob/4bb1486f482bc9c365c71e126129e806b5a0fc97/pkg/updateengine/client.go#L61), and whenever update_engine applies a new update (entirely out of band, as on any Container Linux instance), the reboot coordinator component ensures that only one node gets rebooted at a time. The reboot logic might work, but again, there is nothing in CLUO that actually triggers an update, and it's not architected to do that.
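To make the "only one node gets rebooted at a time" behavior concrete, here is a minimal Python sketch of that selection rule. It is not CLUO's actual code (CLUO is written in Go); the `Node` type, its fields, and `pick_next_reboot` are hypothetical names invented for illustration, and the alphabetical ordering is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of a cluster node as a reboot coordinator might see it."""
    name: str
    reboot_needed: bool = False  # set by a node agent after an update is applied
    rebooting: bool = False      # set while the node is being drained/rebooted


def pick_next_reboot(nodes: List[Node], max_rebooting: int = 1) -> Optional[Node]:
    """Return the next node allowed to reboot, or None if none may proceed.

    No node is chosen while `max_rebooting` nodes are already in progress,
    which is what keeps reboots serialized when max_rebooting is 1.
    """
    in_progress = sum(1 for n in nodes if n.rebooting)
    if in_progress >= max_rebooting:
        return None
    # Assumption: pick candidates in name order so the choice is deterministic.
    for node in sorted(nodes, key=lambda n: n.name):
        if node.reboot_needed and not node.rebooting:
            return node
    return None
```

Note that nothing in this rule depends on how `reboot_needed` came to be set, which is the part the discussion treats as potentially distro-independent.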
I think we always intended that the Fedora/RHEL solution would be designed quite differently, as a separate reboot coordinator app.
Do you see the logic around picking nodes, draining, rebooting, and uncordoning as being distro specific? I could see the node agent being specific. Does the reboot manager pay attention to any state other than "needs upgrading"?
I was thinking of trying to set up Ansible to point yum at the new version repo (we version mirror snapshots), run yum upgrade, then trigger locksmith and let the operator reboot things safely. CI/CD would trigger Ansible to upgrade the nodes, and the operator would reboot them safely as needed. Alternatively, it could maybe skip locksmith entirely and just set node labels directly?
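The "set node labels directly" idea could look roughly like the Python sketch below, which an Ansible task or CI/CD step might run after `yum upgrade`. It only builds a merge patch and the matching `kubectl patch` command line. The label key `example.com/reboot-needed` is a placeholder assumption, and whether CLUO's reboot coordinator would act on a label set this way is exactly the open question in this thread.

```python
import json
from typing import Dict, List

# Hypothetical label key: NOT taken from CLUO; substitute whatever key
# the reboot coordinator in use actually watches.
REBOOT_NEEDED_KEY = "example.com/reboot-needed"


def reboot_needed_patch(needed: bool, key: str = REBOOT_NEEDED_KEY) -> Dict:
    """Build a JSON merge patch that sets one label on a Node object."""
    return {"metadata": {"labels": {key: "true" if needed else "false"}}}


def kubectl_patch_args(node_name: str, patch: Dict) -> List[str]:
    """Build the argv for `kubectl patch node <name> --type merge -p <json>`."""
    return [
        "kubectl", "patch", "node", node_name,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(kubectl_patch_args("worker-1", reboot_needed_patch(True))))
```

Marking the node this way says nothing about which packages changed, so, as noted earlier in the thread, it may reboot more often than strictly needed.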
Are there any plans to update this operator to work with RHEL/CentOS? Conceptually, there doesn't seem to be much that is CoreOS specific about it. Perhaps it works already?