Open mitchellmaler opened 5 years ago
All questions are really spot-on, but many pieces are still moving, so I'll try to give an overview of the current state (which may change soonish). Please do note that this could fit into a larger discussion around k8s-updates / config-management / machine-config-operator, but I'll keep the scope of this ticket to "FCOS update-reboot orchestration on k8s" only, on purpose.
For reference, the historical decisions behind this are recorded at https://github.com/coreos/fedora-coreos-tracker/issues/3.
Does this mean we are required to run another etcd cluster just for updates, or is it possible to make use of Kubernetes objects to orchestrate the updates using an operator?
That isn't the intended usage, no. The scope of airlock is just to replace the same logic in locksmith, which only supported etcd as a distributed backend. The use case is for machines that already have direct access to an etcd cluster, likely without any access to the objects of a higher-level orchestrator. If you have to deploy an etcd cluster just for airlock, then there are better options to consider.
Will there be an equivalent for Fedora CoreOS that can be deployed to a Kubernetes cluster and work with Zincati to orchestrate updates?
That's the idea, yes. But we don't plan to write orchestrators for each possible backend on our own, nor shove all of those into airlock. Instead, the plan is to stabilize the HTTPS-based protocol that Zincati uses, so that the reboot-manager can run in a separate container and its implementation can be swapped to support other backends. Within this context, each community with a common interest can maintain its own containerized manager, decoupled from the OS and from other backends/implementations.
As of this date, we are still stabilizing the basics of auto-updates, so fleet-wide orchestration is still on the development radar. The protocol is currently drafted at https://github.com/coreos/airlock/pull/1, while client support in Zincati is tracked at https://github.com/coreos/zincati/issues/37.
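To make that concrete, here is a rough client-side sketch in Go of what a lock request looks like under the drafted protocol. The endpoint path, header, and JSON field names follow the draft and may still change; the base URL and node ID are placeholders, not defaults.

```go
// Minimal sketch of a "pre-reboot" lock request against a reboot-manager
// speaking the drafted HTTPS protocol. Illustrative only: the base URL and
// node ID below are placeholders, and the field names follow the draft.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type clientParams struct {
	ID    string `json:"id"`    // stable node identifier
	Group string `json:"group"` // reboot group this node belongs to
}

type lockRequest struct {
	ClientParams clientParams `json:"client_params"`
}

func main() {
	body, _ := json.Marshal(lockRequest{
		ClientParams: clientParams{ID: "node-uuid-placeholder", Group: "default"},
	})

	req, _ := http.NewRequest(http.MethodPost,
		"https://reboot-manager.example.com/v1/pre-reboot", bytes.NewReader(body))
	req.Header.Set("fleet-lock-protocol", "true")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// A 200 response means the reboot slot was granted; anything else means
	// "hold off and retry later".
	fmt.Println("lock response:", resp.Status)
}
```

A symmetric call against a steady-state endpoint releases the slot once the node is back up after the reboot.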
@lucab Thanks for the overview! I am glad there will be similar functionality in the future.
Right now in Red Hat OpenShift we have the machine-config-operator (mco) for this. In the initial release of OKD4 it will do the FCOS updates instead of the airlock/zincati duo that usually does it in FCOS, using a slightly different update-payload delivery mechanism (i.e. os-container, aka container-embedded ostree, vs. the usual rpm-ostree commit). We will do our best to abstract away the interfaces for those controllers and make them replaceable/pluggable (in a way that would allow Zincati/airlock to control how mco/the cluster does things).
@lucab https://github.com/coreos/zincati/issues/37 now seems closed, would you be open to sharing what the current state is? 😍
Related inquiry: https://github.com/coreos/zincati/issues/214.
@MPV I've left a few cross-links in place, so if you want to explore more feel free to click through. However, below is a quick summary of the current status.
etcd strategy is done, see https://github.com/coreos/airlock

Circling back to my original reply, now we are basically at this point:
[...] the reboot-manager can run in a separate container and its implementation can be swapped to support other backends. Within this context, each community with a common interest can maintain its own containerized manager, decoupled from the OS and from other backends/implementations.
I do not get why airlock was done with etcd instead of k8s as a backing store. I think airlock should actually be configurable to use k8s locking mechanisms.
Edit: the question is also, what happens if airlock is only installed on one node and that node restarts: does the lock still stand, or does the node retry until the airlock server is up again? If the latter is the case, it will probably be really simple to create a good k8s integration.
I do not get why airlock was done with etcd instead of k8s as a backing store.
This is recorded with actual historical details and technical discussions at https://github.com/coreos/fedora-coreos-tracker/issues/3, feel free to go through it. The TLDR is "because it replaces locksmith etcd strategy".
Also, please be aware that the k8s API does not model a database with strongly consistent primitives (e.g. old HA clusters without "etcd quorum read" do return stale reads).
I think airlock should actually be configurable to use k8s locking mechanisms.
That's understandable, but its design scope explicitly does not cover it. There are plenty of details to figure out (authentication, consistency, hooks, tolerations, draining, etc.), enough to warrant its own project by somebody intimately knowledgeable about k8s. See the rest of the discussion about having dedicated containerized lock-managers.
The client->server protocol itself is documented at https://github.com/coreos/airlock/pull/1/files and designed to be easy to implement as a small web service on top of any consistent database.
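To illustrate how small such a web service can be, here is a hedged sketch of the server side: a single-slot lock manager exposing two endpoints, with an in-memory holder standing in for whatever consistent database a real deployment would use. Endpoint and field names mirror the client sketch above; this is not a reference implementation of the protocol.

```go
// Sketch of a minimal single-slot reboot-lock web service. The in-memory
// "holder" variable stands in for a strongly consistent backend (etcd or
// similar); a real manager would also persist state and handle draining.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync"
)

type lockRequest struct {
	ClientParams struct {
		ID    string `json:"id"`
		Group string `json:"group"`
	} `json:"client_params"`
}

var (
	mu     sync.Mutex
	holder string // ID of the node currently allowed to reboot, "" if free
)

// decode validates the protocol header and parses the JSON body.
func decode(r *http.Request) (lockRequest, bool) {
	var req lockRequest
	if r.Header.Get("fleet-lock-protocol") != "true" {
		return req, false
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.ClientParams.ID == "" {
		return req, false
	}
	return req, true
}

func main() {
	http.HandleFunc("/v1/pre-reboot", func(w http.ResponseWriter, r *http.Request) {
		req, ok := decode(r)
		if !ok {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		mu.Lock()
		defer mu.Unlock()
		if holder == "" || holder == req.ClientParams.ID {
			holder = req.ClientParams.ID // grant (or re-grant) the single slot
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, "slot busy", http.StatusConflict) // caller retries later
	})

	http.HandleFunc("/v1/steady-state", func(w http.ResponseWriter, r *http.Request) {
		req, ok := decode(r)
		if !ok {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		mu.Lock()
		defer mu.Unlock()
		if holder == req.ClientParams.ID {
			holder = "" // node finished rebooting, release the slot
		}
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```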
The PR actually points to a rough explanation, not to "protocol documentation".
Just saw this new project being worked on by Rancher to be a more generic upgrade operator, not just Rancher-specific. Wonder if it could be enhanced to work with FCOS upgrades. It might even be able to work as it is; need to dig into it more.
@lukasmrtvy Excellent question! @lucab, do you know what that text is about?
Looks like that text was part of our launch announcement FAQ posted in June of 2018, so it may have been a little misguided or incorrect in retrospect.
Bunch of updates:
https://github.com/poseidon/fleetlock implements Zincati's FleetLock protocol on Kubernetes. It's small, nothing fancy (no drain).
https://github.com/poseidon/fleetlock implements Zincati's FleetLock protocol on Kubernetes. It's small, nothing fancy (no drain).
It actually has drain support now.
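For anyone curious what drain support amounts to on the k8s side: before granting the reboot slot, the manager typically cordons the node and evicts its pods. Below is a hedged client-go sketch of just the cordon step; the node name and in-cluster config are placeholders, and this is not fleetlock's actual code.

```go
// Sketch of the cordon step a k8s-aware reboot-lock manager would run before
// draining a node and granting the reboot slot. Illustrative only; a real
// manager would also evict pods and respect PodDisruptionBudgets.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// cordon marks the node unschedulable so no new pods land on it while it reboots.
func cordon(ctx context.Context, client kubernetes.Interface, node string) error {
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	_, err := client.CoreV1().Nodes().Patch(ctx, node, types.StrategicMergePatchType,
		patch, metav1.PatchOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes the manager runs in-cluster
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	if err := cordon(context.Background(), client, "worker-node-placeholder"); err != nil {
		panic(err)
	}
	fmt.Println("node cordoned")
}
```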
Currently on CoreOS Container Linux we make use of the Container Linux Update Operator to orchestrate the updates (restarts) of our Kubernetes cluster nodes, based on its configuration and an agent integrating with locksmith. Will there be an equivalent for Fedora CoreOS that can be deployed to a Kubernetes cluster and work with Zincati to orchestrate updates?
I noticed the airlock project, which can run as a container and needs to connect to an etcd3 server (cluster), but while running under Kubernetes we already have etcd nodes and cannot give access to those (policy). Does this mean we are required to run another etcd cluster just for updates, or is it possible to make use of Kubernetes objects to orchestrate the updates using an operator?