bottlerocket-os / bottlerocket-update-operator

A Kubernetes operator for automated updates to Bottlerocket

Version update control with GitOps #56

Open springroll12 opened 3 years ago

springroll12 commented 3 years ago

Issue or Feature Request:

Is it possible to upgrade only when prompted by a change to a Kubernetes manifest? Automated upgrades are great, but it would be nice to track and trigger them through some auditable process. Ideally we could have a Bottlerocket CRD and change only the version number, flavor, or Kubernetes version to trigger an upgrade.
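To make the request concrete, something like the following hypothetical resource (no such CRD exists today; the group, kind, and field names are purely illustrative):

```yaml
# Hypothetical CRD instance — illustrative only, not an existing API.
apiVersion: bottlerocket.aws/v1alpha1
kind: BottlerocketUpdate
metadata:
  name: cluster-nodes
spec:
  # Target OS version; editing this field would trigger a rolling update.
  version: "1.19.2"
  # Variant ("flavor"), which also encodes the Kubernetes version.
  variant: aws-k8s-1.29
```

A change to this manifest in a Git repo would then be the auditable trigger for an upgrade.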

My current understanding is that this is not possible?

springroll12 commented 3 years ago

Is this possible? I think it would be difficult to recommend this operator for production use if you cannot control when updates happen.

jhaynes commented 3 years ago

Rather than extending this operator with this functionality, we are considering building a settings operator to accomplish this instead. Would that fit your use case?

springroll12 commented 3 years ago

It may, yes. As long as the settings operator can perform full OS updates similar to what's done in this operator.

Ideally there would be a Bottlerocket CRD that defines all the settings (maybe including some of what's in the user data currently) that you can just apply to your cluster. eksctl essentially has this structure in its Bottlerocket setup YAML already.

I'm not sure how it would work for cluster bootstrapping though, as the CRD (by necessity) would have to be applied after the nodes are created. It would probably have to be able to handle version downgrades as well.

I am concerned that the settings operator (or this one) doesn't seem to be a priority, though. Also, it should not use SSM as mentioned in that thread; otherwise it won't be portable. It should just spin up a pluggable admin/control container to perform the API actions.

springroll12 commented 2 years ago

I just want to congratulate the brupop team on the release of v0.2.0! Thanks for all the hard work putting it together and I am very excited to give it a try! I believe this release will address this issue so it could be closed?

cbgbt commented 2 years ago

Hello. Many thanks for the kind words, and we certainly welcome any feedback if you do happen to try brupop 0.2.0!

Unfortunately, I think we still want to keep this issue open. While brupop uses custom resources internally to coordinate updates between the operator and individual nodes, these resources don’t provide a great interface for cluster administrators to orchestrate moving to specific versions at a given time.

Architecturally, I think that we still want to accomplish this via the settings operator that was previously discussed. This is something that the team is exploring as a future deliverable. Ultimately, using settings as a single entry point for this mechanism seems like the option that provides the best experience, and I’m hesitant to implement an interface that could interact confusingly with that vision.

In the meantime, though, a possible workaround could be to use AWS Systems Manager or a similar automation tool to set settings.updates.version-lock in the Bottlerocket API across your fleet to the specific desired version when you are prepared to upgrade to it. In particular, State Manager offers controls that should give you useful limits on update velocity. While this isn't as ideal as a Kubernetes-native solution, it should have the desired effect.
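For reference, the version pin lives under settings.updates and can be set at boot via TOML user data or at runtime with `apiclient set` from the admin/control container. A minimal user-data fragment (the version string here is just an example):

```toml
# Bottlerocket user data (TOML).
[settings.updates]
# Pin the updater to a specific release; it will not move past this
# version until the lock is changed. The default is "latest".
version-lock = "1.19.2"
```

Changing version-lock (and then letting the update machinery run) is what a State Manager association would automate across the fleet.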

springroll12 commented 2 years ago

Thanks for clarifying.

I don't think a solution based on settings.updates.version-lock would work well in cases where the cluster-autoscaler is enabled. If a new node is added, we would need a way to trigger the version-lock. In our case (and I would wager many others), the initial Bottlerocket version is specified in some infrastructure code (i.e. Terraform), which means that to force new nodes to obey the new version-lock we would also need to alter the user data, which forces recreation of all nodes anyway.

Is there some provision for new nodes that join the cluster in the settings operator or brupop design? It would be a hard sell that a node joining the cluster has to look up its own version and then restart again to match the rest of the cluster nodes. In that case it makes more sense to just provision new nodes with an updated Bottlerocket version and drain and remove the old ones.

AndreiBanaruTakeda commented 2 months ago

Trying to give this a push.

We deploy our EKS clusters via Terraform Enterprise with customized modules. We also have validated (GxP) environments, and we'd like to be able to patch them nevertheless.

I like the idea of out-of-band updating/patching with the in-place updates that Brupop offers, but we can't really use latest in settings.updates.version-lock in our user data, because of the above.

Our EKS clusters are connected to a centralized ArgoCD instance. I would like to deploy Brupop with Argo as an ApplicationSet and manipulate the target version of Bottlerocket from there. This way I can do dev clusters first, then QA, and finally prod.

An alternative I'm considering is an ArgoCD PreSync hook that manipulates the settings.updates.version-lock of the nodes in the cluster, setting it to the version I desire.
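A rough sketch of that PreSync idea, with heavy caveats: reaching the Bottlerocket API from a pod requires access to the host's API socket (/run/api.sock) with the right privileges, and a hook Job runs once rather than once per node, so in practice this would likely need to iterate over nodes or be modeled as a short-lived DaemonSet. The image name and version value below are placeholders, not real artifacts:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pin-bottlerocket-version
  annotations:
    # Argo CD runs this before syncing the rest of the Application.
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: set-version-lock
          # Placeholder image; it must ship Bottlerocket's apiclient.
          image: example.com/bottlerocket-apiclient:latest
          command:
            - apiclient
            - set
            - settings.updates.version-lock=1.19.2
          volumeMounts:
            - name: api-socket
              mountPath: /run/api.sock
      volumes:
        - name: api-socket
          hostPath:
            path: /run/api.sock
            type: Socket
```

This only pins the version on the node the pod happens to land on, so it is a starting point for the per-node mechanics rather than a complete fleet-wide solution.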