fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0
6.35k stars, 591 forks

Multi-cluster section #611

Open scottrigby opened 3 years ago

scottrigby commented 3 years ago

Follow-up after #608. See https://github.com/fluxcd/flux2/pull/608#pullrequestreview-553880064

Can include multiple environments, production setup, and promotion.

stealthybox commented 3 years ago

I agree -- there are several paths for multi-cluster topologies. We should help folks looking to do multi-cluster understand the concepts that can compose for their environment:

For promotion, there are several strategies:

^ These can be combined, and they have a significant impact on the way the platform is managed and software is released. We should also link out to the notification / alerts docs and observability.

Will add more detail later

stefanprodan commented 3 years ago

There are two examples in fluxcd org on multi-cluster that show different approaches:

As a starting point, we could write a doc page that introduces the 2 examples and links to the docs inside each repo.

gerardgorrion commented 3 years ago

Tagging the repo commits is a Flux v1 feature that we need in Flux v2 in order to upgrade.

benjimin commented 6 months ago

I think the docs could do a better job of advising on how to manage multiple environments and phased promotions between them.

Particularly, should everything be in one branch? The monorepo has two obvious downsides:

Having two branches (e.g. production and staging) makes promotions more difficult to perform using git tools. If the branches are totally separate then divergence accumulates, making the repo generally difficult to manage. If the content is factored into a common base and overlays, it is easier to identify and remove divergence between the branch heads, but the histories nonetheless become unrelated (e.g., if a hotfix must be promoted to one app ahead of other already-staged changes for other apps, assuming multiple collaborators use the repo) which makes merges nontrivial.

Should there be a second repo where all of the applications are gathered? For applications represented as helm charts, this is viable (compare bitnami's chart repo) so long as the apps repo has CI to package and release updated charts into an OCI registry, and flux image tag automation is used to propagate releases into appropriate parts of the environments repo. It can also work for kustomize if using a scheme of app-specific release tags (as the flux gitrepo resource can refer to a specific subpath and a specific version pattern). This approach relies on refactoring the apps to minimise what config lives in the env overlay or helmrelease manifest, otherwise environments will still exhibit duplication (and be prone to unintended divergence). The downside is that this requires either intricate CI (for chart packaging) or is reliant on tags for promotions (note tags are less safely preserved than the commit history).

Having a separate repo for each application is hardly different from having one separate repo for all apps.

Another option is trunk based development with no separate staging environment, either by:

There are so many common pitfalls that this would be a great place to advise on best practice. In part, the problem originates from Flux orchestrating an entire environment from a repo.

stefanprodan commented 6 months ago

I've been working on a reference architecture for using Flux to manage the continuous delivery of Kubernetes infrastructure and applications on multi-cluster multi-tenant environments.

https://github.com/controlplaneio-fluxcd/d1-fleet

The setup consists of multiple repositories that:

benjimin commented 6 months ago

@stefanprodan am I correct in understanding that your reference architecture for continuous delivery has each app repo use separate branches for production and staging? In other words, it adopts the "git flow" branching model documented here and here. Note that both those references discourage the use of that branching model for modern DevOps and continuous delivery. Is there anywhere that you've addressed those concerns or presented more discussion regarding the choice?

stefanprodan commented 6 months ago

There is a big difference between code and infra. In my setup the main branch contains both overlays (which is never the case with code). Merging into another branch like production, or tagging the main branch with a semver tag, is just a way to say "promote this". The actual work is done in main; this is trunk-based development, not git flow. One improvement I would like to make is to use the production branch as a canary that gets synced by a single production cluster, and promoting the changes to the whole fleet would happen after tagging that branch with a semver tag.
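
For readers who want to picture the "tag to promote" idea, here is a minimal sketch. It is not taken from the d1-fleet repositories. It assumes a hypothetical repo URL (`https://github.com/example-org/fleet`), hypothetical overlay paths (`./overlays/staging`, `./overlays/production`), and an arbitrary semver range. The staging cluster follows the `main` branch, while the production cluster follows the highest tag matching the range, so pushing a new semver tag on `main` is what promotes a change.

```yaml
# Applied on the STAGING cluster: track the tip of main.
# The URL, names, and paths below are illustrative assumptions.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: fleet
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example-org/fleet  # hypothetical repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: fleet
  path: ./overlays/staging  # hypothetical overlay path
  prune: true
---
# Applied on the PRODUCTION cluster: same repo, but pinned to the
# latest tag within a semver range instead of a branch.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: fleet
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example-org/fleet  # hypothetical repo
  ref:
    semver: ">=1.0.0"  # assumed range; promotion = pushing a newer tag
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: fleet
  path: ./overlays/production  # hypothetical overlay path
  prune: true
```

Because both overlays live on `main`, all work happens on a single trunk and the tag only selects which commit production reconciles.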