fluxcd / flux2-kustomize-helm-example

A GitOps workflow example for multi-env deployments with Flux, Kustomize and Helm.
https://fluxcd.io
Apache License 2.0

What if './infrastructure' needs to be different for different clusters? #44

Open qingvincentyin opened 2 years ago

qingvincentyin commented 2 years ago

Things under the /infrastructure folder will be deployed identically to every cluster, right?

What if each cluster is supposed to have some cluster-specific parameters? The proposed folder structure doesn't allow per-cluster Kustomization for things under ./infrastructure, right?

What would be your suggestion? I would think it needs to be split into 2 folders:

  1. /infrastructure-common -- with the same folder structure you proposed
  2. /infrastructure-cluster-specific -- with the following structure (wired roughly as sketched below):
infrastructure-cluster-specific
   ├── base
   └── overlays
       ├── production
       └── staging

Am I understanding it right?
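
To make that concrete, here is roughly how I imagine the overlay would be wired (the add-on and file names below are just placeholders, not taken from this repo):

    # infrastructure-cluster-specific/overlays/production/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ../../base
    patches:
      # cluster-specific parameters layered on top of the shared base
      - path: ingress-values-patch.yaml
        target:
          kind: HelmRelease
          name: ingress-nginx

Each cluster's Flux Kustomization would then point at its own overlay directory instead of at ./infrastructure directly.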

P.S. Your /apps folder currently has this structure:

apps
   ├── base
   ├── production
   └── staging

While there's nothing wrong with that, the structure proposed here seems easier to keep track of, i.e., re-arrange it to:

apps
   ├── base
   └── overlays
       ├── production
       └── staging

The explicit overlays folder is a self-documenting way to say that this is a Kustomization thing. That might sound obvious when we are right in the middle of discussing Kustomization alone, but after a while, with so many technologies mixed in (app's Kubernetes manifests, Flux bootstrap folders, Flux HelmRepository, Flux HelmRelease vs. Helm Charts, Flux Kustomization vs. original Kustomization -- as debated in Issue 321), it becomes quite confusing which folder is for which technology.
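
For what it's worth, part of the confusion is that the two "Kustomizations" look deceptively alike; a minimal side-by-side sketch (the paths here are only examples):

    # Kustomize's own kustomization.yaml -- lives next to the manifests it assembles
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ../../base
    ---
    # Flux's Kustomization custom resource -- tells the cluster what to sync and from where
    apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
    kind: Kustomization
    metadata:
      name: apps
      namespace: flux-system
    spec:
      interval: 10m
      path: ./apps/overlays/production
      prune: true
      sourceRef:
        kind: GitRepository
        name: flux-system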

thepaulmacca commented 2 years ago

Good question, and keen to know the answer to this as well

If it helps, I came across this repo the other day https://github.com/yoandl/fluxv2-infrastructure-stack-example

This has helped me quite a bit to get my head around it, but it would be good to hear from the Flux maintainers on whether this is the correct approach or not

kingdonb commented 2 years ago

There are a number of related questions embedded in this question, with regard to repo design; however, I don't think we're likely to get a maintainer opinion on which layout is correct, because the answer is (as we might be getting used to hearing a lot): it depends!

It depends on operational decisions that you may be making along the way, and on your organizational structure (at least according to Conway's Law).

There is an important reference example here: https://github.com/fluxcd/flux2-kustomize-helm-example#identical-environments

Users have asked: how do I have more than one cluster that works from the same cluster directory, where they are all read-only? (Not what the example above does. E.g. "I don't want a separate kustomization.yaml for each cluster; they should all be identical." Flux doesn't quite work like that out of the box! But it's a perfectly reasonable ask.) You may have deployed a cluster in every region where you have customers, and when there are many regions, it may not make sense to add a sub-directory for every cluster in every region when they are all supposed to be identical and you want to manage them all in concert.

But that separate-directory pattern is currently the guidance prescribed by Flux, because it leaves open the possibility that some clusters are bound to be different from time to time, so you can add changes in a Kustomize overlay for that cluster; it can be traumatic (certainly "breaking") for cluster operators to change the path that a Flux installation syncs from, so I think it is better to avoid that by getting the layout right from the start.
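
As a rough illustration of that per-cluster overlay idea (the add-on name and values below are placeholders, not a prescription), each cluster directory can carry its own Flux Kustomization for ./infrastructure and layer cluster-specific changes on top:

    # clusters/staging/infrastructure.yaml (hypothetical file)
    apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
    kind: Kustomization
    metadata:
      name: infrastructure
      namespace: flux-system
    spec:
      interval: 10m
      sourceRef:
        kind: GitRepository
        name: flux-system
      path: ./infrastructure
      prune: true
      patches:
        # staging-only tweak layered on the shared ./infrastructure definitions
        - target:
            kind: HelmRelease
            name: ingress-nginx
          patch: |
            - op: replace
              path: /spec/values/controller/replicaCount
              value: 1

That way the path a cluster syncs from never has to change; only the patches that live next to it do.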

Furthermore, Flux bootstrap intends to manage a deploy key for each cluster that is keyed for storage in a way that assumes the ./clusters/my-cluster path will only be used by one cluster; else Flux will overwrite deploy keys unless someone in an admin role takes charge of managing SSH deploy keys for each cluster in some other way. But you can manage them some other way!
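
For reference (the owner and repository names here are placeholders), bootstrapping each cluster against its own path looks roughly like:

    # run once per cluster, against that cluster's kube-context
    flux bootstrap github \
      --owner=my-org \
      --repository=fleet-infra \
      --branch=main \
      --path=clusters/production \
      --context=production-admin

Pointing two clusters at the same --path can work, but then you are in the deploy-key situation described above.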

You might think of production and staging as resource profiles, so that production clusters can be replicated and the contents of each cluster maintained in unison with the others, while staging changes independently. These cluster instance profiles need not represent only one cluster. This approach only scales, though, if that invariant -- that every cluster sharing a profile really is identical -- is safe and never violated.

I like base and overlays as a pattern, it is instantly recognizable as the Kustomize CLI structure 👍

And you can of course structure apps and infrastructure more similarly to each other, so that clusters which import a profile can opt in to each "infrastructure add-on" the way clusters import apps, as if they are adopting a tenant. You could also treat infrastructure as a unitary thing that should not bifurcate and cannot be differentiated for each "special snowflake" cluster -- perhaps because policy in your organization requires it to be structured one particular way -- and this is very likely what was intended by the original design in the canonical Flux examples.
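
A rough sketch of that opt-in idea (the directory names are invented for illustration): each cluster's directory lists only the add-ons it wants, the same way it would enumerate app tenants:

    # clusters/production/infrastructure/kustomization.yaml (hypothetical layout)
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      # opt in to individual infrastructure add-ons
      - ../../../infrastructure/cert-manager
      - ../../../infrastructure/ingress-nginx
      # this cluster deliberately skips the monitoring add-on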

I don't see anything objectionable in the linked example, but I think most Flux examples like this (including this one) are intentionally limited to 1 or 2 apps, or 1 or 2 clusters, or 1 or 2 environments, which is not realistic compared to real-life deployments; many deployments in the wild won't have only 1 or 2 apps in the cluster. Think of the example of a multi-cloud deployment, where you might have production-aws, and another production-oci, or gke, or something else.

In a multi-cluster, multi-tenant environment, this can get complicated, and as a Flux maintainer I'd love to hear from organizations that grew their installations to scale within a single repo if these patterns worked for them, or what additional structures they found they needed. I think the best advice on this subject is going to come from real production experience.

thepaulmacca commented 2 years ago

Thanks for your response, very helpful. It would be great to hear from others out there how they're doing it, as this is something I'm struggling to get my head around at the moment

Luckily for us, we're still in the very early stages, so we have the time to get this right the first time. We only have 2 clusters in a single region for now (dev and prod), but that could easily scale out to other regions in the future