Hi,
Based on the discussion in the Weave Slack #flux channel (with Slack users marlin, mbridgen, and hidde), here's a summary of what we thought about documenting best practices for managing multiple environments with Flux and Git.
- How do you deploy the latest version of the `app` service in the test environment?
- How do you promote the `app` service to another environment?
- How do you roll back the `app` service to the previous version in production?

As you can see, there are many questions to answer, and they are not only specific to "environment management". I think it is important to first ask the main questions around the Git repository and the avoidance of code duplication (without Helm or Ksonnet).
The format can be:
I think the Git repo is the best way to start, as we can quickly confront the theory with reality. Everybody can view and elaborate on the tutorial. Once this repo is stable, documentation and a blog post can follow.
I have created a repo for housing the example: https://github.com/weaveworks/multienv-example
I think the answer to this question will be different for Helm, and it will also change with the next release of Helm, which makes it quite hard to handle. Scoping namespaces is an important part of the picture; there is a long discussion about that here: github.com/kubernetes/helm/issues/2060
One of the challenges we are having in the multi-environment setup is the duplication of the same definitions for multiple environments in different branches, when the only real difference is that master / qa refer to different image patterns for automated deploys.
GitFlow example by Hidde Beydals on Slack:
We have a CI pipeline that first validates the K8S manifests using kubeval (with the right specs configured for your K8S version) and then runs our custom validations (e.g. no 'latest' image, correct labels set, etc). If the pipeline fails, one is not able to merge.
We also require reviews from our ops team before one is able to merge. So engineers are able to create a PR (and save us work) but it is always approved by ops.
Do most people put their helm charts inside the pertinent repository, or in an external repository with all the charts that represent the entire system of microservices - any thoughts on this?
We keep the actual copy of our running chart in the same repository/branch as the cluster they're running on. But we also have a repository with all our in house charts.
Using git submodules, or just replicating each change manually?
`helm fetch`. So the update frequency of the charts themselves isn't that high (or you don't always need the updates).
We keep the versioning managed there; we have a CI job set up that automatically rebuilds the `index.yaml` for the Helm repository and does the validations and stuff.
So inspection/quality assurance and version management, etc. all happens there.
We separate our clusters by environment, not by product. We have multiple products living in the same cluster environment, and the separation happens at the branch level.
So we have a repository `my-cluster` with branches `production`, `staging`, ...
Our production cluster syncs with the production branch, staging syncs with staging, etc.
There is one caveat, and that is that your clusters will differ.
So a simple merge from `staging` -> `production` is almost never the case.
Our staging environments, for example, get almost no traffic; they're only there to verify configurations, test things, etc.
So the `values` field in your `FluxHelmRelease` will almost always differ.
We also have some additional services running in our staging cluster (or in our production cluster).
We let it flow upwards:
`git checkout -p <lowest env branch> -- <folder/files>`
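To make that concrete, here is a minimal sketch of the upward flow, assuming hypothetical `staging` and `production` branches and a `releases/app/` folder (the names are illustrative, not from the comment above):

```sh
# Promote reviewed changes from the lower environment branch into production.
git checkout production
# Interactively pick hunks from staging's version of the folder.
git checkout -p staging -- releases/app/
git commit -m "Promote app manifests from staging"
git push origin production
```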
Just for the record: I am the author of the Slack messages posted above, and any questions regarding this approach are welcome.
Would love to get updates on this - mostly with regards of understanding how other people are doing it and what works well/what doesn't.
Maybe the flux team can give info on how they use it? :)
We currently (pre-GitOps) have a simple structure of `master` and `stable`.
Our repos typically manage multiple environments (`dev`, `staging`, `prod`).
Interestingly enough, tools like ksonnet/kustomize both lend themselves to this structure. Kustomize specifically has overlays, which are often geared towards supporting different environments.
We currently use Ansible to deploy, so we get to choose which inventory to use and tie that to `master` or `stable` (so we can be sure the prod inventory always deploys stable).
At the moment it's just creating a lot of questions for us, which are mostly related to how we manage different environments.
There's `git-path`, but if you have a mono repo with lots of applications, I'm not entirely sure how well this works (i.e. could it support `/<environment>/*/`, where `*` is any number of app directories?). If `git-path` isn't ideal here, it probably suggests that you need to have a branch per environment.
Not really looking for any answers here, just getting my thoughts down :)
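For what it's worth, Flux v1's `--git-path` flag does accept a comma-separated list of subdirectories, although as far as I know it takes literal paths rather than globs, which is the limitation being raised above. A minimal sketch of the daemon args for a mono repo (repo and path names are hypothetical):

```yaml
# Excerpt of a Flux v1 Deployment: sync only the staging paths of a mono repo.
containers:
  - name: flux
    image: docker.io/fluxcd/flux:1.21.1
    args:
      - --git-url=git@github.com:example/mono-repo   # hypothetical repo
      - --git-branch=master
      - --git-path=staging/app-a,staging/app-b       # literal, comma-separated subdirectories
      - --git-poll-interval=1m
```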
I noticed this the other day: https://kubectl.docs.kubernetes.io/pages/app_composition_and_deployment/structure_introduction.html
It's not really specific to kustomize and explains various use-cases / directory and/or branch layouts that you may choose.
Might be useful for anyone that has questions in this area.
This is the kind of discussion I have been searching for for a few days. I have a successful Flux-based CI/CD setup and would like to expand this to multiple environments via "artifact promotion", viz. develop -> staging -> pre-production -> production, etc. While using Helm in the workflow is somewhat nice, I want to achieve this without Helm.
At the moment, our CI deploys the "master" image to a registry (with tag `master-<version>-<short_commit_id>`) after the tests have finished successfully. The staging Flux watches for `master-*`. The production promotion is done via requesting a PR on the production repo.
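In Flux v1 terms, the staging behaviour described above maps to workload annotations that restrict automation to the `master-*` tag pattern. A sketch, with hypothetical workload, container, and registry names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  annotations:
    fluxcd.io/automated: "true"        # let Flux bump the image tag automatically
    fluxcd.io/tag.app: glob:master-*   # only consider tags built from master
spec:
  selector:
    matchLabels: { app: app }
  template:
    metadata:
      labels: { app: app }
    spec:
      containers:
        - name: app                    # must match the container name in the tag.<name> annotation
          image: registry.example.com/app:master-1.0-abc1234
```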
@tckb could you describe the production repo a little more? Is this completely separate repo, or a branch?
My current flow is:

Staging

- Git commit + push to master
- CI/CD runs tests; if they pass, it tags and pushes an image `master-${CI_COMMIT_SHA_SHORT}`
- Flux in staging monitors apps matching tags and deploys (if automation enabled)

Production

- Git tag + git push tags (i.e. `git tag v1.0.0 && git push --tags`)
- CI/CD runs tests; if they pass, it tags and pushes an image `${CI_COMMIT_TAG}` (supports semver only)
- Flux in production monitors apps matching semver and deploys (if automation enabled)

I'm not yet happy with the production flow as I don't really want to be creating git tags that may not pass tests, hence maybe a production branch (and a PR) would be better (which might be what you're doing).
@nabadger the staging workflow looks more or less the same. In the production workflow, I imagine just creating a PR with an approved build from staging, and the production Flux takes care of deploying it.
I don't have semver, and production deployments are not automated. One has to request a PR with the latest build -- this is the part I am working on at the moment.
PS: I am in the process of drafting a workflow with multiple environments with flux and the workloads are not for production.
@nabadger just finished it and have it working smoothly with GitOps. 😄
I am looking into implementing a multi-environment flux deployment flow. One thing I don't understand is how do you trigger integration tests to be run in a CI system that are required to be executed AFTER the deployment of an app has occurred? Since the CI system is not actually aware of the deployment existing/happening because flux follows a 'pull' model, it makes it difficult to know when CI integration tests should be triggered to start.
Here is an example flow:
- I check in code to git
- CI system builds/tests the app then publishes a dev tagged docker image
- Flux deploys the dev image into my dev cluster
- CI system starts running integration tests (I am unsure of how to automatically trigger this step)
- Promote (re-tag) the docker image to staging environment/cluster.
- Flux deploys to staging cluster
@jwenz723 FYI - I'm not part of the project, just a user, but I have integrated with Fluxcloud, and there is a `--connect` parameter that sends events to Fluxcloud (and Weave Cloud), so I'm assuming you would be able to tie into that to receive event(s) from Flux and act accordingly.
How does cleanup happen for step 5? I have a dev branch that was deployed, approved, and now merged to staging... what happens to that environment?
I am looking into implementing a multi-environment flux deployment flow. One thing I don't understand is how do you trigger integration tests to be run in a CI system that are required to be executed AFTER the deployment of an app has occurred? Since the CI system is not actually aware of the deployment existing/happening because flux follows a 'pull' model, it makes it difficult to know when CI integration tests should be triggered to start.
Here is an example flow:
1. I check in code to git
2. CI system builds/tests the app then publishes a dev tagged docker image
3. Flux deploys the dev image into my dev cluster
4. CI system starts running integration tests **(I am unsure of how to automatically trigger this step)**
5. Promote (re-tag) the docker image to staging environment/cluster.
6. Flux deploys to staging cluster
I have been looking into this too. My solution is to dedicate a couple of CI agents to integration-test deploys. These CI agents do have access to the cluster, so you introduce an extra attack vector, but you get the benefit of being able to directly manage the lifecycle of these temporary deployments. Flux only garbage-collects resources under its control, so if you manage your integration deploys outside of Flux it won't interfere.
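A rough sketch of what such a dedicated-agent job could look like, assuming a hypothetical `overlays/itest` kustomization (with no hard-coded namespace) and a hypothetical test script; the point is that the namespace is created and destroyed by CI, not by Flux:

```sh
# Runs on a CI agent that has (scoped) cluster credentials.
set -euo pipefail
NS="itest-${CI_PIPELINE_ID}"                        # throwaway namespace per pipeline
kubectl create namespace "$NS"
kustomize build overlays/itest | kubectl apply --namespace "$NS" -f -
kubectl rollout status deployment/app --namespace "$NS" --timeout=300s
./run-integration-tests.sh "$NS"                    # hypothetical test runner
kubectl delete namespace "$NS"                      # cleanup is owned by CI, so Flux never interferes
```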
Is my understanding correct that in a multi-environment setup Flux requires a dedicated long-lived git branch that it syncs with (e.g. staging, production)? Thus, for example, a deployment to production happens with a PR merge into the `prod` branch and not a git tag push?
Is my understanding correct that in a multi-environment setup Flux requires a dedicated long-lived git branch that it syncs with (e.g. staging, production)? Thus, for example, a deployment to production happens with a PR merge into the `prod` branch and not a git tag push?
I think you have more options than just a branching strategy; here are a few options that should work:
There's a fair amount of flexibility. In my case I'm unlikely to use branches to control deployment to particular environments because the company I'm setting this up for prefers trunk based development.
@grahamegee Thank you for sharing your ideas. It really helps and I truly appreciate it. We'd also prefer trunk-based development without long-lived branches, but at this point I am just not clear how to implement deployments to production once a commit in `master` is tagged with a SemVer release tag. I understand that Flux can watch git repos, branches, and paths, but not git tags. For example, the Flux operator in the staging environment cluster can watch the `master` branch - that's no problem. But how would the Flux operator in the production cluster sync up with the release commit tagged in `master`? So far, it seems that it's going to be either different git branches or different directories for us, since we are on a mono repo.
@demisx In my initial response I misread some of what you said, sorry! So in this edit I'm removing all the fluff about how I think flux works!
I think you're right that you can't trigger a deploy directly from a git tag.
In order to get Flux to deploy from a SemVer release tag, you would need a script/pipeline/developer to build and push a docker image tagged with the SemVer after the commit has been tagged. You would also need to make sure you have a "production" manifest in your config repo with a Flux annotation that matches on SemVers. This "production" manifest will get updated by Flux when the docker image is pushed.
As you are using a monorepo (I assume your manifest files are also in there), you probably want to structure it such that all the manifest files are contained in a config sub directory and flux is configured to only monitor the sub directory.
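As a hedged sketch, such a "production" manifest in Flux v1 could simply switch the tag filter from a glob to a SemVer range (names are illustrative):

```yaml
# Excerpt of the production workload manifest: only SemVer-tagged images are automated.
metadata:
  name: app
  annotations:
    fluxcd.io/automated: "true"
    fluxcd.io/tag.app: semver:~1.0   # e.g. 1.0.3; master-* style tags no longer match
```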
@grahamegee I think you are spot on. This is my understanding also how it works. I am going to try the different sub directories route. Once again, thank you very much for sharing your thoughts. It really helps.
Yup I think that's a good plan! I'm likely to try the same thing.
@tckb could you describe the production repo a little more? Is this completely separate repo, or a branch?
My current flow is:
Staging
- Git commit + push to master
- CI/CD runs tests; if they pass, it tags and pushes an image `master-${CI_COMMIT_SHA_SHORT}`
- Flux in staging monitors apps matching tags and deploys (if automation enabled)
Production
- Git tag + git push tags (i.e. git tag v1.0.0 && git push --tags)
- CI/CD runs tests; if they pass, it tags and pushes an image `${CI_COMMIT_TAG}` (supports semver only)
- Flux in production monitors apps matching semver and deploys (if automation enabled)
I'm not yet happy with the production flow as I don't really want to be creating git-tags that may not pass tests, hence maybe a production branch (and a PR) would be better (which might be what you're doing).
This is what I'm looking at now, but it would be great if there was an option for automation that creates a PR instead of just pushing to the "master" branch, so that ops could approve the PR for deployment to production.
For what it is worth, my team has the requirement to deploy applications to multiple environments: sandbox, dev, test, staging, production (in that order). The number of required environments is due to the fact that we interact with legacy applications following legacy deployment strategies. We are ok with our code being deployed to sandbox, dev, and test clusters at the same time, so we treat all 3 of these environments as equivalent. Here is the strategy we use to accomplish our deployments:
environment | git branch | kustomize base | kustomize overlay |
---|---|---|---|
production | master | base-prod | production |
staging | staging | base-prod | staging |
test | staging | base | test |
dev | staging | base | dev |
sandbox | staging | base | sandbox |
`base` and `base-prod` are simply directories that exist within the repository and act as kustomize bases. All resources defined within `base-prod` are also deployed by `base`, because `base` inherits `base-prod`. We place all code in `base` until we feel it is ready to go to production, at which point we move the code from `base` into `base-prod` and push to the `staging` branch to have the code deployed to our staging cluster. The one downside to this approach is that you need to make sure code isn't placed into `base-prod` until it is ready to go to production, or else you will block code merges.
Having a kustomize overlay for each environment provides us with the necessary flexibility to specify configuration parameters specific to each environment.
Having two branches (staging and master) gives us the ability to test code that is placed into `base-prod` in our staging environment before it gets pushed to production. We try to keep our staging and production clusters as close to each other as possible, so both of these clusters share the same kustomize base.
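A minimal sketch of how that inheritance could be wired up in kustomize, shown as several files in one block (file names are hypothetical; the directory names come from the table above):

```yaml
# base-prod/kustomization.yaml -- resources that are allowed in production
resources:
  - stable-service.yaml

# base/kustomization.yaml -- everything in base-prod plus not-yet-promoted resources
resources:
  - ../base-prod
  - new-service.yaml            # moves into base-prod once it is production-ready

# overlays/test/kustomization.yaml -- test/dev/sandbox build on base
resources:
  - ../../base
patchesStrategicMerge:
  - config-patch.yaml           # environment-specific configuration

# overlays/production/kustomization.yaml -- production only sees base-prod
resources:
  - ../../base-prod
```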
As requested by @2opremio on Slack:
Is this a fair summary of directory- and branch-per-environment pros and cons? I've put it together for my team and thought it might be useful to include in the docs.
**Branch-per-environment**

- :+1: simpler filesystem structure: a single set of resources
- :+1: a cluster is modified by modifying its branch
- :+1: divergent branches (=> clusters/environments) can be detected via git diff and so potentially automatically brought back into sync

- :-1: PR process is complicated by having to choose the correct branch as the comparison base/merge target
- :-1: some changes will need to be merged to all branches
- :-1: divergent branches will cause merge conflicts
- :-1: flux cannot be configured to watch multiple branches for a single cluster

The first two of the :-1:s can be addressed with automation (GitHub Actions, command-line app, etc.). Divergent branches can't be resolved with automation, but might be prevented entirely by automation. The last :-1: can be resolved with a flux instance per environment.

**Directory-per-environment**

- :+1: simpler branching structure: a single branch means no risk of merge conflicts
- :+1: a cluster is modified by modifying its directory
- :+1: PR process is simple: branch off master, merge back to master
- :+1: multi-cluster updates are simpler: side-by-side comparison of cluster state in the repo, copy-paste changes between files
- :+1: flux can be configured to watch multiple paths for a single cluster

- :-1: divergent environments may be harder to detect
- :-1: mental load of having everything in the same place is not insignificant

The first of the :-1:s can be addressed with automation (PR checks, GitHub Actions, commit hooks, etc.).
These are not mutually exclusive if one uses e.g. kustomize. The setup is:

- A common base directory
- The flux-patch.yaml file in the root directory
- An overlay directory per environment
- A git branch per environment

In each environment, Flux points to the corresponding git branch, and `kustomize build` uses the common base and the environment-specific overlay directory.
Git branches allow the common base to temporarily diverge. For example, if you introduce a new microservice in dev and add its manifests in base, clusters that track other git branches are not affected.
Promotion is a simple git merge. If you now merge the modified dev into test, the new microservice is promoted there as expected. Changes made by the flux daemon (annotations, automatic image releases) are carried over in the merge as well.
Changes specific to a single environment are made in the per environment overlay directory. This ensures clean git merges, and isolates changes to different environments.
The setup is moderately complicated and takes some getting used to, but works well at least in my experience.
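For Flux v1 this kind of setup typically relies on manifest generation: a `.flux.yaml` tells fluxd to run kustomize and to record its own writes (annotations, automated image releases) in a patch file, which is what lets those changes merge cleanly between branches. A sketch, assuming the daemon runs with `--manifest-generation=true`:

```yaml
# .flux.yaml -- placed in the directory Flux is pointed at (or a parent of it)
version: 1
patchUpdated:
  generators:
    - command: kustomize build .   # emit the manifests for this environment
  patchFile: flux-patch.yaml       # Flux records its annotation/image updates here
```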
@datacticapertti The workflow and git branching you describe could benefit from https://github.com/fluxcd/flux/issues/2568 (see https://github.com/fluxcd/flux/issues/2568#issuecomment-549304271) That would allow to switch from a git branch per environment to a manifest repository following a git trunk and SemVer releases.
@Perdjesk thanks, that is interesting work. It is not immediately obvious to me how it would work in my case, so let me elaborate on why I ended up with my workflow.
With kustomize you can put most of the configuration in a shared base, and the overlays only contain environment specific deltas. This is great. The only problem is that if all the environments track the same git branch, any modifications you do to base affect all the environments.
Case in point: if you want to have a new microservice in a dev environment, you would ideally add its manifests in base and modify base/kustomization.yaml to include them. But if you do this, the new microservice will appear in all other environments as well. You can add the new microservice in the dev overlay, but then you need to copy the files and modify kustomization.yaml when you promote to the next environment up. Things can only be moved to base once they have been promoted to all environments, and then you need to refactor all the overlays at once.
With the branch based approach this is not a problem, as you can modify base in the dev branch without affecting the other environments. Promotion is done with git merge, and modifications to base are carried over to the next environment.
@datacticapertti what do you do about changes to the overlays for staging and production? Say you're adding something like an endpoint to the overlay - do you commit it to the dev branch (where it does nothing) and then merge/pull request to staging? Similarly, for production, do you commit to dev -> stage -> master? And for the "dev"-only cases - if you had certain things that run only outside of dev, like qa only - do those get committed to dev and merged to qa? I can definitely see where the branching can help and hinder at the same time.
@cdenneen unfortunately funding was pulled before we got to production, so I only have experience with two branches (dev and test). But in general, if you have something that you only want in one environment, you can put it in the overlay only and have nothing in base.
As you always merge from a lower environment to a higher one, you end up with cumulatively more overlays. In dev there is only the dev overlay, in test there are overlays for dev and test, etc. I suppose one could prune them, for example only keep the overlay for the immediately preceding environment and git rm others.
Overlays do need some coordination when doing a promotion with git merge. For example, a deployment manifest in base might want to mount an environment specific configmap created by an overlay. Git merge does not help here, you need to manually ensure that the configmap is indeed created.
In practice the coordination is not too bad. You can use git diff to examine what changes are done by the git merge, and kdiff3 or similar to compare the overlays to look for things you may need to change manually.
@datacticapertti right, I was just saying that if your workflow is something like dev -> test -> stage -> master and you are updating something that's specific to the production overlay, then you would commit it to dev ->(merge)-> test -> stage -> master (so I can see how this would result in a lot of unnecessary merges). You wouldn't want to commit these changes further upstream of this chain, because then the branches would have diverged/forked. Definitely has its ups and downs.
@datacticapertti I think this is what we're going to do, and I think one small change in the fluxd args could solve a lot of the problems this approach brings up. If the `--git-branch` arg accepted multiple branches and synced with the first one that existed, you could maintain multiple clusters without a corresponding branch per cluster.
E.g. the develop cluster's Flux has `--git-branch=develop,staging,master`, the staging cluster's Flux has `--git-branch=staging,master`, and the production cluster's Flux has `--git-branch=master`. This way, when a change is needed, you can branch master to develop, implement the changes in base in the safety of the development cluster, merge that change into the staging (and eventually master) branch, and delete the develop branch without breaking the develop cluster.
Thoughts?
Flux deploys the dev image into my dev cluster
CI system starts running integration tests (I am unsure of how to automatically trigger this step)
Promote (re-tag) the docker image to staging environment/cluster.
If I followed along correctly it appears the current way to run tests after a sync (integration/smoke/etc) in an automated fashion would be to use FluxCloud(https://github.com/justinbarrick/fluxcloud). Have folks been successful with this?
There are other proposed options such as Post Synchronisation Hook (#2696) which isn't implemented yet or it was suggested to use Flagger. Flagger looks to be designed for use in the production environment. Is anyone using in the staging environment to trigger tests before promotion to prod?
If I followed along correctly it appears the current way to run tests after a sync (integration/smoke/etc) in an automated fashion would be to use FluxCloud(https://github.com/justinbarrick/fluxcloud). Have folks been successful with this?
FluxCloud won't (imo) be suitable for this, since it just knows what flux is doing, but not the state of the application in the cluster.
What you really need is something running in-cluster that's monitoring applications. You need to know that it's rolled out before testing it.
The chatops notification tools in this area are fairly similar, i.e. they often tell you the deployment status, ready pods, unavailable pods, etc.
The problem I've seen with such tools is how opinionated they are (i.e. monitoring at the statefulset/deployment level vs the pod level, and differing code for how you determine a ready pod).
Something like Flagger would be ideal - it gets quite difficult to chain together a set of tools to get this feature; it would be much nicer if there were just a couple of solutions to achieve it (flux + flagger).
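To make the Flagger idea concrete: Flagger canaries support `pre-rollout` webhooks, so promotion only proceeds if a test job succeeds. A hedged sketch (names, URLs, and the test command are illustrative, not a drop-in config):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: app
  namespace: staging
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    webhooks:
      - name: integration-tests
        type: pre-rollout                        # gate the rollout on the tests passing
        url: http://flagger-loadtester.staging/  # hypothetical tester endpoint
        timeout: 3m
        metadata:
          type: bash
          cmd: "./integration-tests.sh http://app-canary.staging"  # hypothetical test command
```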
These are not mutually exclusive if one uses e.g. kustomize. The setup is:

- A common base directory
- The flux-patch.yaml file in the root directory
- An overlay directory per environment
- A git branch per environment

In each environment, Flux points to the corresponding git branch, and `kustomize build` uses the common base and the environment-specific overlay directory.
Git branches allow the common base to temporarily diverge. For example, if you introduce a new microservice in dev and add its manifests in base, clusters that track other git branches are not affected.
Promotion is a simple git merge. If you now merge the modified dev into test, the new microservice is promoted there as expected. Changes made by the flux daemon (annotations, automatic image releases) are carried over in the merge as well.
Changes specific to a single environment are made in the per environment overlay directory. This ensures clean git merges, and isolates changes to different environments.
The setup is moderately complicated and takes some getting used to, but works well at least in my experience.
Do you have the env-specific overlay only in the env-specific branch, or does every env branch have all the env overlays?
Hi all, please help me with the following scenario: I have a single AKS cluster, on which I have created 3 different environments (Dev, QA, and Prod), each with one namespace and one Nginx ingress controller. Now I want to use GitOps with Flux for application deployment.
Here are my thoughts off the top of my head:
1. Do I need to install 3 Flux instances for Dev, QA and Prod?
Yes, you'd need to run 3 different instances of Flux, each mapped to the corresponding environment branch. Alternatively, you can place manifests into 3 different environment folders (dev/qa/prod) and use one instance of Flux to sync up. We use the latter approach for simplicity.
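A sketch of the "one instance, three folders" variant mentioned above, expressed as Flux v1 daemon args (repo name hypothetical):

```yaml
# Single Flux instance syncing all three environment folders from one branch.
args:
  - --git-url=git@github.com:example/aks-config
  - --git-branch=master
  - --git-path=dev,qa,prod        # each folder holds that environment's manifests
```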
2. Do I need to add each environment's public SSH key on GitHub? Are there any other best practices we can follow?
I am not sure what you need these for, but if each environment uses a different SSH key, then I'd think you'd need to add them all. Sorry, it's hard to recommend anything here without knowing more about your environment and what exactly you use your public SSH keys for.
3. Currently I am using a single GitHub repository with a set of YAML files and 3 different branches (QA, Dev and Prod). Will Flux support 3 branches?
A single instance of flux can be mapped to one git branch only.
Question: can multiple clusters point to the same config repo (and same branch) to deploy the same workloads identically?
Thank you @demisx,
I am not sure what you need these for, but if each environment uses a different SSH key, then I'd think you'd need to add them all. Sorry, it's hard to recommend anything here without knowing more about your environment and what exactly you use your public SSH keys for.
When you run this command: `fluxctl identity --k8s-fwd-ns flux`, it generates an SSH key. This SSH key needs to be added on GitHub so that Flux can communicate with my repo. All I wanted to know is whether I need to add 3 SSH keys so that my 3 environments' Flux pods can communicate with my Dev/QA/Prod branches.
All I wanted to know is whether I need to add 3 SSH keys so that my 3 environments' Flux pods can communicate with my Dev/QA/Prod branches.
Yes. Each installation of Flux will generate a key. You need to add all the SSH keys (as deploy keys with write access) so that each corresponding Flux instance is able to access the git repo. Flux will use the configured branch.
(I wish there was a way to reply directly to a comment, so everyone doesn't see a notification/email for comment replies.)
@cloudengineers you can also add an SSH key as a secret and configure Flux to use that. That will allow you to use a single key if you want to.
https://docs.fluxcd.io/en/latest/guides/provide-own-ssh-key/
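Per that guide, the key ends up in the secret Flux mounts (named `flux-git-deploy` by default), roughly:

```sh
# Replace the generated identity with your own private key (default secret/namespace names).
kubectl delete secret flux-git-deploy --namespace flux
kubectl create secret generic flux-git-deploy \
  --namespace flux \
  --from-file=identity=/path/to/private_key
# Restart fluxd so it picks up the new key.
kubectl rollout restart deployment/flux --namespace flux
```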
Question: can multiple clusters point to the same config repo (and same branch) to deploy the same workloads identically?
I don’t see why not.
Very interesting discussion. Does anyone have links to sample repositories where multi-env is implemented?
From memory, one of the patterns I was advised to use was a Flux operator per application, as the problem I was trying to solve was how to support Flux and GitOps in an application world where each application has its own git repo, versus an infra repo for a specific environment/cluster/role.
I am yet to adopt this; however, given the lightweight resource footprint of Flux, it makes sense and keeps things sensible by doing one thing well. I also like the 1:1 mapping of Flux <-> app repo when it comes to changes and CI/CD. HTH.
Very interesting discussion. Does anyone have links to sample repositories where multi-env is implemented?
+1
Has anyone figured out a solution for flux to update a protected branch?
Has anyone figured out a solution for flux to update a protected branch?
The only way I am aware of is to grant Flux `admin` access to the repo. Which I think means that you have to grant Flux access to the repo using a specific GitHub user account (rather than a deploy key).
Here is an example of how to structure a gitops repository for multi-env deployments with Flux2: https://github.com/fluxcd/flux2-kustomize-helm-example
Is there a way to have a folder that deploys to all clusters and is shared, like in old Flux?
@shaneramey the https://github.com/fluxcd/flux2-kustomize-helm-example repo shows exactly that; the infrastructure dir is shared across all clusters.
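For reference, that example repo's layout is roughly the following (it may evolve over time, so treat this as a sketch):

```
├── apps
│   ├── base
│   ├── production
│   └── staging
├── infrastructure        # shared across all clusters
└── clusters
    ├── production
    └── staging
```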
We should create a topic (and maybe a blog post) on best practices for using Flux in multiple environments: test, staging, and production.