fluxcd / flux

Successor: https://github.com/fluxcd/flux2
https://fluxcd.io
Apache License 2.0

Kustomize integration support #1261

Closed: marccampbell closed this issue 5 years ago

marccampbell commented 6 years ago

Flux should support kustomize integration so that the source repo can contain a base and overlays and have flux deploy the merged specs. Today, I solve this by having the base & overlays in a repo and a secondary CI job that runs kustomize build and commits these into a second repo.

The goal is to automate this and make the source of truth the separate specs (base + overlay).
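
For illustration, the CI workaround described above might look roughly like this (the repo names and paths are hypothetical):

# CI step: render the overlay and commit the result to a second repo that flux watches
git clone git@github.com:example-org/k8s-config.git      # contains base/ and overlays/
git clone git@github.com:example-org/k8s-rendered.git    # watched by flux
kustomize build k8s-config/overlays/production > k8s-rendered/production.yaml
cd k8s-rendered
git add production.yaml
git commit -m "Render production overlay" && git push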

I'm happy to contribute on this, just wanted to see if this was started and if it would be accepted.

guzmo commented 5 years ago

I guess this is close to being merged now? :) We really want it before we start to set up new environments, so we don't have to do workarounds to handle multi-environment repos.

hiddeco commented 5 years ago

Good morning @guzmo,

We started working with milestones a couple of weeks ago to provide you (users) with more predictable release dates. The release of this feature is planned for the next major release of Flux on June 5.

guzmo commented 5 years ago

Morning! :) Great news. Is it possible to install your dev image (once this is merged) until it's released? Would like to get started with it hehe / Andreas

hiddeco commented 5 years ago

There are prerelease builds available that tend to be stable, although no guarantees are given.

2opremio commented 5 years ago

removed comment - my segfault was a result of having a bad .flux.yaml configured against the latest image.

Please repost. Flux should never crash; you should have gotten an error indicating what's wrong. Mind sharing the config file as well?

2opremio commented 5 years ago

There are prerelease builds available that tend to be stable, although no guarantees are given.

@guzmo that only applies to code which has been merged to master. If you are interested in using it before that, please use the image indicated in PR #1848 until it's merged.

2opremio commented 5 years ago

Two things I would like to highlight, in case they make a difference before this code makes it to master:

  1. We need more error output :) Humans are error-prone, and for about half an hour we had a wrong .flux.yaml (the one from this comment rather than the one from this), with no error or anything in the logs; it was all quiet and nothing was happening.
  2. With this new "external" way of applying the patch through flux, on top of the yaml output by the tool, we no longer have any way to render the manifests as they will look when flux applies them. We used to render the manifests with kustomize build in CI and check the diffs. While this is not a huge deal, it is kind of annoying. We will look for some tool to replicate the patching in CI before the diffs are shown.

Again, thanks for this!

@primeroz thanks for bringing it up. Yes, we need to improve debuggability. It's planned, but will be done after #1848 lands. You can read more about it in the debugging section of https://docs.google.com/document/d/1ebAjaZF84G3RINYvw8ii6dO4-_zO645j8Hb7mvrPKPY/. I think it will cover both (1) and (2). Let me know what you think by commenting directly in the document.

2opremio commented 5 years ago

Something I noticed whilst testing (which might be a general flux thing) was the number of temporary build directories left over by flux in /tmp (inside the flux container)

i.e. it creates

/tmp/kustomize-380447668

That's probably a bug in kustomize, maybe it's already fixed upstream. I will take a look.

2opremio commented 5 years ago

removed comment - my segfault was a result of having a bad .flux.yaml configured against the latest image.

Please repost. Flux should never crash; you should have gotten an error indicating what's wrong. Mind sharing the config file as well?

@nabadger I went through the history and found the problem; there is a fix for it at https://github.com/weaveworks/flux/pull/1848/commits/bd7fa1477e977b9927eb444d77f743cc3ad759f4

2opremio commented 5 years ago

That's probably a bug in kustomize, maybe it's already fixed upstream. I will take a look.

@nabadger I've only found https://github.com/kubernetes-sigs/kustomize/issues/566, whose fix has been released in Kustomize 2.0.3 (which is the version we ship with flux in this PR).

Would you mind creating an upstream issue?

guzmo commented 5 years ago

@2opremio Ye, I'll wait for it to be merged, so at least this feature is kinda stable :)

rdubya16 commented 5 years ago

@2opremio Is there a reason that the patchFile item is required in .flux.yaml? It seems to work when you point it at an empty yaml, but it won't work if that item is missing. If you don't need any flux-specific edits on top of your kustomize files, it seems kind of strange to require it, unless I'm missing something.

guzmo commented 5 years ago

Will this support the helm operator as well? Or is that a feature to do when this one is complete?

squaremo commented 5 years ago

Will this support the helm operator as well?

@guzmo Support in what way?

guzmo commented 5 years ago

Well, will it be possible to change the "spec.values" key inside "kind: HelmRelease" yaml files? Maybe it's obvious that it does :P But since that's another container running (the helm operator), I thought maybe it would not be supported in the first version.

squaremo commented 5 years ago

will it be possible to change the "spec.values" key inside "kind: HelmRelease" yaml files

If Kustomize can do that, this PR will be able to do that. (whether Kustomize can or not is unclear to me, but I think there's a pretty good chance)
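
As a rough sketch (the file names, release name, and values below are made up, and how well a strategic-merge patch applies to a custom resource like HelmRelease depends on the Kustomize version), an overlay could patch spec.values like this:

kustomization.yaml

resources:
- helmrelease.yaml
patchesStrategicMerge:
- values-patch.yaml

values-patch.yaml

---
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: my-release
spec:
  values:
    replicaCount: 3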

2opremio commented 5 years ago

@2opremio Is there a reason that the patchFile item is required in .flux.yaml? It seems to work when you point it at an empty yaml, but it won't work if that item is missing. If you don't need any flux-specific edits on top of your kustomize files, it seems kind of strange to require it, unless I'm missing something.

@rdubya16 It only works with an empty yaml because of a bug I fixed in bd7fa14. If you want to include the generators and omit the updaters, use a commandUpdated configuration file instead of a patchUpdated one.
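
For anyone following along, a minimal sketch of the two styles being discussed, limited to the fields mentioned in this thread (check the PR docs for the exact schema, which may have shifted between prerelease builds):

.flux.yaml (commandUpdated: generators only, no updaters)

version: 1
commandUpdated:
  generators:
    - command: kustomize build .

.flux.yaml (patchUpdated: flux records its own edits in patchFile)

version: 1
patchUpdated:
  generators:
    - command: kustomize build .
  patchFile: flux-patch.yaml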

2opremio commented 5 years ago

will it be possible to change the "spec.values" key inside "kind: HelmRelease" yaml files

If Kustomize can do that, this PR will be able to do that. (whether Kustomize can or not is unclear to me, but I think there's a pretty good chance)

@guzmo Kustomize or any commands included in the flux image (e.g. templating using a shell script). In the future (when Ephemeral containers land in Kubernetes) we will let you use any commands supplied by a container image you specify.

EDIT: @guzmo we can add additional commands you think will be generally useful for factorizing manifests.

nabadger commented 5 years ago

@2opremio Found an issue where fluxctl can wipe out the flux-patch.yaml file.

fluxctl policy --k8s-fwd-ns flux-apps -n sandbox -w sandbox:deployment/staging-echoheaders-app                                      
WORKLOAD                                    STATUS   UPDATES
sandbox:deployment/staging-echoheaders-app  success  

I failed to supply the --tag option here, but the command went through. I'm not actually sure what flux is meant to do when no option is specified here.

This results in a commit which deletes the contents of flux-patch.yaml:

git pull 
...
Updating b42d644..041c022
Fast-forward
 environments/test-gke/nb1/flux-patch.yaml | 30 ------------------------------
 1 file changed, 30 deletions(-)

2opremio commented 5 years ago

@nabadger I need to check the code (or an example). Do you have a simplified reproduction? Otherwise, can you show me the logs and the contents of .flux.yaml and prior contents of flux-patch.yaml?

guzmo commented 5 years ago

@2opremio I think the fluxctl release command fails to push to the repository when we merge multiple services at the same time (I guess because it has to resolve merge conflicts in flux-patch.yaml?). Is this something you have noticed? The problem might go away once flux-patch.yaml contains most of the services, but it's tedious to see a lot of failed pipelines and rerun them because it might be this :)

2opremio commented 5 years ago

@guzmo No, I haven't seen that. I am more than happy to take a look but I need specifics of how to reproduce (ideally with a totally reproducible step by step example).

nabadger commented 5 years ago

@2opremio I haven't been able to reproduce my earlier errors related to the deletion of flux-patch.yaml yet. The same command at the moment returns:

Error: git commit: exit status 1
Run 'fluxctl policy --help' for usage.

Which is probably expected error handling of some sort?

nabadger commented 5 years ago

I have, however, found an issue with automation via the Weave annotations when using kustomize commonAnnotations.

Let me know if I should keep raising issues here or open separate ones.


I noticed that flux is not updating images based on how you define the annotations.

It might be useful to set automation policies in a base kustomization for a specific type of environment, like so (I'm fine not doing this, but others might like such a feature):

kustomization.yaml

commonAnnotations:
  flux.weave.works/automated: "true"
  flux.weave.works/tag.container-name: 'semver: ~0.0'

Whilst flux will think the workload is automated, it doesn't apply any new images. I set the various sync options to 1m and waited long enough for flux to apply any image updates it thinks it needs to.

WORKLOAD                          CONTAINER       IMAGE                                                  RELEASE  POLICY
default:deployment/flux           flux            docker.io/2opremio/flux:generators-releasers-bb344048  ready
default:deployment/memcached      memcached       memcached:1.4.25                                       ready
default:deployment/web-app-1-app  container-name  nabadger/podinfo:0.0.9                                 ready    automated

If I bump the image version for a demo app I get the following. It should jump from 0.0.9 to 0.0.11:

WORKLOAD                          CONTAINER       IMAGE             CREATED
default:deployment/web-app-1-app  container-name  nabadger/podinfo
                                                  |   0.0.11        21 May 19 14:26 UTC
                                                  '-> 0.0.9         21 May 19 14:26 UTC
                                                      0.0.8         21 May 19 14:26 UTC
                                                      0.0.5         21 May 19 14:26 UTC
                                                      0.0.4         21 May 19 12:29 UTC

If however I add these annotations as a patch onto the deployment (and not using commonAnnotations) like so, it ends up working straight away:

kustomization.yaml

patches:
- deploy-patch.yaml

deploy-patch.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.container-name: 'semver: ~0.0'
  name: app

Could this be related to the image-poller checking annotations on non-Deployment kinds? If we use commonAnnotations in this manner, then they get applied to all resource types (ingress, service, etc.).

I see this warning in the flux logs which means it at least attempts to deal with this:

flux-5774f5cc85-srhvd flux ts=2019-05-21T15:14:22.41693528Z caller=images.go:18 component=sync-loop msg="polling images"
flux-5774f5cc85-srhvd flux ts=2019-05-21T15:14:22.696128944Z caller=images.go:34 component=sync-loop error="checking workloads for new images: Unsupported kind service"
flux-5774f5cc85-srhvd flux ts=2019-05-21T15:14:36.081429712Z caller=loop.go:111 component=sync-loop event=refreshed url=git@github.com:nabadger/flux-kustomize-example.git branch=test-policy-set HEAD=cd72ded861feb1b50f4b3fc148478dda6d9496fe

2opremio commented 5 years ago

@2opremio I haven't been able to reproduce my earlier errors related to the deletion of flux-patch.yaml yet. The same command at the moment returns:

Error: git commit: exit status 1
Run 'fluxctl policy --help' for usage.

Which is probably expected error handling of some sort?

It's hard to tell what the exact problem is, but please retry with Flux 1.12.3 (which includes #2054), to be released later today.

I will also update the image in #1848 to include it

2opremio commented 5 years ago

Whilst flux will think the workload is automated, it doesn't apply any new images. I set the various sync options to 1m and waited long enough for flux to apply any image updates it thinks it needs to.

That's strange, since Flux doesn't distinguish where the annotations come from

Could this be related to the image-poller checking annotations on non-Deployment kinds? If we use commonAnnotations in this manner, then they get applied to all resource types (ingress, service, etc.).

Ah, it could be. I don't think we should fail on that, I believe it should be a warning. However, this isn't directly related to the current issue. Could you create a separate ticket for it?

rdubya16 commented 5 years ago

@2opremio I've been playing around with this build for the last week or so. It seems like it will suit our purposes. I've ported most of our existing yaml over to use it in a sandbox environment and it seems to be working well. I don't have any suggested changes, just want to add my support for this implementation.

guzmo commented 5 years ago

Sorry for the late reply. I have no time right now to create something reproducible, I'm afraid :( The only thing we did was merge two services at the same time, so our build pipeline ran fluxctl release almost simultaneously. So my guess is that flux-patch.yaml had a merge conflict when flux tried to push. Otherwise I'd say the branch works great, and I'm looking forward to it being merged :)

2opremio commented 5 years ago

Ah, it could be. I don't think we should fail on that, I believe it should be a warning. However, this isn't directly related to the current issue. Could you create a separate ticket for it?

@nabadger, @hiddeco beat you to it :) #2092

2opremio commented 5 years ago

@rdubya16 @guzmo Great that it's working well for you!

nabadger commented 5 years ago

@2opremio - just letting you know that this is still working well for us.

Something I noticed is that listing workloads and listing images via fluxctl seem to cause the flux pod to run a git clone of the repo and a kustomize build.

I was wondering if this is expected, since I thought maybe this information (at least for images) could be fetched from memcached instead.

The impact of this is slower response times for fluxctl. This is especially true when using remote bases in kustomize (since multiple git clones are required). This is something for other users to be aware of, but it's a problem of kustomize/git (not flux).

In our use case, fluxctl commands like listing images were taking up to 10 seconds. We've since switched to a monorepo for our kustomize configuration, so now it's back down to around 1 second.
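
For context, remote bases in the kustomization are what trigger those extra git clones; a minimal sketch (the URL is illustrative) looks like this:

kustomization.yaml

# every remote base is fetched again each time kustomize build runs
bases:
- github.com/example-org/shared-bases/podinfo?ref=v1.0.0
resources:
- ingress.yaml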

2opremio commented 5 years ago

Yeah, we haven't looked too deeply into performance yet. The first step is to release a functional and (at least reasonably) stable feature. Thanks for the report though.

@squaremo will be in charge of merging it toaster soon

squaremo commented 5 years ago

@squaremo will be in charge of merging it toaster soon

Very toaster!

2opremio commented 5 years ago

Hahahaha, I meant "to master", damn corrector.

squaremo commented 5 years ago

:confetti_ball: #1848 is merged! :tada: You can now use a flux image from the official pre-releases repo: https://hub.docker.com/r/weaveworks/flux-prerelease/tags, and set the --manifest-generation flag to try this feature out.
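
For reference, trying this out roughly means pointing the flux Deployment at a prerelease image and adding the flag; a sketch of the relevant excerpt (the image tag and git URL below are illustrative, not official values):

# container spec excerpt from the flux Deployment
containers:
- name: flux
  image: weaveworks/flux-prerelease:master-0123abcd   # hypothetical tag from the prerelease repo
  args:
  - --manifest-generation=true
  - --git-url=git@github.com:example-org/k8s-config.git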