fluxcd / terraform-provider-flux

Terraform and OpenTofu provider for bootstrapping Flux

https://registry.terraform.io/providers/fluxcd/flux/latest

Apache License 2.0

367 stars 86 forks source link

Customizing document unclear or incorrect? #231

Closed nuriel77 closed 7 months ago

nuriel77 commented 2 years ago

Hi!

With reference to the doc

Question about the document, where it specifies this file and in the kustomization file it is supposed to generate there's a different filename.
Via the above method, it seems possible to add strategic merge, however, I'd like to configure something like command line arguments (e.g. --concurrent=6 etc), but that seems only possible with json patch:
```
- op: add
  path: /spec/template/spec/containers/0/args/0
  value: --concurrent=6
```

Is there a way to add patchesJson6902 with the current customization implementation?

cmharden commented 2 years ago

@kingdonb Assign to me, please

kingdonb commented 2 years ago

In reference to (2), and I'm not exactly sure if this helps or if it doesn't, it is possible to use patches directive in Flux Kustomization to apply a json6902 style patch, as in this example:

https://github.com/kingdonb/bootstrap-repo/blob/86f38e8c2457c5024f02be21653525c53ca1e5ba/clusters/gke-geekingdon/flux-system/kustomization.yaml#L6

@cmharden is looking into the docs issue 👍 thanks for your report!

nuriel77 commented 2 years ago

Thanks @kingdonb .

From my understanding, this isn't possible with the current implementation of the patch_names argument in the flux_sync data source.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-sync.yaml
- gotk-components.yaml
patchesStrategicMerge:  # This and the following line are added by the `flux_sync` data source by `patch_names`
- patch-sops.yaml

While I could manage the kustomize.yaml it doesn't seem to be the intention of the customize-flux document and implementation, as far as I understand it.

The kustomize.yaml content will be generated by the flux_sync data source to include patch files, using a patchesStrategicMerge.

In my opinion what could be a solution is to add a new attribute, e.g. patches

locals {
  #
  # Rename the previous patches to:
  #
  strategic_merge_patches = {
    psp  = file("./psp-patch.yaml")
    sops = <<EOT
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kustomize-controller
  namespace: flux-system
  annotations:
    iam.gke.io/gcp-service-account: ${data.google_service_account.flux_sops.email}
EOT
  }

  #
  # This one is new: 
  #
  patches = <<EOT
- target:
    group: apps
    version: v1
    kind: Deployment
    name: kustomize-controller
    namespace: flux-system
  patch: |-
    # this one is needed because it is adding an argument to an existing list of arguments
    - op: add
      path: /spec/template/spec/containers/0/args/0
      value: --concurrent=6
    # these are just examples, these could be configured via strategic merge
    - op: replace 
      path: /spec/template/spec/containers/0/resources/limits/cpu
      value: "2"
    - op: replace
      path: /spec/template/spec/containers/0/resources/limits/memory
      value: "2Gi"
EOT
}

...etc...

data "flux_sync" "main" {
  target_path   = var.target_path
  url                  = var.clone_url
  patch_names = keys(local.strategic_merge_patches)   # renaming of the local.patches
  patches          = local.patches                    # add a new attribute for patches
}

The result would be:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-sync.yaml
- gotk-components.yaml
patchesStrategicMerge:
- patch-sops.yaml
- patch-psp.yaml
patches:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: kustomize-controller
    namespace: flux-system
  patch: |-
  - op: add
    path: /spec/template/spec/containers/0/args/0
    value: --concurrent=6
  - op: replace 
    path: /spec/template/spec/containers/0/resources/limits/cpu
    value: "2"
  - op: replace
    path: /spec/template/spec/containers/0/resources/limits/memory
    value: "2Gi"

kingdonb commented 2 years ago

I'm not sure if I was clear, the Flux kustomization has one patches directive, and you can use it to patch command line arguments into commands as if they were json6902 patches.

I don't know exactly how someone would invoke that directive in Terraform, but I've just gone back and re-read my example, and it does not do what I thought it did, so it's now clearer how you got the idea that I was suggesting something I didn't mean to. I think you're right that it isn't possible to manipulate the SIG-CLI kustomization.yaml during bootstrap. But you should be able to update the Flux kustomization for flux-system which has a similar directive you can use for patching.

I believe the Flux Kustomization's spec.patches can accept patches in all of the same forms as SIG-CLI's? I could be wrong about that, but I am fairly sure that it's at least as capable, though you might have to provide the patches in a slightly different form. Here's what I mean: https://fluxcd.io/docs/components/kustomize/kustomization/#patches

So if you can use Terraform provider to manipulate the Flux Kustomization spec with a patch, you can patch it to include this content (the patch that includes additional concurrency argument, so a patch for a patch) and get your command line arguments in line that way. (At least I think that should work...)

nuriel77 commented 2 years ago

Thanks @kingdonb . I think I see what you mean:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-sync.yaml
- gotk-components.yaml
patchesStrategicMerge:
#
# This would be a `kustomize.toolkit.fluxcd.io/v1beta2` with patches for any of the Deployments in `gotk-components.yaml`
- patch-gotk-components.yaml

I'd have to try it. Just on top my head, I wonder if this would work, for 2 reasons:

If this is the first ever installation of gotk-components.yaml the CRD Kustomiztion from kustomize.toolkit.fluxcd.io/v1beta2 is still unknown in the cluster. Chicken egg problem?
Subsequent runs of Terraform on resource "kubectl_manifest" ... may see that the Deployments from gotk-components.yaml have been modified on the server and try to revert (because the Kustomization patch would have taken place on the server).

kingdonb commented 2 years ago

the CRD Kustomiztion from kustomize.toolkit.fluxcd.io/v1beta2 is still unknown in the cluster. Chicken egg problem?

Since Flux 0.18, the Kustomize Controller does reconciliation in a single step. I don't think this was ever a chicken-egg problem with prior Flux versions because it would naturally cause problems in a lot of scenarios for bootstrap users, but:

https://github.com/fluxcd/kustomize-controller/blob/8d61ff76d327ec76381f5f7fe3e9f0b722d75cf7/controllers/kustomization_controller.go#L682-L687

Kustomization now does a server-side apply in two stages, so that validation will succeed when a new CRD is being used. 👍

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
...
patchesStrategicMerge:
...

This is not actually what I was thinking, but if it works it's even better. I was suggesting a patch that modifies flux-system Kustomization so that it has a patches directive with the json6902-structured content in a patches.[]patch, that adds the flags to source-controller or any other controllers. I think we were trying to avoid using patchesStrategicMerge because it doesn't make it very easy to add a single field to an array at any arbitrary YAML address like the JSON patch.

The point is though, that Flux's patches directive in the Flux Kustomization resource is meant to be a utility of last resort. If you can use the Kustomize overlay instead, it is more flexible and you should do it. 👍

Looks like you already found this resource: https://registry.terraform.io/providers/fluxcd/flux/latest/docs/guides/customize-flux

nuriel77 commented 2 years ago

@kingdonb thanks again.

I went ahead with your suggestion to patch the flux-system kustomization. It does apply the patches. However, and as I feared, each time I run Terraform, although I made no additional changes, I get the following:

Terraform will perform the following actions:

  # module.flux.kubectl_manifest.install["apps/v1/deployment/flux-system/helm-controller"] will be updated in-place
  ~ resource "kubectl_manifest" "install" {
        id                      = "/apis/apps/v1/namespaces/flux-system/deployments/helm-controller"
        name                    = "helm-controller"
      ~ yaml_incluster          = (sensitive value)
        # (12 unchanged attributes hidden)
    }

  # module.flux.kubectl_manifest.install["apps/v1/deployment/flux-system/kustomize-controller"] will be updated in-place
  ~ resource "kubectl_manifest" "install" {
        id                      = "/apis/apps/v1/namespaces/flux-system/deployments/kustomize-controller"
        name                    = "kustomize-controller"
      ~ yaml_incluster          = (sensitive value)
        # (12 unchanged attributes hidden)
    }

  # module.flux.kubectl_manifest.install["apps/v1/deployment/flux-system/source-controller"] will be updated in-place
  ~ resource "kubectl_manifest" "install" {
        id                      = "/apis/apps/v1/namespaces/flux-system/deployments/source-controller"
        name                    = "source-controller"
      ~ yaml_incluster          = (sensitive value)
        # (12 unchanged attributes hidden)
    }

So, basically it reverts it back to the original specs. The pods start a rolling upgrade. Then a few minutes later the patch on the flux-system kustomization is picked up and applied and thus another rolling upgrade.

This is the situation I thought would happen, hence why my original idea was to add patches on the kustomize.config.k8s.io/v1beta1 :

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-sync.yaml
- gotk-components.yaml
patchesStrategicMerge:
- patch-sops.yaml
- patch-psp.yaml
patches:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: kustomize-controller
    namespace: flux-system
  patch: |-
  - op: add
    path: /spec/template/spec/containers/0/args/0
    value: --concurrent=6
  - op: replace 
    path: /spec/template/spec/containers/0/resources/limits/cpu
    value: "2"
  - op: replace
    path: /spec/template/spec/containers/0/resources/limits/memory
    value: "2Gi"

cmharden commented 2 years ago

Hi, @nuriel77 As a workaround you should be able to implement a JSON6902 patch using kustomize.toolkit.fluxcd.io/v1beta2 like so

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  patches:
  - target:
      version: v1
      group: apps
      kind: Deployment
      name: kustomize-controller
      namespace: flux-system
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --requeue-dependency=15s
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=6
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/cpu
        value: "2"
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: "2Gi"

kingdonb commented 2 years ago

We came up with this "great thing,"™️ I honestly don't really like it but it's a work-around for the current limitation, which I think I've understood from terraform provider is that users can only customize it through individual strategic merge patches:

https://github.com/kingdonb/bootstrap-repo/commit/3661cdf24071c500662c216940a2a4a634e7e826

So we wanted to append a single arg to an array of args in the container, so we really can't use a strategic merge patch for that, since there is no appending to arbitrary arrays, (of course as it is a strategic merge, you can append to the containers array, adding more containers, but that is no arbitrary array, and there is no such facility for appending args that I know of.)

If anyone knows a way to append to a list of args adding new elements directly into an existing array with a strategicMergePatch, please "lay it on me," but... instead! we wrote a JSON6902 patch to apply, since appending to arrays there is straightforward.

But due to the Terraform limitation, we wrote it as a JSON6902 patch, in the form of a Strategic Merge Patch, to be applied through the Flux Kustomization patches directive. This is clearly one of those "last resort" applications of patches.

I'm afraid this is not exactly clear from the snip @cmharden posted above, but it's the same example, my link is just presented in a slightly different context, not in Terraform, but perhaps more clearly showing that the resource above is meant to be used as a Strategic Merge Patch. Hope this helped!

nuriel77 commented 2 years ago

@kingdonb @cmharden first I'd like to thank you for your responses and taking the time to follow up on this thread.

@cmharden I have been using exactly that spec you've shared.

I think I have failed to explain the problem with this approach, and would like to try to explain it here again, and why I previously suggested that we do the JSON6902 patch in the kustomization.yaml in addition to the existing strategic merge patch customization.

The steps

Add the patch to terraform (exactly the one you've shared @cmharden although I've patched helm-controller but that doesn't matter)
Run terraform plan and apply - the files get applied in github, consequently the helm-controller deployment it patched and the pod is upgraded to use the new spec.

So far all good, however:

Now if I run terraform plan again, the plan shows that it wants to update the deployment spec of helm-controller. This is not a desired state because it doesn't adhere to being idempotent and will result in a rolling upgrade.
I run terraform apply, the helm-controller deployment is reverted to its original state. Then a few seconds later the kustomization patch kicks in and applies itself. This results in another rolling upgrade.

Some output

We start with the original manifest deployed, this is taken from the running pod's spec (kubectl get po -l app=helm-controller -oyaml):

  spec:
    containers:
    - args:
      - --events-addr=http://notification-controller.flux-system.svc.cluster.local/
      - --watch-all-namespaces=true
      - --log-level=info
      - --log-encoding=json
      - --enable-leader-election

The first-ever plan with the patch @cmharden shared (albeit for helm-controller, not kustomization controller):

Terraform will perform the following actions:

  # module.flux.github_repository_file.kustomize will be updated in-place
  ~ resource "github_repository_file" "kustomize" {
      ~ content             = <<-EOT

            apiVersion: kustomize.config.k8s.io/v1beta1
            kind: Kustomization
            resources:
            - gotk-sync.yaml
            - gotk-components.yaml
          + patchesStrategicMerge:
          + - patch-main.yaml
        EOT
        id                  = "gitops/clusters/dev-beta/flux-system/kustomization.yaml"
      ~ overwrite_on_create = false -> true
        # (8 unchanged attributes hidden)
    }

  # module.flux.github_repository_file.patches["main"] will be created
  + resource "github_repository_file" "patches" {
      + branch              = "dev-beta"
      + commit_author       = (known after apply)
      + commit_email        = (known after apply)
      + commit_message      = (known after apply)
      + commit_sha          = (known after apply)
      + content             = <<-EOT
            ---
            apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
            kind: Kustomization
            metadata:
              name: flux-system
              namespace: flux-system
            spec:
              patches:
                - patch: |-
                    - op: add
                      path: /spec/template/spec/containers/0/args/0
                      value: --concurrent=6
                    - op: replace
                      path: /spec/template/spec/containers/0/resources/limits/cpu
                      value: "2"
                    - op: replace
                      path: /spec/template/spec/containers/0/resources/limits/memory
                      value: "2Gi"
                  target:
                    group: apps
                    version: v1
                    kind: Deployment
                    name: helm-controller
                    namespace: flux-system
        EOT
      + file                = "clusters/dev-beta/flux-system/patch-main.yaml"
      + id                  = (known after apply)
      + overwrite_on_create = false
      + repository          = "gitops"
      + sha                 = (known after apply)
    }

Plan: 1 to add, 1 to change, 0 to destroy.

We see the change being applied:

NAME                                       READY   STATUS              RESTARTS   AGE
helm-controller-5648bfdd55-bxrf7           0/1     ContainerCreating   0          2s
helm-controller-f76d67496-zt25t            1/1     Running             0          4d15h
kustomize-controller-7977b87f7b-5nhxf      1/1     Running             0          4d15h
notification-controller-656cdc97bc-6x7kj   1/1     Running             0          4d15h
source-controller-68d4f75576-844vk         1/1     Running             0          4d15h

Change is in the arguments on the new pod:

    - args:
      - --concurrent=6   ### \o/
      - --events-addr=http://notification-controller.flux-system.svc.cluster.local/
      - --watch-all-namespaces=true
      - --log-level=info
      - --log-encoding=json
      - --enable-leader-election

So far so good.

But, now if I run terraform plan again it wants to update the deployment spec for helm-controller:

Terraform will perform the following actions:

  # module.flux.kubectl_manifest.install["apps/v1/deployment/flux-system/helm-controller"] will be updated in-place
  ~ resource "kubectl_manifest" "install" {
        id                      = "/apis/apps/v1/namespaces/flux-system/deployments/helm-controller"
        name                    = "helm-controller"
      ~ yaml_incluster          = (sensitive value)
        # (12 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

This is because the module.flux.kubectl_manifest.install is trying to apply its contents, it has not idea about the kustomization patch of the flux-system kustomization.

GaruGaru commented 2 years ago

We are experiencing the same issue as @nuriel77 , the provider creates kubectl_manifest resources, which are not applying the provided customization

cmharden commented 2 years ago

Thank you, @nuriel77, for the thorough explanation that definitely helps.

@kingdonb It seems like a defect to me, the example uses patches as a variable name, implying that patches: will be used rather than assuming that all patches are a strategic merge. I wasn't able to find a workaround to patch an unstructured array using the current design.

As @nuriel77 mentioned, an option for fixing the issue would be to add a patchJson6902 as an additional resource.

Alternatively, the patches: field could be implemented as described here, rather than explicitly defining the patch type as patchStrategicMerge or patchJson6902

Imho, the former seems like an easier fix and the later more complex, but potentially a cleaner approach.

andreimahu commented 2 years ago

Hello and thank you for bringing this up. My use case is simply to enable the AWS ECR auto login as described here.

Since it is container args that need to be edited the patchStrategicMerge will not work.

balonik commented 2 years ago

Issue itself

I have achieved it by creating ./clusters/cluster1/kustomization.yaml file with content:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - flux-system
  - my_other_manifests
  - some_other_manifest.yaml
patches:
  # enable Native AWS ECR Auto-Login https://fluxcd.io/docs/guides/image-update/#using-native-aws-ecr-auto-login
  - patch: |-
      - op: add
        path: /spec/template/spec/containers/0/args/0
        value: --aws-autologin-for-ecr
    target:
      kind: Deployment
      name: "image-reflector-controller"

If you create such file in TF or create it manually it's up to you. This won't fix TF reporting changes in plans, but at least it's much easier to manage, in my opinion.

To avoid the TF reporting changes I have used the ignore_fields parameter in the kubectl_manifest resource to ignore the changes done by Kustomize.

resource "kubectl_manifest" "install" {
  for_each   = { for v in local.install : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body  = each.value
  ignore_fields = ["spec.template.spec.containers.0.args"]
}

It's not an ideal solution, but it works for now.

Some thoughts

Adding anything to flux_sync resource won't matter, because the manifests come from flux_install resource. flux_install generates the YAML manifests without any changes/patches applied to them, and such original manifests are consumed by kubectl_manifest resource. The changes will be reported in TF plan until you provide patched manifests to kubectl_manifest so it won't try to update/revert it. I have never used the flux_sync's patch_names param, but it seems to me the entire idea is flawed. Specifying it in flux_sync will cause double roll-out, once by kubectl_manifest using original manifests, and couple seconds later by Flux itself using Kustomize no matter what I do. In order to really provide a solution to customize Flux deployments the flux_install needs to provide ready-to-go manifests to kubectl_manifest. Have I missed something here?

nuriel77 commented 2 years ago

@balonik Hi.

I am glad there is interest in this issue and that also others are facing the same challenges. I hope it will help to expedite a solution.

It's not an ideal solution, but it works for now.

Thanks for sharing.

Adding anything to flux_sync resource won't matter, because the manifests come from flux_install resource.

It currently works (the patching), I am not sure what do you mean that it doesn't matter? Only problem is that it results in the rolling upgrade as mentioned in this issue.

From what I understand the original implementation of the customization in this module expects the patch_names attribute to be part of flux_sync. That is because flux_sync is the one that manages the target file kustomization.yaml By default it looks like this:


apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-sync.yaml
- gotk-components.yaml

The patching has to be done in this file, not in gotk-components.yam itself (which, as you said, is managed by flux_install).

If you check terraform state and search for the following and for kustomization.yaml I believe it will make more sense:

   {
      "module": "module.flux",
      "mode": "data",
      "type": "flux_sync",
      "name": "main",
      "provider": "provider[\"registry.terraform.io/fluxcd/flux\"]",
      "instances": [
      ...

balonik commented 2 years ago

My problem is with the approach of setting it in flux_sync. There are major differences between the Flux CLI and TF provider. Flux CLI:

it is not expected that you run Flux bootstrap via Flux CLI every day
Flux CLI detects existing version and recognizes an upgrade, changing only flux-system/gotk-components.yaml in GIT
no changes to other base files and no apply to cluster during upgrade. It lets Flux to update itself

TF provider:

TF is often part of CI/CD pipelines that can run every day/hour/minute
following TF provider guides we don't detect existing setup and do a full install/bootstrap every apply
every base file is regenerated and on a change pushed to GIT every apply
Flux manifests are on a change applied to cluster via kubectl every apply

These are the major reasons why I think following the Flux recommendations for customization cannot be applied in this TF provider.

I will explain using simple example. There are basically 2 main stages, the install and sync. In the install stage we generate Flux manifests using flux_install and apply them to k8s cluster using kubectl_manifest. When flux_install generates manifests it defines the state of the Flux deployment to be green. And as such (green) is deployed to the cluster. In the following sync stage we apply customizations and indirectly define that the deployment should be red. Flux will pickup this definition and apply it to the cluster. Now the state in the cluster is red. With this approach we have never ending changes, because we define state of a single resource in 2 places. Each stage defines it's own desired state. That's just wrong.

In ideal situation, the manifests applied to the cluster should already be red, so Flux won't do any changes on reconciliation.

The two rollouts are fine for a short reconciliation interval, but let's say I have 15 minutes interval. That means Flux is running in broken state for 15 minutes in worst case scenario. Everytime TF apply runs.

nuriel77 commented 2 years ago

@balonik I see what you mean

nuriel77 commented 2 years ago

Fwiw, I believe it won't even matter if there was a single stage of kubectl_manifest install. Ideally, a kustomize build would have to be done in terraform and then kubectl_manifest to apply the result. That way terraform will not see a diff in subsequent runs. I had a quick look at kustomize provider but couldn't understand how to get that into the mix.

If anybody agrees with the direction of performing the kustomize build in terraform and then applying the resultant yaml to the cluster, and happen to think of a solution how to get it done, please do share.

balonik commented 2 years ago

I have found that provider as well and even raised FR to support our use-case https://github.com/kbst/terraform-provider-kustomization/issues/163, where I got pointed to another issue from 2021 in which they declined the idea. But it should be still doable with local_file, where you can save the generated manifests from flux_install and flux_sync as files during plan/apply execution, have a kustomization.yaml as file in repository and use them with kustomize_build to generate final manifests for kubectl_manifest. I am not a fan of this approach, because I ran TF in ephemeral k8s pods, so any generated local files would be as change in each TF plan/apply. But if you are running it from (as example) workstation it could work.

swade1987 commented 7 months ago

Hi @nuriel77 👋

I hope you are doing well! As part of our ongoing effort to maintain and improve the quality of our project, I've been reviewing open issues and came across the one you've reported. First off, thank you for taking the time to contribute by reporting this issue; your input is crucial to us.

Upon reviewing the details of your issue, I noticed that it involves the use of a resource or feature that has yet to be supported since the 1.0.0 release of our project, which was approximately 9 months ago. This might be a key factor in the challenges you're experiencing. We understand that changes and deprecations can impact your work, and we're here to help navigate these transitions. If there are specific reasons you've continued using this unsupported resource or if there's any way we can assist in migrating to a supported alternative, please let us know.

Additionally, to ensure the efficient management of our issue tracker and to focus on issues that are actively affecting our community, we have implemented a policy for issues that remain inactive. If there is no activity on this issue within the next 3 weeks, we will consider the issue inactive and close it for you. This doesn't mean your issue is not important to us, but rather that we aim to keep our focus on actively pursued concerns. Of course, if the issue continues or if you have further updates in the future, feel free to reopen the issue or create a new one.

Thank you once again for your contribution to our project. Your feedback not only helps us improve but also supports the broader community in overcoming similar challenges. We look forward to hearing from you and hope to resolve any outstanding concerns together.

Best regards,

Steve

To customize the flux_bootstrap_git resource please take a look at the following example: