fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

Nested substitution fails when using an integer #3128

Open chaospuppy opened 1 year ago

chaospuppy commented 1 year ago

Describe the bug

Problem Overview

Using an integer as a postBuild.substitute value that is used to define a value in a child Kustomization's postBuild.substitute results in the following error:

✗Kustomization reconciliation failed: Kustomization/flux-system/myapp dry-run failed, error: failed to create typed patch object: .spec.postBuild.substitute.aws_account_id: expected string, got &value.valueUnstructured{Value:111111111111}

Detailed Description

Our use case for Flux includes using a single top-level Kustomization resource to provision other Kustomizations. In the top-level Kustomization, we are using spec.postBuild.substitute to keep child resources generic, like so:

# Top-level Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: bootstrap
spec:
  interval: 1m0s
  sourceRef:
    kind: GitRepository
    name: bootstrap
  path: "./umbrella/envs/staging"
  postBuild:
    substitute:
      aws_account_id: "111111111111"
      cluster_env: "staging"

# Child Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: myapp-kustomization
spec:
  interval: 1m0s
  sourceRef:
    kind: GitRepository
    name: bootstrap
  path: "./apps/myapp/envs/${cluster_env}/"
  prune: true
  postBuild:
    substitute:
      cluster_env: ${cluster_env}
      aws_account_id: ${aws_account_id} # Adding this causes the bug

This double substitution works perfectly well for cluster_env: the value is defined once in the top-level substitute block, and the child Kustomization is correctly rendered as:

# Child Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: myapp-kustomization
spec:
  interval: 1m0s
  sourceRef:
    kind: GitRepository
    name: bootstrap
  path: "./apps/myapp/envs/staging/"
  prune: true
  postBuild:
    substitute:
      cluster_env: staging

However, once we include the aws_account_id: 111111111111 substitute element, we get the error pasted above indicating an incompatible type being passed to the child Kustomization.
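
To make the failure concrete, here is a sketch of what the child Kustomization presumably looks like after the top-level substitution. It assumes substitution is a plain text replacement performed before the result is parsed again as YAML, which is inferred from the error message rather than confirmed in this thread:

```yaml
# Child Kustomization as rendered by the top-level Kustomization (assumed)
spec:
  postBuild:
    substitute:
      cluster_env: staging           # bare word, parsed as a string
      aws_account_id: 111111111111   # bare digits, parsed as an integer
```

The quotes around "111111111111" in the top-level Kustomization are YAML syntax and are not part of the value, so they do not survive the substitution.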

Steps to reproduce

  1. Create a Kustomization that references a GitRepository as a source, defining a postBuild.substitute attribute whose value is an integer
  2. Within the source GitRepository, define an additional Kustomization that also has a postBuild.substitute map with an element whose value references the variable to be substituted by the top-level Kustomization
  3. Reconcile the top level Kustomization to observe the error.

Expected behavior

The nested (child) Kustomization receives the value passed in at the top-level Kustomization and uses it to find and replace instances of aws_account_id in the resources it applies.

Screenshots and recordings

No response

OS / Distro

N/A

Flux version

v0.31.5

Flux check

► checking prerequisites
✗ flux 0.31.5 <0.34.0 (new version is available, please upgrade)
✔ Kubernetes 1.22.12-eks-6d3986b >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► registry1.dso.mil/ironbank/fluxcd/helm-controller:v0.22.2
✔ kustomize-controller: deployment ready
► registry1.dso.mil/ironbank/fluxcd/kustomize-controller:v0.26.3
✔ notification-controller: deployment ready
► registry1.dso.mil/ironbank/fluxcd/notification-controller:v0.24.1
✔ source-controller: deployment ready
► registry1.dso.mil/ironbank/fluxcd/source-controller:v0.25.11
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta2
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ all checks passed

Git provider

No response

Container Registry provider

No response

Additional context

No response

tvalchev2 commented 1 year ago

Hey, I've also stumbled upon this while nesting Kustomizations and passing values with substitutes. I only found a workaround, which works perfectly fine but is a bit ugly to maintain. The Kustomization only accepts strings as values and complains loudly when it gets an integer (or a boolean, for that matter) instead of a string.

The workaround is to escape a pair of "" in the parent Kustomization, so that the child Kustomization receives the integer value as a string. For example:

# Top-level Kustomization
....
  postBuild:
    substitute:
      aws_account_id: "\"111111111111\""
      cluster_env: "staging"

That way your Child Kustomization will get the value passed down as a String and will work.
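
A sketch of why this works, assuming the substitution is a plain text replacement: the escaped quotes are part of the value itself, so they end up in the rendered child Kustomization and the YAML parser sees a quoted string there.

```yaml
# Child Kustomization as rendered with the workaround (assumed)
spec:
  postBuild:
    substitute:
      aws_account_id: "111111111111"   # quoted, parsed as a string
```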

jawnsy commented 1 year ago

A similar issue happens when you define a numeric postBuild value as a floating-point number. In our case, we're using postBuild variables for version numbers. Some container image authors publish using major.minor.patch semvers and may also use major.minor sometimes (e.g. 2.4, 2.4.1). If the version is specified in postBuild as 2.4.1, then the YAML parser interprets it as a string, and everything is well. However, if the version is specified in postBuild as 2.4, then the YAML parser interprets it as a float, and this causes an error like:

kustomization/cluster-deployment        main@sha1:844bdc07                              False           False   Kustomization/flux-system/debezium dry-run failed: failed to create typed patch object (flux-system/debezium; kustomize.toolkit.fluxcd.io/v1, Kind=Kustomization): .spec.postBuild.substitute.debezium_server_version: expected string, got &value.valueUnstructured{Value:2.4}

The workaround is to always ensure that the version is quoted (e.g. written as debezium_server_version: "2.4" or debezium_server_version: "2.4.0"). However, in our case we're using updatecli, so we do not have direct control over how the version is written; I've opened a related bug for that: https://github.com/updatecli/updatecli/issues/1416
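
As an illustration of the typing rules described above, here is how a YAML parser would typically classify the different spellings. The key names are hypothetical and only serve to show the three cases side by side:

```yaml
postBuild:
  substitute:
    # Unquoted with one dot: a valid number, so it is parsed as the float 2.4
    # and rejected by the string-only substitute field.
    version_float: 2.4
    # Unquoted with two dots: not a valid number, so it is parsed as the string "2.4.1".
    version_semver: 2.4.1
    # Quoted: always parsed as a string, regardless of content.
    version_quoted: "2.4"
```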

tvalchev2 commented 1 year ago

It is even more tedious than that if you have a top-level Kustomization passing values down via postBuild to a child Kustomization (which needs to have a default value defined for the variable), which then passes those values down via postBuild to another repository, where the values are used for templating. I have not found a way to preserve a value as a string when passing it down to the consumer. In my case the consumer is an environment variable in a CronJob, which must be a string by definition (even if the string is 7). Example:

Top-level Kustomization:

postBuild:
  substitute:
    backup_retention_days: '\"4\"'

Child Kustomization:

postBuild:
  substitute:
    backup_retention_days: ${solr_backup_retention_days:='"7"'}

Actual consumer of the backup_retention_days variable

apiVersion: batch/v1
kind: CronJob
metadata:
  name: clean-backups
  namespace: ${instance_name}
spec:
  jobTemplate:
    spec:
      template:
            env:
              - name: BACKUP_RETENTION_DAYS
                value: ${backup_retention_days//\\/}

This is the only way I got it to work, but I had to create and edit the CronJob myself, since the default one that came with the image would not be templated properly by the variable substitution: spec.jobTemplate.spec.template.env.value requires a string but gets an integer from the child Kustomization. So I have to use string substitution to remove the backslashes from the value; otherwise it gets passed down as an integer.

starlightromero commented 4 months ago

Would appreciate seeing this resolved, as I have run into this same issue years later. Have there been any updates?

It becomes even more of a headache when you have some substitutions being used at the top level and some being passed down to sub-levels.

Top-level

...
  postBuild:
    substitute:
      aws_account_id: "111111111111"
      aws_account_id_nested: "\"111111111111\""

Sub-level

...
  postBuild:
    substitute:
      aws_account_id: ${aws_account_id_nested}
      cluster_name: ${cluster_name}

stefanprodan commented 4 months ago

@starlightromero please try https://github.com/fluxcd/kustomize-controller/blob/main/docs/spec/v1/kustomizations.md#post-build-substitution-of-numbers-and-booleans
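
For readers who do not follow the link: as far as I can tell, the approach described there is to define a variable that holds a literal double quote and wrap the substituted variable with it, so the rendered value is quoted. A minimal sketch applied to the example from this issue; the variable name quote and the exact layout are assumptions here, so check the linked document for the authoritative version:

```yaml
# Top-level Kustomization
spec:
  postBuild:
    substitute:
      quote: '"'                     # a single double-quote character
      aws_account_id: "111111111111"
      cluster_env: "staging"
---
# Child Kustomization (as stored in Git)
spec:
  postBuild:
    substitute:
      cluster_env: ${cluster_env}
      aws_account_id: ${quote}${aws_account_id}${quote}
```

After substitution the child line should read aws_account_id: "111111111111", which parses as a string.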