fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

v0.6.1 fails to upgrade with kustomize build error json: unsupported type: map[interface {}]interface {} #729

Closed · echel0n closed this issue 3 years ago

echel0n commented 3 years ago

When attempting to upgrade from v0.5.3 to v0.6.1, I get the following error: json: unsupported type: map[interface {}]interface {}

stefanprodan commented 3 years ago

How are we supposed to track this down? Can you please post the bootstrap output, the full file tree of your repo, and the kustomize-controller logs?

echel0n commented 3 years ago

The file structure is fairly simple: a cluster folder in the root of the git repo, and inside that a flux-system folder.

I do not have a bootstrap output to provide, but here is the command I run from a CI script to perform the update:

$ ./bin/flux install --arch=amd64 --export > ./cluster/flux-system/gotk-components.yaml
Flag --arch has been deprecated, multi-arch container image is now available for AMD64, ARMv7 and ARM64

kustomize-controller log:

{"level":"info","ts":"2021-01-17T09:13:06.789Z","logger":"controller.kustomization","msg":"Reconciliation finished in 9.179319383s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","revision":"master/700af33f3e8b0452a3233a30697aab7b8429f30b"}
{"level":"error","ts":"2021-01-17T09:13:06.789Z","logger":"controller.kustomization","msg":"Reconciler error","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","error":"kustomize build failed: json: unsupported type: map[interface {}]interface {}"}

► checking prerequisites
✔ kubectl 1.20.1 >=1.18.0
✔ Kubernetes 1.20.1 >=1.16.0
► checking controllers
✔ source-controller is healthy
► ghcr.io/fluxcd/source-controller:v0.6.1
✔ kustomize-controller is healthy
► ghcr.io/fluxcd/kustomize-controller:v0.6.2
✔ helm-controller is healthy
► ghcr.io/fluxcd/helm-controller:v0.5.1
✔ notification-controller is healthy
► ghcr.io/fluxcd/notification-controller:v0.6.1
✔ all checks passed

stefanprodan commented 3 years ago

I guess it’s related to https://github.com/kubernetes-sigs/kustomize/issues/3446

echel0n commented 3 years ago

Is there something I can do on my end to use the latest version of kustomize?

stefanprodan commented 3 years ago

You could help out by identifying which file causes this, as this doesn't happen in our e2e tests and I have no way to replicate the kustomize bug.

echel0n commented 3 years ago

I'd be happy to, if I even knew where to start on this. Everything was fine on v0.5.3 and none of my cluster files changed, so the upgrade to v0.6.1 should have gone off without a hitch. Is there any way to have Flux display the filename when it throws the exception?

The problem is that even with a fairly simple file structure I still have lots of helm releases and deployments, so ...

stefanprodan commented 3 years ago

Is there any way to have Flux display the filename when it throws the exception?

We use kustomize as a library, and it looks like kustomize build doesn't log anything useful, so we are stuck. See my comment here: https://github.com/kubernetes-sigs/kustomize/issues/3446#issuecomment-761764629
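
For context, a minimal sketch of what "kustomize as a library" looks like, assuming the api/v0.7-era krusty package layout and signatures (both have changed in later kustomize releases):

package main

import (
	"fmt"
	"log"

	"sigs.k8s.io/kustomize/api/filesys"
	"sigs.k8s.io/kustomize/api/krusty"
)

func main() {
	// Build the kustomization rooted at ./cluster in-process, the way the
	// controller does, instead of shelling out to `kustomize build`.
	fs := filesys.MakeFsOnDisk()
	k := krusty.MakeKustomizer(fs, krusty.MakeDefaultOptions())

	resMap, err := k.Run("./cluster")
	if err != nil {
		// This is where "json: unsupported type: map[interface {}]interface {}"
		// surfaces, with no offending file name attached.
		log.Fatal(err)
	}

	yml, err := resMap.AsYaml()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(yml))
}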

echel0n commented 3 years ago

I suppose what I could do is just download the kustomize binary and dry-run test against my cluster folder structure to see what pops; I'm just surprised I'm the first person reporting this issue.
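
One way to do that smoke test without touching the cluster (the path is hypothetical):

# build the manifests locally and client-side dry-run them; nothing is applied
kustomize build --enable_kyaml=false ./cluster | kubectl apply --dry-run=client -f -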

echel0n commented 3 years ago

OK, so I ran Kustomize v3.9.1 against the flux-system folder, as it's the only Kustomization I have present in my cluster repo, and it passed with no problem, so I wonder what their latest API version is actually built against ...

stefanprodan commented 3 years ago

Flux code does the equivalent of kustomize build --enable_kyaml=false; we had to disable kyaml due to panic errors.

stefanprodan commented 3 years ago

@echel0n can you please run docker.io/stefanprodan/kustomize-controller:v0.6.3-test.1 on your cluster and see if it errors out?
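
One way to swap in a test image, assuming the stock flux-system install where the controller's container is named manager:

kubectl -n flux-system set image deployment/kustomize-controller \
  manager=docker.io/stefanprodan/kustomize-controller:v0.6.3-test.1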

echel0n commented 3 years ago

Same issue with this image as well:

{"level":"info","ts":"2021-01-17T17:10:23.114Z","logger":"controller.kustomization","msg":"Reconciliation finished in 4.437103353s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","revision":"master/a475e7fc73980b9d8b7992a7fd0e26ca6f35fdc9"}
{"level":"error","ts":"2021-01-17T17:10:23.115Z","logger":"controller.kustomization","msg":"Reconciler error","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","error":"kustomize build failed: json: unsupported type: map[interface {}]interface {}"}

stefanprodan commented 3 years ago

OK, let's try the latest kustomize, api/v0.7.2, released today; here is the image:

docker.io/stefanprodan/kustomize-controller:v0.6.3-test.2

echel0n commented 3 years ago

testing now, will report back shortly

echel0n commented 3 years ago

{"level":"error","ts":"2021-01-17T20:46:52.028Z","logger":"controller.kustomization","msg":"Reconciler error","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"flux-system","namespace":"flux-system","error":"kustomize build failed: error marshaling into JSON: json: unsupported type: map[interface {}]interface {}"}

stefanprodan commented 3 years ago

Hmm, do you have a kustomization.yaml in your ./cluster, or is it autogenerated by Flux?

echel0n commented 3 years ago

I do not have any kustomization.yaml in my cluster; I'm only using what Flux auto-generates. I was going to ask how Flux auto-generates the file, as what I have yet to test is creating a kustomization.yaml file that contains all my deployments and helm releases.

stefanprodan commented 3 years ago

Ok please do:

# generate kustomization.yaml
cd ./cluster
kustomize create --autodetect --recursive

# verify it's working
kustomize build --enable_kyaml=false . 

The above will generate the same kustomization.yaml as Flux. You can commit this file to your repo to have full control over what manifests are included.
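
For reference, the generated file looks roughly like this (the resource paths are illustrative):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- flux-system/gotk-components.yaml
- flux-system/gotk-sync.yaml
# ...plus an entry for every other manifest detected in the tree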

echel0n commented 3 years ago

OK, so I tried what you asked on both Windows and Linux with no issues at all, so the bug must be in the kustomize API package. That's all I can think of, unless you have a way for me to test the API as well?

echel0n commented 3 years ago

And typically I would just start removing deployments one by one till the issue is gone, but this is a production cluster, so I don't want to uninstall anything.

echel0n commented 3 years ago

What's different between v0.5.3 and the next version after it? I remember trying that version and getting this same error.

echel0n commented 3 years ago

kustomize-controller:v0.6.0 fails
kustomize-controller:v0.5.3 works

so whatever changed between those two versions is what is causing this, and from what I can see the only real change was the upgrade to kustomize API v0.7.1

stefanprodan commented 3 years ago

Have you committed the generated kustomization.yaml? Does it error out the same?

echel0n commented 3 years ago

When you say committed, are you asking me to apply that to my live cluster? I'm hesitant to do that on this production cluster.

In the other issue thread you referenced, you can see the person produced the same error simply by running the build command, so I would have thought we would be able to reproduce it the same way.

Does Flux do any formatting of any kind prior to building the kustomization.yaml?

stefanprodan commented 3 years ago

When you say committed, are you asking me to apply that to my live cluster?

No, you can't apply a kustomize config; it's not a custom resource. You should commit and push that file to your repository, for Flux to use it instead of generating one.
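
That is, roughly (the path is assumed from the steps above):

git add cluster/kustomization.yaml
git commit -m "Add explicit kustomization for Flux"
git push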

echel0n commented 3 years ago

OK, committed and pushed; will let you know the outcome shortly.

echel0n commented 3 years ago

Same error as before using the v0.6.2 controller.

echel0n commented 3 years ago

Going through and commenting things out one by one till I get a pass; will let you know what comes of this.

echel0n commented 3 years ago

So, would you believe this is what caused the issue? I switched the keys from int to string and all was fixed. Integer keys are how ingress-nginx actually specifies this in their documentation. Anyway, thank you very much for all your hard efforts; I hope this helps you and others down the road!

tcp:
  32400: "plex/plex:32400"
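
The fix, per the comment above, was quoting the key so YAML parses it as a string rather than an int:

tcp:
  "32400": "plex/plex:32400"
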
stefanprodan commented 3 years ago

Reopening as this is not yet fixed.

andloh commented 2 years ago

I still get errors when trying to build this one:

kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  namespace: test-ingress
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  3306: "test/mysql:3336"

Error: map[string]interface {}{"apiVersion":"v1", "data":map[interface {}]interface {}{3306:"test/mysql:3336"}, "kind":"ConfigMap", "metadata":map[string]interface {}{"labels":map[string]interface {}{"app.kubernetes.io/name":"ingress-nginx", "app.kubernetes.io/part-of":"ingress-nginx"}, "name":"tcp-services", "namespace":"test-ingress"}}: json: unsupported type: map[interface {}]interface {}

Did you manage to work through this @echel0n? You see, I have all my keys as ints.

stefanprodan commented 2 years ago

@echel0n you can't use integer keys in your YAMLs due to this upstream bug: https://github.com/kubernetes-sigs/kustomize/issues/3446
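
A minimal standalone Go sketch of the failure mode (simplified; not the controller's exact code path): yaml.v2 decodes mappings into map[interface{}]interface{}, and encoding/json refuses any map whose key type is interface{}:

package main

import (
	"encoding/json"
	"fmt"

	"gopkg.in/yaml.v2"
)

func main() {
	src := []byte("data:\n  3306: \"test/mysql:3336\"\n")

	// With an interface{} target, yaml.v2 yields map[interface{}]interface{}
	// and keeps the unquoted key 3306 as an int.
	var doc interface{}
	if err := yaml.Unmarshal(src, &doc); err != nil {
		panic(err)
	}

	// encoding/json only marshals maps keyed by strings, integer types, or
	// encoding.TextMarshaler implementations, so this errors out:
	_, err := json.Marshal(doc)
	fmt.Println(err) // json: unsupported type: map[interface {}]interface {}
}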

The only way to fix this is by making it a string:

kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  namespace: test-ingress
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  "3306": "test/mysql:3336"

michael-odell commented 2 years ago

Thanks for having this conversation in the open here. I had the same problem (cryptic failures mentioning json: unsupported type: map[interface {}]interface {}) and it was due to the same thing: integer keys being supplied to the ingress-nginx helm chart.

I changed the keys to strings and this fixed it for me, too.