carvel-dev / kapp-controller

Continuous delivery and package management for Kubernetes.
https://carvel.dev/kapp-controller
Apache License 2.0

debugging helm chart errors on app reconcile #589

Closed databasedav closed 2 years ago

databasedav commented 2 years ago

i'm having trouble installing the aerospike helm chart even though the manual helm template ... output looks fine

here's repro instructions on Ubuntu 21.10 using kind version 0.12.0

kind create cluster
kapp deploy -a kapp-controller -f https://github.com/vmware-tanzu/carvel-kapp-controller/releases/latest/download/release.yml -y
cat <<EOF | kapp deploy -a namespace -f- -c -y
apiVersion: v1
kind: Namespace
metadata:
  name: aerospike
EOF
cat <<EOF | kapp deploy -a rbac -f- -c -y
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aerospike-ns-sa
  namespace: aerospike
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aerospike-ns-role
  namespace: aerospike
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aerospike-ns-role-binding
  namespace: aerospike
subjects:
- kind: ServiceAccount
  name: aerospike-ns-sa
  namespace: aerospike
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: aerospike-ns-role
EOF
cat <<EOF | kapp deploy -a app -f- -c -y
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: aerospike
  namespace: aerospike
spec:
  serviceAccountName: aerospike-ns-sa
  syncPeriod: 24h
  fetch:
  - helmChart:
      name: aerospike
      version: 5.5.0
      repository:
        url: https://aerospike.github.io/aerospike-kubernetes/
  template:
  - helmTemplate: {}
  deploy:
  - kapp: {}
EOF

from the last command i get this output

kapp: Error: waiting on reconcile app/aerospike (kappctrl.k14s.io/v1alpha1) namespace: aerospike:
  Finished unsuccessfully (Reconcile failed:  (message: Deploying: Error (see .status.usefulErrorMessage for details)))

and here's the output of kapp inspect -a app --status

Target cluster 'https://127.0.0.1:42159' (nodes: kind-control-plane)

Resources in app 'app'

Namespace  aerospike  
Name       aerospike  
Kind       App  
Status     conditions:  
           - message: 'Deploying: Error (see .status.usefulErrorMessage for details)'  
             status: "True"  
             type: ReconcileFailed  
           consecutiveReconcileFailures: 5  
           deploy:  
             error: 'Deploying: Error (see .status.usefulErrorMessage for details)'  
             exitCode: 1  
             finished: true  
             startedAt: "2022-03-28T06:47:10Z"  
             stderr: 'kapp: Error: invalid Yaml document separator: apiVersion: v1'  
             stdout: Target cluster 'https://10.96.0.1:443'  
             updatedAt: "2022-03-28T06:47:10Z"  
           fetch:  
             exitCode: 0  
             startedAt: "2022-03-28T06:47:10Z"  
             stdout: |  
               apiVersion: vendir.k14s.io/v1alpha1  
               directories:  
               - contents:  
                 - helmChart:  
                     appVersion: 5.5.0.7  
                     version: 5.5.0  
                   path: .  
                 path: "0"  
               kind: LockConfig  
             updatedAt: "2022-03-28T06:47:10Z"  
           friendlyDescription: 'Reconcile failed: Deploying: Error (see .status.usefulErrorMessage  
             for details)'  
           inspect:  
             exitCode: 0  
             stdout: |-  
               Target cluster 'https://10.96.0.1:443'  
               Resources in app 'aerospike-ctrl'  
               Namespace  Name  Kind  Owner  Conds.  Rs  Ri  Age  
               Rs: Reconcile state  
               Ri: Reconcile information  
               0 resources  
               Succeeded  
             updatedAt: "2022-03-28T06:47:10Z"  
           observedGeneration: 2  
           template:  
             exitCode: 0  
             updatedAt: "2022-03-28T06:47:10Z"  
           usefulErrorMessage: 'kapp: Error: invalid Yaml document separator: apiVersion: v1'  

1 resources

Succeeded

any tips for how to debug this? thanks :)
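
A minimal sketch for pulling a single field out of the App status instead of reading the whole `kapp inspect` output. It assumes `kubectl` points at the same cluster; the function name `app_status_field` is hypothetical, while the field names come from the status shown above.

```shell
# Print one field from an App's .status (sketch; assumes kubectl access
# to the cluster where kapp-controller is installed).
# Usage: app_status_field <namespace> <app-name> <status-field>
app_status_field() {
  ns="$1"; name="$2"; field="$3"
  kubectl get apps.kappctrl.k14s.io "$name" -n "$ns" \
    -o "jsonpath={.status.${field}}"
}

# Examples against the repro cluster:
#   app_status_field aerospike aerospike usefulErrorMessage
#   app_status_field aerospike aerospike deploy.stderr
```

This only narrows the output to the failing step; it does not add information beyond what the status already holds.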

cppforlife commented 2 years ago

my general suggestion would be to try to use the tools directly to replicate the steps (we have plans to make this easier via kctrl but haven't gotten there yet). another idea would be to throw in a ytt: {} step after helm template to see if you get a nicer error message.
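
The second suggestion could look roughly like this: the App manifest from the repro with a `ytt: {}` step added after `helmTemplate`. This is a sketch, not a confirmed fix; the file name `app-with-ytt.yaml` is made up for illustration.

```shell
# Write the App manifest from the repro, with a ytt step after helmTemplate.
cat > app-with-ytt.yaml <<'EOF'
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: aerospike
  namespace: aerospike
spec:
  serviceAccountName: aerospike-ns-sa
  syncPeriod: 24h
  fetch:
  - helmChart:
      name: aerospike
      version: 5.5.0
      repository:
        url: https://aerospike.github.io/aerospike-kubernetes/
  template:
  - helmTemplate: {}
  # Extra step: ytt parses the rendered YAML before kapp sees it, so
  # malformed documents may surface with a more specific error.
  - ytt: {}
  deploy:
  - kapp: {}
EOF

# Then deploy as in the repro:
#   kapp deploy -a app -f app-with-ytt.yaml -c -y
```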

databasedav commented 2 years ago

@cppforlife thanks for the suggestion, adding ytt: {} has given me some extra clues (and i was getting a similar error when attempting to install kind ingress nginx with kapp controller)

usefulErrorMessage: |-
  kapp: Error: Validation errors:
  - Expected 'apiVersion' on resource 'serviceaccount/aerospike-aerospike () namespace: aerospike' to be non-empty (stdin doc 2)
  - Expected 'apiVersion' on resource 'clusterrole/aerospike-aerospike () cluster' to be non-empty (stdin doc 3)
  - Expected 'apiVersion' on resource 'clusterrolebinding/aerospike-aerospike () cluster' to be non-empty (stdin doc 4)

the templates are here for reference https://github.com/aerospike/aerospike-kubernetes/tree/master/helm/templates

any idea what is causing this to happen? doesn't this suggest that the apiVersion line in the template is getting eaten by something? again i just want to note that the manual helm template ... output is fine and as expected
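
One way to check whether an `apiVersion` line is really missing from a rendered stream is to scan it document by document. The sketch below is a rough line-based check, not a YAML parser; the function name is hypothetical, and its document numbering (non-empty documents, counted from 1) is an assumption that may not match kapp's `stdin doc N` numbering.

```shell
# Report documents in a multi-document YAML stream that have no top-level
# "apiVersion:" key, and flag separator lines that carry extra text.
# Usage: <some templating command> | docs_missing_apiversion
docs_missing_apiversion() {
  awk '
    function finish_doc() {
      if (has_content) {
        n++
        if (!has_api) print "doc " n ": no top-level apiVersion"
      }
      has_content = 0; has_api = 0
    }
    /^---/ {
      finish_doc()
      rest = substr($0, 4)
      if (rest ~ /[^ \t]/) {
        # Text on the separator line itself starts the next document.
        print "line " NR ": text after document separator:" rest
        has_content = 1
      }
      next
    }
    /^[ \t]*(#.*)?$/ { next }      # skip blank lines and comments
    { has_content = 1 }
    /^apiVersion:/ { has_api = 1 }
    END { finish_doc() }
  '
}
```

Piping the output of a manual templating run through this function shows which documents, if any, reach kapp without an `apiVersion`.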

databasedav commented 2 years ago

ok so after inspecting each file it looks like the problem is the double - in the initial {{- ... -}} in the template. out of all the template files, only the ones corresponding to the resources in the usefulErrorMessage have this, and it seems to be eating the apiVersion line at some point. this does not happen with a manual helm template a aerospike/aerospike (<- should be repro)

@cppforlife could this be a bug somewhere in the carvel stack?

databasedav commented 2 years ago

i've confirmed that the double - was the problem with a fork of the aerospike helm chart

cat <<EOF | kapp deploy -a app -f- -c -y
apiVersion: v1
kind: ConfigMap
metadata:
  name: aerospike-helm-values
  namespace: aerospike
data:
  values.yaml: |+
    rbac:
      create: false
---
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: aerospike
  namespace: aerospike
spec:
  serviceAccountName: aerospike-ns-sa
  syncPeriod: 24h
  fetch:
  - git:
      url: https://github.com/databasedav/aerospike-kubernetes
      ref: origin/master
      subPath: helm
  template:
  - helmTemplate:
      valuesFrom:
      - configMapRef:
          name: aerospike-helm-values
  deploy:
  - kapp: {}
EOF

databasedav commented 2 years ago

also something similar is happening with https://github.com/kubernetes/ingress-nginx/blob/main/deploy/static/provider/kind/1.23/deploy.yaml

usefulErrorMessage: |-
  kapp: Error: Validation errors:
  - Expected 'kind' on resource '/ () cluster' to be non-empty (stdin doc 20)
  - Expected 'apiVersion' on resource '/ () cluster' to be non-empty (stdin doc 20)
  - Expected 'metadata.name' on resource '/ () cluster' to be non-empty (stdin doc 20)

but this one isn't a template, it's just yaml ...
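
The error above reports a document with empty `kind`, `apiVersion` and `metadata.name`, which suggests that something other than a Kubernetes resource is being passed to kapp. If the App fetches a whole directory, a rough sketch for spotting such files in a local checkout could look like this (the function name is hypothetical, and the check is a plain line match, not a YAML parse):

```shell
# List YAML files in a directory that contain no top-level "kind:" key,
# i.e. files that are unlikely to be Kubernetes resources.
# Usage: files_without_kind <dir>
files_without_kind() {
  for f in "$1"/*.yaml "$1"/*.yml; do
    [ -f "$f" ] || continue          # skip unmatched glob patterns
    grep -q '^kind:' "$f" || printf '%s\n' "$f"
  done
}
```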

benmoss commented 2 years ago

It looks like the Aerospike Helm chart problem is caused by it being a v1 chart, which kapp-controller decides to template with Helm v2.

When you template it with Helm v3, it works fine, but with Helm v2 it seems to get quite broken.

❯ helmv2 template ./ -name foo --namespace aerospike  | kapp deploy -a aerospike -f-
Target cluster 'https://127.0.0.1:34819' (nodes: kind-control-plane)

kapp: Error: invalid Yaml document separator: apiVersion: v1
❯ helm template foo ./ --namespace aerospike --include-crds  | kapp deploy -a aerospike -f- -y
Target cluster 'https://127.0.0.1:34819' (nodes: kind-control-plane)

Changes

Namespace  Name           Kind                Conds.  Age  Op      Op st.  Wait to    Rs  Ri
(cluster)  foo-aerospike  ClusterRole         -       -    create  -       reconcile  -   -
^          foo-aerospike  ClusterRoleBinding  -       -    create  -       reconcile  -   -
aerospike  foo-aerospike  Service             -       -    create  -       reconcile  -   -
^          foo-aerospike  ServiceAccount      -       -    create  -       reconcile  -   -
^          foo-aerospike  StatefulSet         -       -    create  -       reconcile  -   -
^          foo-conf       ConfigMap           -       -    create  -       reconcile  -   -

Op:      6 create, 0 delete, 0 update, 0 noop, 0 exists
Wait to: 6 reconcile, 0 delete, 0 noop

10:03:38AM: ---- applying 3 changes [0/6 done] ----
10:03:38AM: create clusterrole/foo-aerospike (rbac.authorization.k8s.io/v1) cluster
10:03:38AM: create configmap/foo-conf (v1) namespace: aerospike
10:03:39AM: create serviceaccount/foo-aerospike (v1) namespace: aerospike
10:03:39AM: ---- waiting on 3 changes [0/6 done] ----
...

It's not just a kapp problem; kubectl also fails when that chart is templated with helmv2:

❯ helmv2 template ./ -name foo --namespace aerospike  | kubectl apply -f-
configmap/ame-conf created
error: error validating "STDIN": error validating data: apiVersion not set; if you choose to ignore these errors, turn validation off with --validate=false

I'm not sure who is to blame here: the chart does still support Helm v2, but helmv2 template seems to be broken with it.
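
To check which chart format a given chart declares, one can read the `apiVersion` field of its `Chart.yaml`: `v1` is the Helm 2-era format and `v2` is the format introduced with Helm 3. A small sketch, assuming a local copy of the chart directory (the function name is hypothetical):

```shell
# Print the apiVersion declared in a chart's Chart.yaml (quotes stripped).
# Usage: chart_api_version <chart-dir>
chart_api_version() {
  sed -n 's/^apiVersion:[[:space:]]*//p' "$1/Chart.yaml" | tr -d "\"'" | head -n 1
}

# Example, from the chart directory used above:
#   chart_api_version ./
```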

databasedav commented 2 years ago

@benmoss thanks for digging into this. is there any way to force kapp-controller to use helm3? can i overlay the Chart.yaml with ytt to replace the version, or can ytt only hit the templates? also, this explains the templating issue, but what about https://github.com/vmware-tanzu/carvel-kapp-controller/issues/589#issuecomment-1082748152? here's the corresponding app manifest

apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  serviceAccountName: ingress-nginx-ns-sa
  syncPeriod: 24h
  fetch:
  - git:
      url: https://github.com/kubernetes/ingress-nginx
      ref: origin/main
      subPath: deploy/static/provider/kind/1.23
  template:
  - ytt: {}
  deploy:
  - kapp: {}

benmoss commented 2 years ago

> is there any way to force kapp-controller to use helm3?

I don't think so, but in the latest release helm v2 has been removed.

I'll look into that ingress-nginx issue

benmoss commented 2 years ago

It looks like your problem there is that the kustomization.yaml file in that directory is not a valid resource.

I tried using subPath: deploy/static/provider/kind/1.23/deploy.yaml but it seems that subPath only works with directories.

This seems to work, though it still doesn't come up on my kind cluster (the ingress-nginx-controller pod says 1 node(s) didn't match Pod's node affinity/selector.)

apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: ingress-nginx
spec:
  serviceAccountName: cluster-admin-sa
  syncPeriod: 24h
  fetch:
  - http:
      url: https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/1.23/deploy.yaml
  template:
  - ytt: {}
  deploy:
  - kapp: {}

cppforlife commented 2 years ago

closing due to no activity. feel free to reopen.