cetic / helm-nifi

Helm Chart for Apache Nifi
Apache License 2.0

[cetic/nifi] cert-manager not correctly generating the ca.cert #261

Status: Closed (lfreinag closed this issue 2 years ago)

lfreinag commented 2 years ago

Describe the bug: When trying the new cert-manager integration, I get the following error:

/opt/nifi/nifi-current/tls/truststore.jks is not readable! Waiting for cert-manager sidecar to populate it.

Version of Helm, Kubernetes and the Nifi chart:

helm version
version.BuildInfo{Version:"v3.9.0", GitCommit:"7ceeda6c585217a19a1131663d8cd1f7d641b2a7", GitTreeState:"clean", GoVersion:"go1.18.2"}
kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.19) exceeds the supported minor version skew of +/-1

NiFi chart version: 1.1.1

What happened: I tested the new cert-manager configuration, but the NiFi pods fail to start because the cert-manager container keeps failing with an "Input not an X.509 certificate" error.

What you expected to happen: cert-manager provides a valid CA certificate and the cluster runs normally.

How to reproduce it (as minimally and precisely as possible):

I basically copied the same values as the ones reported in https://github.com/cetic/helm-nifi/issues/224#issuecomment-1023663727.

I also modified line 199 of the statefulset.yaml file as follows:

--            --value "Initial User Identity {{ . }}" \
++            --value "Initial User Identity {{ add 2 . }}" \

Anything else we need to know:

If I open a shell in the server pod and check ca.crt, it contains only ??e:

nifi@nifi-cluster-0:/opt/nifi/nifi-current/tls/cert-manager$ cat ca.crt 
??e
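A ca.crt containing only a few stray bytes would explain the keytool "Input not an X.509 certificate" error, since keytool expects either a PEM (Base64 text) or a DER (binary) certificate. The sketch below is a rough heuristic for telling these cases apart; classify_cert_bytes and classify_cert_file are hypothetical helper names, and treating a leading 0x30 byte (an ASN.1 SEQUENCE) as "DER" is an assumption, not a full parse.

```python
from pathlib import Path

PEM_CERT_HEADER = b"-----BEGIN CERTIFICATE-----"


def classify_cert_bytes(data: bytes) -> str:
    """Heuristic guess at what a certificate file contains.

    "pem"     -> text starting with a PEM certificate header
    "der"     -> binary starting with 0x30 (ASN.1 SEQUENCE), the usual DER start
    "empty"   -> nothing but whitespace
    "unknown" -> anything else, e.g. the '??e' seen in ca.crt
    """
    stripped = data.lstrip()
    if not stripped:
        return "empty"
    if stripped.startswith(PEM_CERT_HEADER):
        return "pem"
    if data[:1] == b"\x30":
        return "der"
    return "unknown"


def classify_cert_file(path: str) -> str:
    """Read a file (e.g. tls/cert-manager/ca.crt) and classify its bytes."""
    return classify_cert_bytes(Path(path).read_bytes())


print(classify_cert_bytes(b"??e"))  # unknown
```

An "unknown" or "empty" result points back at whatever populated the file rather than at keytool itself.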

Here is some information that helps with troubleshooting:

Check if a pod is in error:

kubectl get pod
NAME                           READY   STATUS             RESTARTS   AGE
nifi-cluster-0                 3/5     CrashLoopBackOff   9          8m43s
nifi-cluster-1                 3/5     CrashLoopBackOff   9          11m
nifi-cluster-2                 3/5     CrashLoopBackOff   10         11m
nifi-cluster-nifi-registry-0   1/1     Running            0          7d
nifi-cluster-zookeeper-0       1/1     Running            0          11m
nifi-cluster-zookeeper-1       1/1     Running            0          11m
nifi-cluster-zookeeper-2       1/1     Running            0          11m

Inspect the pod and check the "Events" section at the end for anything suspicious.

kubectl describe pod myrelease-nifi-0
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  12m                   default-scheduler  Successfully assigned nifi-cluster/nifi-cluster-1 to worker3.x.k8s-test.x.io
  Normal   Pulled     12m                   kubelet            Container image "busybox:1.32.0" already present on machine
  Normal   Created    12m                   kubelet            Created container zookeeper
  Normal   Started    12m                   kubelet            Started container zookeeper
  Normal   Pulling    12m                   kubelet            Pulling image "apache/nifi:1.16.3"
  Normal   Pulled     11m                   kubelet            Successfully pulled image "apache/nifi:1.16.3" in 39.432102597s
  Normal   Pulled     11m                   kubelet            Container image "busybox:1.32.0" already present on machine
  Normal   Created    11m                   kubelet            Created container user-log
  Normal   Pulled     11m                   kubelet            Container image "busybox:1.32.0" already present on machine
  Normal   Created    11m                   kubelet            Created container app-log
  Normal   Started    11m                   kubelet            Started container app-log
  Normal   Started    11m                   kubelet            Started container user-log
  Normal   Created    11m                   kubelet            Created container bootstrap-log
  Normal   Started    11m                   kubelet            Started container bootstrap-log
  Normal   Pulled     11m                   kubelet            Container image "busybox:1.32.0" already present on machine
  Normal   Started    11m (x2 over 11m)     kubelet            Started container server
  Normal   Created    11m (x2 over 11m)     kubelet            Created container server
  Normal   Pulled     11m                   kubelet            Container image "apache/nifi:1.16.3" already present on machine
  Normal   Pulled     11m (x2 over 11m)     kubelet            Container image "apache/nifi:1.16.3" already present on machine
  Normal   Created    11m (x2 over 11m)     kubelet            Created container cert-manager
  Normal   Started    11m (x2 over 11m)     kubelet            Started container cert-manager
  Warning  BackOff    2m34s (x41 over 11m)  kubelet            Back-off restarting failed container

Get the logs of a failed container inside the pod (here, the server container):

kubectl logs nifi-cluster-1 server

Java home: /usr/local/openjdk-8
NiFi home: /opt/nifi/nifi-current

Bootstrap Config File: /opt/nifi/nifi-current/conf/bootstrap.conf

Login Identity Providers Processed [/opt/nifi/nifi-current/./conf/login-identity-providers.xml]

updating nifi.remote.input.host in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.cluster.node.address in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.zookeeper.connect.string in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.web.http.host in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.web.proxy.host in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.keystore in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.keystoreType in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.keystorePasswd in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.truststore in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.truststoreType in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.truststorePasswd in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.web.https.host in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.cluster.node.address in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.web.https.network.interface.default in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.web.https.network.interface.lo in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.web.http.host in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.web.http.port in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.autoreload.enabled in /opt/nifi/nifi-current/conf/nifi.properties
updating nifi.security.autoreload.interval in /opt/nifi/nifi-current/conf/nifi.properties
/opt/nifi/nifi-current/tls/truststore.jks is not readable! Waiting for cert-manager sidecar to populate it.
/opt/nifi/nifi-current/tls/truststore.jks is not readable! Waiting for cert-manager sidecar to populate it.
/opt/nifi/nifi-current/tls/truststore.jks is not readable! Waiting for cert-manager sidecar to populate it.
/opt/nifi/nifi-current/tls/truststore.jks is not readable! Waiting for cert-manager sidecar to populate it.
/opt/nifi/nifi-current/tls/truststore.jks is not readable! Waiting for cert-manager sidecar to populate it. (infinite loop)
kubectl logs nifi-cluster-1 cert-manager
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   257  100   257    0     0   2519      0 --:--:-- --:--:-- --:--:-- 23363
keytool error: java.lang.Exception: Input not an X.509 certificate

wknickless commented 2 years ago

@lfreinag X.509 Common Names have a hard length limit of 64 characters. (See https://docs.digicert.com/manage-certificates/public-certificates-data-entries-that/). It looks like you're using a Helm release name of nifi-cluster, which means the Common Name would work out to something like nifi-cluster-0.nifi-cluster-headless.default.svc.cluster.local, which is 62 characters. Any chance you're using a namespace other than default, or a cluster DNS domain other than cluster.local, that pushes the Common Name past the 64-character limit?
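The length arithmetic is easy to script. This is a minimal sketch, assuming the per-pod Common Name follows the pattern <release>-<ordinal>.<release>-headless.<namespace>.svc.<cluster domain>, which is inferred from the kubectl describe cert output further down this thread rather than taken from the chart templates; nifi_pod_common_name and cn_fits are hypothetical helper names.

```python
X509_CN_MAX = 64  # Common Name length limit discussed in this thread


def nifi_pod_common_name(release: str, ordinal: int, namespace: str,
                         cluster_domain: str = "cluster.local") -> str:
    """Build a pod CN using the ASSUMED pattern
    <release>-<ordinal>.<release>-headless.<namespace>.svc.<cluster_domain>."""
    return (f"{release}-{ordinal}.{release}-headless."
            f"{namespace}.svc.{cluster_domain}")


def cn_fits(cn: str) -> bool:
    """True if the CN is within the 64-character limit."""
    return len(cn) <= X509_CN_MAX


for ns in ("default", "nifi-cluster"):
    cn = nifi_pod_common_name("nifi-cluster", 0, ns)
    print(len(cn), "OK" if cn_fits(cn) else "TOO LONG", cn)
```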

If that's not the problem, what do these commands give you?

wknickless commented 2 years ago

@lfreinag You can also confirm that cert-manager generated the certificates correctly with, e.g.:
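As a hedged illustration only, the same kind of check can be scripted in Python, mirroring the kubectl get secret ... -o json | jq -r '.data."tls.crt"' | base64 -d pipeline that appears later in the thread. The helper names (fetch_secret, secret_tls_crt, looks_like_pem_cert) are hypothetical, fetch_secret assumes kubectl is on PATH, and the PEM test only checks the header rather than parsing the certificate the way openssl x509 does.

```python
import base64
import json
import subprocess


def fetch_secret(name: str, namespace: str) -> dict:
    """Return `kubectl get secret NAME -n NAMESPACE -o json` as a dict.
    Raises subprocess.CalledProcessError if the secret does not exist."""
    out = subprocess.run(
        ["kubectl", "get", "secret", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def secret_tls_crt(secret: dict) -> bytes:
    """Base64-decode data["tls.crt"]; Secret data values are base64-encoded."""
    data = secret.get("data") or {}
    if "tls.crt" not in data:
        raise KeyError("secret has no tls.crt entry")
    return base64.b64decode(data["tls.crt"])


def looks_like_pem_cert(raw: bytes) -> bool:
    """Cheap header check, not a full X.509 parse."""
    return raw.lstrip().startswith(b"-----BEGIN CERTIFICATE-----")
```

Usage, against a live cluster: looks_like_pem_cert(secret_tls_crt(fetch_secret("nifi-cluster-ca", "nifi-cluster"))). A CalledProcessError from fetch_secret corresponds to the "NotFound" case shown below.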

lfreinag commented 2 years ago

Hello! Thanks for the quick answer. Here are my outputs. I suspect my cert-manager version might be responsible for this problem 🤔 I am still using the v1alpha2 API 😬 What do you think? Could this be a problem? The secrets are not being created, but I will look into that and update you here 😉

kubectl get secret nifi-cluster-ca -o json | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -text -noout
Error from server (NotFound): secrets "nifi-cluster-ca" not found
unable to load certificate
4333520428:error:09FFF06C:PEM routines:CRYPTO_internal:no start line:/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/libressl/libressl-2.8/crypto/pem/pem_lib.c:684:Expecting: TRUSTED CERTIFICATE
kubectl get secret nifi-cluster-0 -o json | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -text -noout
Error from server (NotFound): secrets "nifi-cluster-0" not found
unable to load certificate
4298147372:error:09FFF06C:PEM routines:CRYPTO_internal:no start line:/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/libressl/libressl-2.8/crypto/pem/pem_lib.c:684:Expecting: TRUSTED CERTIFICATE
kubectl describe cert nifi-cluster-ca
Name:         nifi-cluster-ca
Namespace:    nifi-cluster
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: nifi-cluster
              meta.helm.sh/release-namespace: nifi-cluster
API Version:  cert-manager.io/v1alpha2
Kind:         Certificate
Metadata:
  Creation Timestamp:  2022-06-28T12:02:57Z
  Generation:          3
  Managed Fields:
    API Version:  cert-manager.io/v1alpha2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/managed-by:
      f:spec:
        .:
        f:commonName:
        f:duration:
        f:isCA:
        f:issuerRef:
          .:
          f:group:
          f:kind:
          f:name:
        f:privateKey:
          .:
          f:algorithm:
          f:rotationPolicy:
          f:size:
        f:renewBefore:
        f:secretName:
        f:subject:
          .:
          f:organizationalUnits:
    Manager:         helm
    Operation:       Update
    Time:            2022-06-28T12:02:57Z
  Resource Version:  228583528
  Self Link:         /apis/cert-manager.io/v1alpha2/namespaces/nifi-cluster/certificates/nifi-cluster-ca
  UID:               84f7a15e-be28-491c-96ed-5f9bfbaf41ef
Spec:
  Common Name:  nifi-cluster-ca.nifi-cluster.svc.cluster.local
  Duration:     87660h
  Is CA:        true
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   Issuer
    Name:   nifi-cluster
  Private Key:
    Algorithm:        RSA
    Rotation Policy:  Always
    Size:             2048
  Renew Before:       5m1s
  Secret Name:        nifi-cluster-ca
  Subject:
    Organizational Units:
      NIFI
Events:  <none>
kubectl describe cert nifi-cluster-0 
Name:         nifi-cluster-0
Namespace:    nifi-cluster
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: nifi-cluster
              meta.helm.sh/release-namespace: nifi-cluster
API Version:  cert-manager.io/v1alpha2
Kind:         Certificate
Metadata:
  Creation Timestamp:  2022-06-28T12:02:57Z
  Generation:          5
  Managed Fields:
    API Version:  cert-manager.io/v1alpha2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app.kubernetes.io/managed-by:
      f:spec:
        .:
        f:commonName:
        f:dnsNames:
        f:duration:
        f:issuerRef:
          .:
          f:kind:
          f:name:
        f:privateKey:
          .:
          f:rotationPolicy:
        f:secretName:
        f:subject:
          .:
          f:organizationalUnits:
        f:usages:
    Manager:         helm
    Operation:       Update
    Time:            2022-06-28T12:02:57Z
  Resource Version:  228610000
  Self Link:         /apis/cert-manager.io/v1alpha2/namespaces/nifi-cluster/certificates/nifi-cluster-0
  UID:               b2e7d0aa-a8a9-40d7-8417-b07f6f7dc2cf
Spec:
  Common Name:  nifi-cluster-0.nifi-cluster-headless.nifi-cluster.svc.cluster.local
  Dns Names:
    nifi.x.k8s-test.x.io
    nifi-cluster.nifi-cluster.svc
    nifi-cluster.nifi-cluster.svc.cluster.local
    nifi-cluster-0.nifi-cluster-headless.nifi-cluster.svc
    nifi-cluster-0.nifi-cluster-headless.nifi-cluster.svc.cluster.local
  Duration:  2160h
  Issuer Ref:
    Kind:  Issuer
    Name:  nifi-cluster-ca
  Private Key:
    Rotation Policy:  Always
  Secret Name:        nifi-cluster-0
  Subject:
    Organizational Units:
      NIFI
  Usages:
    digital signature
    content commitment
    key encipherment
    data encipherment
    key agreement
    server auth
    client auth
Events:  <none>

wknickless commented 2 years ago

@lfreinag It looks like your Common Name nifi-cluster-0.nifi-cluster-headless.nifi-cluster.svc.cluster.local is 67 characters long, which exceeds the 64-character limit. That's at least one of the reasons why cert-manager isn't issuing a certificate, and therefore isn't populating the secret. You may need to choose shorter Helm release and/or namespace names to get those Common Names down to 64 characters or fewer.
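To see how much room a given release name leaves for the namespace, the limit can be turned into a budget. A minimal sketch, again assuming the CN pattern <release>-<ordinal>.<release>-headless.<namespace>.svc.<cluster domain> inferred from the describe output above; max_namespace_length is a hypothetical helper, and a negative result means no namespace would fit.

```python
X509_CN_MAX = 64


def max_namespace_length(release: str, cluster_domain: str = "cluster.local",
                         replicas: int = 1) -> int:
    """Longest namespace keeping every pod CN within 64 characters, under the
    ASSUMED pattern <release>-<n>.<release>-headless.<ns>.svc.<domain>.
    The widest ordinal is replicas - 1 (StatefulSet ordinals start at 0)."""
    widest_ordinal = len(str(max(replicas - 1, 0)))
    fixed = (len(release) + 1 + widest_ordinal + 1   # "<release>-<n>."
             + len(release) + len("-headless.")       # "<release>-headless."
             + len(".svc.") + len(cluster_domain))    # ".svc.<domain>"
    return X509_CN_MAX - fixed


print(max_namespace_length("nifi-cluster", replicas=3))  # 9
print(max_namespace_length("nifi"))                      # 25
```

Because the release name appears twice in the CN, shortening it saves twice as many characters as shortening the namespace by the same amount.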

This chart targets the cert-manager.io/v1 API, which first became available in cert-manager version 1.0 (see https://cert-manager.io/docs/release-notes/release-notes-1.0). Also see https://github.com/cetic/helm-nifi/blob/master/tests/05-install-cert-manager.bash for how the chart regression tests install the latest version of cert-manager.
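A quick way to catch this mismatch before installing is to compare the cert-manager version and the manifests' apiVersion against what the chart expects. This sketch takes the 1.0 cutoff stated in the comment above at face value; parse_version, serves_cert_manager_v1_api and uses_v1_api are hypothetical helpers, and the version parser only handles plain "vMAJOR.MINOR.PATCH"-style strings.

```python
import re


def parse_version(v: str) -> tuple:
    """Parse 'v1.0.0', '1.8.2' or 'v0.15' into (major, minor, patch)."""
    m = re.match(r"^v?(\d+)\.(\d+)(?:\.(\d+))?", v.strip())
    if not m:
        raise ValueError(f"unrecognised version: {v!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def serves_cert_manager_v1_api(version: str) -> bool:
    """Assumes cert-manager.io/v1 is served from cert-manager 1.0.0 onward."""
    return parse_version(version) >= (1, 0, 0)


def uses_v1_api(manifest: dict) -> bool:
    """True if a Certificate/Issuer manifest targets cert-manager.io/v1."""
    return manifest.get("apiVersion") == "cert-manager.io/v1"


# The Certificates in this thread report apiVersion cert-manager.io/v1alpha2.
print(uses_v1_api({"apiVersion": "cert-manager.io/v1alpha2", "kind": "Certificate"}))  # False
```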

lfreinag commented 2 years ago

So I have corrected the Common Name problem. You pointed it out twice, but I missed the first time. Sorry about that. However, that alone did not solve my problem, so I will upgrade cert-manager. Thanks for the script, but I am running cert-manager as a sidecar. I will update you once I manage to upgrade it 😉 Thanks for the help!

lfreinag commented 2 years ago

I have now installed cert-manager v1.0.0 from scratch, and it works like a charm! 🎉 Thanks for your support here. Great to see that the multi-node cluster setup works too.

I am closing the issue. Have a nice weekend!