Orange-OpenSource / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://orange-opensource.github.io/nifikop/
Apache License 2.0

Secured NiFi cluster: Failed to connect to headless svc host (Connection refused) #143

Open · omkadmi opened this issue 2 years ago

omkadmi commented 2 years ago

Bug Report

What did you do?

- I deployed an unsecured NiFi cluster -> it works.
- I deployed a secured NiFi cluster with a self-signed certificate (managed by NiFiKop) -> it works.
- I deployed a secured NiFi cluster with cert-manager + Let's Encrypt -> it does not work.

I followed all the steps in the documentation at https://orange-opensource.github.io/nifikop/blog/2020/06/30/secured_nifi_cluster_on_gcp_with_external_dns, but I still get this "Connection refused" error, even though the certificates are issued by cert-manager and I can see the sslnifi entries created by ExternalDNS in Azure Private DNS.

For info: NiFiKop, ZooKeeper and the NiFi cluster are in the nifi namespace; cert-manager, letsencrypt and ExternalDNS are in the devops namespace.

I get this error in the pod log (it repeats indefinitely):

Waiting for host to be reachable
failed to reach sslnifi-0-node.sslnifi-headless.mycompany.net:8443
Found: , expecting: 10.66.161.197
Found :
failed to reach sslnifi-0-node.sslnifi-headless.mycompany.net:8443
Found: , expecting: 10.66.161.197
Found :
failed to reach sslnifi-0-node.sslnifi-headless.mycompany.net:8443
Found: , expecting: 10.66.161.197
Found :
failed to reach sslnifi-0-node.sslnifi-headless.mycompany.net:8443
Found: , expecting: 10.66.161.197
Found :
failed to reach sslnifi-0-node.sslnifi-headless.mycompany.net:8443
Found: , expecting: 10.66.161.197
Found :
failed to reach sslnifi-0-node.sslnifi-headless.mycompany.net:8443
Found: , expecting: 10.66.161.197
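
The loop above appears to resolve the node's own hostname and compare the answer with 10.66.161.197 (the same address the readiness probe connects to further down), retrying until they match; an empty "Found:" suggests the name does not resolve at all from inside the pod. The Python sketch below reproduces that kind of check under this assumption. It is not NiFiKop's actual startup script: check_host, wait_for_host and the unresolvable stub are hypothetical names, and the retry loop is bounded here, whereas the real one seems to run forever.

```python
import socket
import time
from typing import Callable, Tuple

# A resolver maps a hostname to one IPv4 address (socket.gethostbyname by default).
Resolver = Callable[[str], str]


def check_host(hostname: str, expected_ip: str,
               resolve: Resolver = socket.gethostbyname) -> Tuple[bool, str]:
    """Resolve `hostname` once and compare the answer with `expected_ip`.

    Hypothetical diagnostic helper, not part of NiFiKop.
    """
    try:
        found = resolve(hostname)
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        found = ""   # an unresolvable name shows up as an empty "Found:" value
    return found == expected_ip, f"Found: {found}, expecting: {expected_ip}"


def wait_for_host(hostname: str, expected_ip: str,
                  resolve: Resolver = socket.gethostbyname,
                  attempts: int = 5, delay_s: float = 1.0) -> bool:
    """Retry check_host a bounded number of times; return True once it matches."""
    print("Waiting for host to be reachable")
    for _ in range(attempts):
        ok, message = check_host(hostname, expected_ip, resolve)
        if ok:
            return True
        print(f"failed to reach {hostname}")
        print(message)
        time.sleep(delay_s)
    return False


def unresolvable(name: str) -> str:
    """Stub resolver that simulates a missing DNS record."""
    raise socket.gaierror(f"cannot resolve {name}")
```

For example, wait_for_host("sslnifi-0-node.sslnifi-headless.mycompany.net", "10.66.161.197", resolve=unresolvable, attempts=2, delay_s=0) prints "failed to reach ..." followed by "Found: , expecting: 10.66.161.197" on each attempt and returns False. Injecting the resolver keeps the check testable without touching real DNS.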

And this error in kubectl describe: Failed to connect to sslnifi-0-node.sslnifi-headless.nifi.svc.cluster.local port 8443

Readiness probe failed:
* Expire in 0 ms for 6 (transfer 0x557d85ecef50)
* Expire in 1 ms for 1 (transfer 0x557d85ecef50)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Expire in 0 ms for 1 (transfer 0x557d85ecef50)
* Expire in 1 ms for 1 (transfer 0x557d85ecef50)
* Expire in 0 ms for 1 (transfer 0x557d85ecef50)
* Expire in 0 ms for 1 (transfer 0x557d85ecef50)
* Expire in 0 ms for 1 (transfer 0x557d85ecef50)
* Trying 10.66.161.197...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x557d85ecef50)
* connect to 10.66.161.197 port 8443 failed: Connection refused
* Failed to connect to sslnifi-0-node.sslnifi-headless.nifi.svc.cluster.local port 8443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to sslnifi-0-node.sslnifi-headless.nifi.svc.cluster.local port 8443: Connection refused
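
curl exit code 7 with "Connection refused" means the name resolved and 10.66.161.197 answered, but nothing was listening on port 8443. Since the pod log above shows the startup loop still waiting, NiFi has presumably not opened its HTTPS listener yet, so the probe cannot succeed yet. The standard-library Python sketch below separates these cases; tcp_probe is a hypothetical helper, not part of NiFiKop or Kubernetes.

```python
import socket


def tcp_probe(host: str, port: int, timeout_s: float = 2.0) -> str:
    """Classify a single TCP connection attempt to host:port.

    Hypothetical diagnostic helper, not part of NiFiKop.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"        # a process is listening on host:port
    except ConnectionRefusedError:
        return "refused"         # host reached, but nothing listens on that port
    except socket.timeout:
        return "timeout"         # no answer at all (filtered traffic, wrong IP, ...)
    except socket.gaierror:
        return "unresolved"      # the DNS lookup for `host` failed
    except OSError as exc:
        return f"error: {exc}"   # anything else, e.g. "No route to host"
```

Run from another pod in the nifi namespace, tcp_probe("sslnifi-0-node.sslnifi-headless.nifi.svc.cluster.local", 8443) would be expected to return "refused" for as long as the node is stuck in its startup loop.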

I don't understand why the readiness probe targets sslnifi-0-node.sslnifi-headless.nifi.svc.cluster.local (ending in .cluster.local) when, I would guess, it should use .mycompany.net.
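
Judging only from the names that appear in this issue, two naming schemes are involved: the standard Kubernetes form <hostname>.<headless service>.<namespace>.svc.cluster.local, used by the readiness probe, and the external-DNS form <hostname>.<headless service>.<clusterDomain>, used by the startup loop and the certificates (clusterDomain is mycompany.net in the configuration below). The Python sketch reproduces both; node_hostname and the <cluster>-<id>-node / <cluster>-headless patterns are inferred from these logs, not taken from NiFiKop's source.

```python
def node_hostname(cluster: str, node_id: int, namespace: str,
                  cluster_domain: str = "cluster.local",
                  use_external_dns: bool = False) -> str:
    """Build a NiFi node FQDN following the patterns observed in this issue.

    Hypothetical helper: the "<cluster>-<id>-node" hostname and the
    "<cluster>-headless" service name are inferred from the logs.
    """
    # Pod hostname; the pod object itself carries a random suffix (sslnifi-0-nodexbwjp).
    host = f"{cluster}-{node_id}-node"
    service = f"{cluster}-headless"      # headless service, e.g. sslnifi-headless
    if use_external_dns:
        # External DNS form: no namespace or "svc" label, custom domain.
        return f"{host}.{service}.{cluster_domain}"
    # Standard Kubernetes DNS name of a pod behind a headless service.
    return f"{host}.{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    print(node_hostname("sslnifi", 0, "nifi"))
    # prints: sslnifi-0-node.sslnifi-headless.nifi.svc.cluster.local
    print(node_hostname("sslnifi", 0, "nifi", "mycompany.net", use_external_dns=True))
    # prints: sslnifi-0-node.sslnifi-headless.mycompany.net
```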

Below is the cert-manager log:

I1018 09:49:32.621252       1 conditions.go:173] Setting lastTransitionTime for Certificate "sslnifi-controller.nifi.mgt.mycompany.net" condition "Issuing" to 2021-10-18 09:49:32.621243958 +0000 UTC m=+1231705.057373825
I1018 09:49:32.622054       1 conditions.go:173] Setting lastTransitionTime for Certificate "sslnifi-controller.nifi.mgt.mycompany.net" condition "Ready" to 2021-10-18 09:49:32.622049068 +0000 UTC m=+1231705.058179035
I1018 09:49:32.779307       1 conditions.go:173] Setting lastTransitionTime for Certificate "sslnifi-0-node.sslnifi-headless.mycompany.net" condition "Issuing" to 2021-10-18 09:49:32.77929993 +0000 UTC m=+1231705.215429897
I1018 09:49:32.781088       1 conditions.go:173] Setting lastTransitionTime for Certificate "sslnifi-0-node.sslnifi-headless.mycompany.net" condition "Ready" to 2021-10-18 09:49:32.781081754 +0000 UTC m=+1231705.217211621
E1018 09:49:32.829395       1 controller.go:158] cert-manager/controller/CertificateTrigger "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net"
I1018 09:49:32.829670       1 conditions.go:173] Setting lastTransitionTime for Certificate "sslnifi-controller.nifi.mgt.mycompany.net" condition "Issuing" to 2021-10-18 09:49:32.829665191 +0000 UTC m=+1231705.265795058
E1018 09:49:32.937119       1 controller.go:158] cert-manager/controller/CertificateTrigger "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-0-node.sslnifi-headless.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-0-node.sslnifi-headless.mycompany.net"
I1018 09:49:32.937207       1 conditions.go:173] Setting lastTransitionTime for Certificate "sslnifi-0-node.sslnifi-headless.mycompany.net" condition "Issuing" to 2021-10-18 09:49:32.937203301 +0000 UTC m=+1231705.373333268
E1018 09:49:32.964735       1 controller.go:158] cert-manager/controller/CertificateTrigger "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net"
I1018 09:49:32.964930       1 conditions.go:173] Setting lastTransitionTime for Certificate "sslnifi-controller.nifi.mgt.mycompany.net" condition "Issuing" to 2021-10-18 09:49:32.964925064 +0000 UTC m=+1231705.401054931
E1018 09:49:33.528303       1 controller.go:158] cert-manager/controller/CertificateKeyManager "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net"
E1018 09:49:33.913078       1 controller.go:158] cert-manager/controller/CertificateKeyManager "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-0-node.sslnifi-headless.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-0-node.sslnifi-headless.mycompany.net"
I1018 09:49:34.942552       1 conditions.go:233] Setting lastTransitionTime for CertificateRequest "sslnifi-controller.nifi.mgt.mycompany.net-bhx8b" condition "Ready" to 2021-10-18 09:49:34.942545097 +0000 UTC m=+1231707.378674964
I1018 09:49:35.375578       1 conditions.go:233] Setting lastTransitionTime for CertificateRequest "sslnifi-0-node.sslnifi-headless.mycompany.net-fdrzw" condition "Ready" to 2021-10-18 09:49:35.375571575 +0000 UTC m=+1231707.811701442
I1018 09:49:36.157507       1 conditions.go:233] Setting lastTransitionTime for CertificateRequest "sslnifi-0-node.sslnifi-headless.mycompany.net-fdrzw" condition "Ready" to 2021-10-18 09:49:36.157499429 +0000 UTC m=+1231708.593629296
E1018 09:49:36.712804       1 controller.go:158] cert-manager/controller/certificaterequests-issuer-acme "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificaterequests.cert-manager.io \"sslnifi-0-node.sslnifi-headless.mycompany.net-fdrzw\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-0-node.sslnifi-headless.mycompany.net-fdrzw"
E1018 09:49:37.917592       1 controller.go:158] cert-manager/controller/certificaterequests-issuer-acme "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificaterequests.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net-bhx8b\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net-bhx8b"
I1018 09:49:38.878739       1 acme.go:184] cert-manager/controller/certificaterequests-issuer-acme/sign "msg"="certificate issued" "related_resource_kind"="Order" "related_resource_name"="sslnifi-controller.nifi.mgt.mycompany.net-bhx8b-3833685911" "related_resource_namespace"="nifi" "related_resource_version"="v1" "resource_kind"="CertificateRequest" "resource_name"="sslnifi-controller.nifi.mgt.mycompany.net-bhx8b" "resource_namespace"="nifi" "resource_version"="v1"
I1018 09:49:38.878997       1 conditions.go:222] Found status change for CertificateRequest "sslnifi-controller.nifi.mgt.mycompany.net-bhx8b" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2021-10-18 09:49:38.878992016 +0000 UTC m=+1231711.315121983
E1018 09:49:39.372677       1 controller.go:158] cert-manager/controller/CertificateReadiness "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net"
I1018 09:49:39.373408       1 conditions.go:162] Found status change for Certificate "sslnifi-controller.nifi.mgt.mycompany.net" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2021-10-18 09:49:39.373401899 +0000 UTC m=+1231711.809531866
E1018 09:49:39.593733       1 controller.go:158] cert-manager/controller/CertificateIssuing "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net"
E1018 09:49:39.937234       1 controller.go:158] cert-manager/controller/CertificateReadiness "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net"
I1018 09:49:39.937992       1 conditions.go:162] Found status change for Certificate "sslnifi-controller.nifi.mgt.mycompany.net" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2021-10-18 09:49:39.937985336 +0000 UTC m=+1231712.374115203
I1018 09:49:40.134578       1 acme.go:184] cert-manager/controller/certificaterequests-issuer-acme/sign "msg"="certificate issued" "related_resource_kind"="Order" "related_resource_name"="sslnifi-0-node.sslnifi-headless.mycompany.net-fdrzw-2332423181" "related_resource_namespace"="nifi" "related_resource_version"="v1" "resource_kind"="CertificateRequest" "resource_name"="sslnifi-0-node.sslnifi-headless.mycompany.net-fdrzw" "resource_namespace"="nifi" "resource_version"="v1"
I1018 09:49:40.135097       1 conditions.go:222] Found status change for CertificateRequest "sslnifi-0-node.sslnifi-headless.mycompany.net-fdrzw" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2021-10-18 09:49:40.135089423 +0000 UTC m=+1231712.571219390
E1018 09:49:40.136239       1 controller.go:158] cert-manager/controller/CertificateKeyManager "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-controller.nifi.mgt.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-controller.nifi.mgt.mycompany.net"
E1018 09:49:41.572069       1 controller.go:158] cert-manager/controller/CertificateReadiness "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-0-node.sslnifi-headless.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-0-node.sslnifi-headless.mycompany.net"
I1018 09:49:41.573131       1 conditions.go:162] Found status change for Certificate "sslnifi-0-node.sslnifi-headless.mycompany.net" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2021-10-18 09:49:41.573123467 +0000 UTC m=+1231714.009253334
E1018 09:49:42.044550       1 controller.go:158] cert-manager/controller/CertificateReadiness "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"sslnifi-0-node.sslnifi-headless.mycompany.net\": the object has been modified; please apply your changes to the latest version and try again" "key"="nifi/sslnifi-0-node.sslnifi-headless.mycompany.net"
I1018 09:49:42.045275       1 conditions.go:162] Found status change for Certificate "sslnifi-0-node.sslnifi-headless.mycompany.net" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2021-10-18 09:49:42.045269625 +0000 UTC m=+1231714.481399492
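
The E-level "the object has been modified" lines are update conflicts that the controller re-queues and retries (as the "re-queuing item" text says), and they are usually transient; what matters is that both Certificates eventually report Ready "False" -> "True". A Python sketch that extracts those transitions, assuming log lines in exactly the format shown above (the regular expression is based on these lines only, and ready_certificates is a hypothetical helper):

```python
import re
from typing import Dict, Iterable

# Matches lines like:
#   ... Found status change for Certificate "NAME" condition "Ready": "False" -> "True"; ...
# CertificateRequest lines are skipped because "Certificate" must be followed by ' "'.
READY_RE = re.compile(
    r'Found status change for Certificate "(?P<name>[^"]+)" '
    r'condition "Ready": "(?P<old>\w+)" -> "(?P<new>\w+)"'
)


def ready_certificates(log_lines: Iterable[str]) -> Dict[str, str]:
    """Return {certificate name: latest Ready status} from cert-manager log lines."""
    status: Dict[str, str] = {}
    for line in log_lines:
        match = READY_RE.search(line)
        if match:
            # Later transitions overwrite earlier ones for the same certificate.
            status[match.group("name")] = match.group("new")
    return status
```

Applied to the log above, it returns both sslnifi-controller.nifi.mgt.mycompany.net and sslnifi-0-node.sslnifi-headless.mycompany.net with status "True".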

Below is the ExternalDNS log:

time="2021-10-13T09:35:52Z" level=info msg="Updating A record named 'sslnifi-int' to '10.66.161.134' for Azure Private DNS zone 'mycompany.net'."
time="2021-10-13T09:35:52Z" level=info msg="Updating A record named 'sslnifi-0-node.sslnifi-int' to '10.66.161.134' for Azure Private DNS zone 'mycompany.net'."
time="2021-10-13T09:35:53Z" level=info msg="Updating TXT record named 'sslnifi-int' to '\"heritage=external-dns,external-dns/owner=<server_name>,external-dns/resource=service/nifi/sslnifi-headless\"' for Azure Private DNS zone 'mycompany.net'."
time="2021-10-13T09:35:53Z" level=info msg="Updating TXT record named 'sslnifi-0-node.sslnifi-int' to '\"heritage=external-dns,external-dns/owner=<server_name>,external-dns/resource=service/nifi/sslnifi-headless\"' for Azure Private DNS zone 'mycompany.net'."
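
These ExternalDNS lines are dated 2021-10-13, five days before the cert-manager log, and they write A records named sslnifi-int and sslnifi-0-node.sslnifi-int pointing to 10.66.161.134, while the startup loop waits for sslnifi-0-node.sslnifi-headless.mycompany.net to resolve to 10.66.161.197. A Python sketch for cross-checking the two, assuming the log format shown above (a_records and check_expected are hypothetical helpers):

```python
import re
from typing import Dict, Iterable

# Matches lines like:
#   ... msg="Updating A record named 'NAME' to 'IP' for Azure Private DNS zone 'ZONE'."
# TXT record lines do not match because they say "Updating TXT record".
A_RECORD_RE = re.compile(
    r"Updating A record named '(?P<name>[^']+)' to '(?P<ip>[^']+)' "
    r"for Azure Private DNS zone '(?P<zone>[^']+)'"
)


def a_records(log_lines: Iterable[str]) -> Dict[str, str]:
    """Map each FQDN that ExternalDNS wrote an A record for to its IP."""
    records: Dict[str, str] = {}
    for line in log_lines:
        match = A_RECORD_RE.search(line)
        if match:
            records[f"{match['name']}.{match['zone']}"] = match["ip"]
    return records


def check_expected(records: Dict[str, str], fqdn: str, expected_ip: str) -> str:
    """Describe whether `fqdn` has an A record with the IP the NiFi node expects."""
    found = records.get(fqdn)
    if found is None:
        return f"no A record for {fqdn}"
    if found != expected_ip:
        return f"{fqdn} -> {found}, expecting {expected_ip}"
    return f"{fqdn} -> {found} (ok)"
```

Run on the four lines above, a_records returns only the two sslnifi-int names, so check_expected reports "no A record for sslnifi-0-node.sslnifi-headless.mycompany.net".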

kubectl get all -n nifi
NAME                               READY   STATUS    RESTARTS   AGE
pod/nifikop-int-76cbbff7c6-8fz2g   1/1     Running   0          47h
pod/sslnifi-0-nodexbwjp            0/1     Running   0          4m41s
pod/zookeeper-0                    1/1     Running   0          5d1h

NAME                         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                       AGE
service/clusterip            ClusterIP   10.0.18.0     <none>        8443/TCP                      36m
service/sslnifi-headless     ClusterIP   None          <none>        8443/TCP,6007/TCP,10000/TCP   36m
service/zookeeper            ClusterIP   10.0.18.121   <none>        2181/TCP,2888/TCP,3888/TCP    5d1h
service/zookeeper-headless   ClusterIP   None          <none>        2181/TCP,2888/TCP,3888/TCP    5d1h

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nifikop-int   1/1     1            1           47h

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/nifikop-int-76cbbff7c6   1         1         1       47h

NAME                         READY   AGE
statefulset.apps/zookeeper   1/1     5d1h

Below is my configuration:

apiVersion: nifi.orange.com/v1alpha1
kind: NifiCluster
metadata:
  name: sslnifi
  namespace: nifi
spec:
  service:
    headlessEnabled: true
    annotations:
      external-dns.alpha.kubernetes.io/ttl: "60"
  zkAddress: "zookeeper:2181"
  zkPath: "/sslnifi"
  clusterImage: "apache/nifi:1.12.1"
  oneNifiNodePerNode: false
  managedAdminUsers:
    -  identity : "myname@gmail.com"
       name: "myname"
  managedReaderUsers:
    -  identity : "toton@orange.com"
       name: "toto"
  propagateLabels: true
  nifiClusterTaskSpec:
    retryDurationMinutes: 20
  readOnlyConfig:
    nifiProperties:
      webProxyHosts:
        - sslnifi-int.mycompany.net
      # Additional nifi.properties configuration that will override the one produced
      # from the template and configurations.
      overrideConfigs: |
        nifi.security.user.oidc.discovery.url=https://accounts.google.com/.well-known/openid-configuration
        nifi.security.user.oidc.client.id=xxxxxxxxxxxxxxxxxxx
        nifi.security.user.oidc.client.secret=xxxxxxxxxxxxxxxxxx
        nifi.security.identity.mapping.pattern.dn=CN=([^,]*)(?:, (?:O|OU)=.*)?
        nifi.security.identity.mapping.value.dn=$1
        nifi.security.identity.mapping.transform.dn=NONE
  nodeConfigGroups:
    default_group:
      isNode: true
      storageConfigs:
        - mountPath: "/opt/nifi/nifi-current/logs"
          name: logs
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "default"
            resources:
              requests:
                storage: 10Gi
        - mountPath: "/opt/nifi/data"
          name: data
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "default"
            resources:
              requests:
                storage: 10Gi
        - mountPath: "/opt/nifi/flowfile_repository"
          name: flowfile-repository
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "default"
            resources:
              requests:
                storage: 10Gi
        - mountPath: "/opt/nifi/nifi-current/conf"
          name: conf
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "default"
            resources:
              requests:
                storage: 10Gi
        - mountPath: "/opt/nifi/content_repository"
          name: content-repository
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "default"
            resources:
              requests:
                storage: 10Gi
        - mountPath: "/opt/nifi/provenance_repository"
          name: provenance-repository
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "default"
            resources:
              requests:
                storage: 10Gi
      serviceAccountName: "default"
      resourcesRequirements:
        limits:
          cpu: "2"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 1Gi
  nodes:
    - id: 0
      nodeConfigGroup: "default_group"
  listenersConfig:
    useExternalDNS: true
    clusterDomain: "mycompany.net"
    internalListeners:
      - type: "https"
        name: "https"
        containerPort: 8443
      - type: "cluster"
        name: "cluster"
        containerPort: 6007
      - type: "s2s"
        name: "s2s"
        containerPort: 10000
    sslSecrets:
      tlsSecretName: "sslnifi-int.mycompany.net-tls"
      create: true
      clusterScoped: true
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-staging
  externalServices:
    - name: "clusterip"
      spec:
        type: ClusterIP
        portConfigs:
          - port: 8443
            internalListenerName: "https"
      serviceAnnotations:
        toto: tata

I deployed NiFiKop with:

helm repo add orange-incubator https://orange-kubernetes-charts-incubator.storage.googleapis.com/
helm install nifikop \
    orange-incubator/nifikop \
    --namespace=nifi \
    --version 0.7.0 \
    --set image.tag=v0.7.0-release \
    --set resources.requests.memory=256Mi \
    --set resources.requests.cpu=250m \
    --set resources.limits.memory=256Mi \
    --set resources.limits.cpu=250m \
    --set namespaces={"nifi"}

Thank you in advance for your help. I've been working on this for a few days and can't find a solution.

What did you expect to see? The NiFi node pod should be Running with 1/1 containers ready.

What did you see instead? Under which circumstances? The NiFi node pod is Running, but only 0/1 containers are ready.

Environment

NiFiKop version: 0.7.0 (the same problem occurs with 0.6.3)

Kubernetes version: v1.19.11

NiFi version: 1.12.1

wandersonpereira commented 2 years ago

I have the same problem!

omkadmi commented 2 years ago

Does nobody else have this problem? It is a major issue, though.

wandersonpereira commented 2 years ago

Hello @omkadmi!

Did you resolve this problem?