Orange-OpenSource / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://orange-opensource.github.io/nifikop/
Apache License 2.0

Reconcile Error - Nifi cluster communication error: could not connect to nifi nodes #112

Closed: tty47 closed this issue 3 years ago

tty47 commented 3 years ago

Bug Report

I created and installed the following resources before deploying the NiFi cluster:

kubectl create ns zookeeper
kubectl create ns nifi
kubectl create ns nifikop

helm install zookeeper bitnami/zookeeper \
    --set resources.requests.memory=256Mi \
    --set resources.requests.cpu=250m \
    --set resources.limits.memory=256Mi \
    --set resources.limits.cpu=250m \
    --set global.storageClass=standard \
    --set networkPolicy.enabled=true \
    --set replicaCount=3 \
    --namespace=zookeeper

### Install the CustomResourceDefinitions and cert-manager itself
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml

# Create the CRDs from https://github.com/Orange-OpenSource/nifikop/tree/master/helm/nifikop/crds
kubectl apply -f nifi.orange.com_nificlusters.yaml
kubectl apply -f nifi.orange.com_nifidataflows.yaml
kubectl apply -f nifi.orange.com_nifiparametercontexts.yaml
kubectl apply -f nifi.orange.com_nifiregistryclients.yaml
kubectl apply -f nifi.orange.com_nifiusergroups.yaml
kubectl apply -f nifi.orange.com_nifiusers.yaml

I set up the operator from the project by running the following commands:

make build; make run

What did you do? I am trying to deploy the operator and create a basic NiFi cluster with this manifest:

apiVersion: nifi.orange.com/v1alpha1
kind: NifiCluster
metadata:
  name: simplenifi
  namespace: nifi
spec:
  service:
    headlessEnabled: true
  zkAddress: "zookeeper.zookeeper:2181"
  zkPath: "/simplenifi"
  clusterImage: "apache/nifi:1.12.1"
  oneNifiNodePerNode: false
  nodeConfigGroups:
    default_group:
      isNode: true
      storageConfigs:
        - mountPath: "/opt/nifi/nifi-current/logs"
          name: logs
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            storageClassName: "standard"
            resources:
              requests:
                storage: 10Gi
      serviceAccountName: "default"
      resourcesRequirements:
        limits:
          cpu: "2"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 1Gi
  nodes:
    - id: 0
      nodeConfigGroup: "default_group"
    - id: 1
      nodeConfigGroup: "default_group"
    - id: 2
      nodeConfigGroup: "default_group"
  propagateLabels: true
  nifiClusterTaskSpec:
    retryDurationMinutes: 10
  listenersConfig:
    internalListeners:
      - type: "http"
        name: "http"
        containerPort: 8080
      - type: "cluster"
        name: "cluster"
        containerPort: 6007
      - type: "s2s"
        name: "s2s"
        containerPort: 10000

After deploying this manifest, I got the following error from the operator:

 ERROR   nifi_client     Error during talking to nifi node       {"error": "Get \"http://simplenifi-headless.nifi.svc.cluster.local:8080/nifi-api/controller/cluster\": dial tcp: lookup simplenifi-headless.nifi.svc.cluster.local: no such host"}
github.com/go-logr/zapr.(*zapLogger).Error

The pods and services in the nifi namespace seem OK:

kubectl get all -n nifi
# output
NAME                         READY   STATUS    RESTARTS   AGE
pod/simplenifi-0-nodezt7dz   1/1     Running   0          48m
pod/simplenifi-1-node5jgxz   1/1     Running   0          48m
pod/simplenifi-2-node9w2xm   1/1     Running   0          48m

NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                       AGE
service/simplenifi-headless   ClusterIP   None         <none>        8080/TCP,6007/TCP,10000/TCP   48m

What did you expect to see? No errors from the operator, and the NiFi service reachable through a port-forward. (I use local port 8082 because the operator, running in dev mode with make run, already uses 8080 on my machine.)

kubectl port-forward service/simplenifi-headless 8082:8080 -n nifi

I got:

E0623 09:53:31.095165  419131 portforward.go:400] an error occurred forwarding 8082 -> 8080: error forwarding port 8080 to pod 927babdcc7ac70b423116a70ab2ae202b5c4bf9b79198ba29821086c55fde040, uid : failed to execute portforward in network namespace "/var/run/netns/cni-bee50d24-38ae-5b57-bb9b-19bd2cf00ca6": failed to connect to localhost:8080 inside namespace "927babdcc7ac70b423116a70ab2ae202b5c4bf9b79198ba29821086c55fde040", IPv4: dial tcp4 127.0.0.1:8080: connect: connection refused IPv6 dial tcp6 [::1]:8080: connect: connection refused 

What did you see instead? Under which circumstances? The error mentioned.

Environment

Thanks in advance

totosh commented 3 years ago

With make run, the operator runs locally on your machine rather than inside your k8s cluster, so it can't resolve the host simplenifi-headless.nifi.svc.cluster.local. Add an entry for that host to your hosts file.
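
To confirm that diagnosis, you can check name resolution from the machine where make run is executed. This is only a minimal sketch: can_resolve is a hypothetical helper name, and it assumes a Linux host where getent is available. The service name is the one from the operator error above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nifikop): report whether a host name
# resolves on this machine. getent looks the name up through the system's
# name service configuration, which typically covers /etc/hosts and DNS.
can_resolve() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "resolvable"
    else
        echo "unresolvable"
    fi
}

# Service name copied from the operator error message.
can_resolve "simplenifi-headless.nifi.svc.cluster.local"
```

A hosts-file entry has the form "IP-address hostname". Which IP address to use depends on how the NiFi service is reachable from your workstation, so that part is left to your setup.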

tty47 commented 3 years ago

Hello. Yes, you are right! Deploying the operator directly into the cluster fixed it. Thanks for the tip ;)