Orange-OpenSource / nifikop

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFi is a free, open-source solution that supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
https://orange-opensource.github.io/nifikop/
Apache License 2.0
129 stars 34 forks

Cluster scaling issue #129

Open riccardo-salamanna opened 3 years ago

riccardo-salamanna commented 3 years ago

Bug Report

What did you do?
I tried to scale a cluster up and down.

What did you expect to see?
The cluster scaling up and down when I add or remove nodes.

What did you see instead? Under which circumstances?
Scaling down does not happen, only scaling up (every other configuration change does trigger a refresh). The operator's logs are also filled with errors, and its CPU usage spikes and stays high.

Environment

Possible Solution

I sincerely do not know.

Additional context

Here's the output log for the operator pod:

2021-09-02T17:21:20.541Z    ERROR   nifi_client Error during preparing the request  {"error": "The target node id doesn't exist in the cluster", "errorVerbose": "The target node id doesn't exist in the cluster\ngithub.com/Orange-OpenSource/nifikop/pkg/nificlient.init\n\t/workspace/pkg/nificlient/common.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5652\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:191\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"}
github.com/go-logr/zapr.(*zapLogger).Error
    /go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
github.com/Orange-OpenSource/nifikop/pkg/nificlient.(*nifiClient).GetClusterNode
    /workspace/pkg/nificlient/system.go:49
github.com/Orange-OpenSource/nifikop/pkg/clientwrappers/scale.CheckIfNCActionStepFinished
    /workspace/pkg/clientwrappers/scale/scale.go:166
github.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).checkNCActionStep
    /workspace/controllers/nificlustertask_controller.go:324
github.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).handlePodRunningTask
    /workspace/controllers/nificlustertask_controller.go:251
github.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).Reconcile
    /workspace/controllers/nificlustertask_controller.go:89
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99
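
When a log like the one above repeats in a loop, counting ERROR lines by message can make it easier to see which failure dominates. The following is a minimal Go sketch, assuming the line layout shown above (timestamp, level, logger, message, optional JSON payload). `entry`, `parseLine`, and `countErrors` are hypothetical helper names and not part of nifikop, and the sample ERROR line is abbreviated (its `errorVerbose` field is omitted).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// entry is a simplified view of one operator log line (hypothetical type).
type entry struct {
	Level  string
	Logger string
	Error  string // value of the "error" key in the JSON payload, if present
}

// parseLine extracts the level, logger and JSON "error" field from a line
// shaped like "<timestamp> <LEVEL> <logger> <message> {json}".
// Stack-trace continuation lines have no timestamp and are skipped.
func parseLine(line string) (entry, bool) {
	fields := strings.Fields(line)
	if len(fields) < 3 || !strings.HasSuffix(fields[0], "Z") {
		return entry{}, false
	}
	e := entry{Level: fields[1], Logger: fields[2]}
	// Assumption: the first '{' on the line starts the JSON payload.
	if i := strings.Index(line, "{"); i >= 0 {
		var payload map[string]interface{}
		if err := json.Unmarshal([]byte(line[i:]), &payload); err == nil {
			if msg, ok := payload["error"].(string); ok {
				e.Error = msg
			}
		}
	}
	return e, true
}

// countErrors tallies ERROR-level lines by their "error" message.
// Lines without an "error" key are counted under the empty string.
func countErrors(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		if e, ok := parseLine(l); ok && e.Level == "ERROR" {
			counts[e.Error]++
		}
	}
	return counts
}

func main() {
	lines := []string{
		// Abbreviated copy of the ERROR line above (errorVerbose omitted).
		`2021-09-02T17:21:20.541Z    ERROR   nifi_client Error during preparing the request  {"error": "The target node id doesn't exist in the cluster"}`,
		`github.com/go-logr/zapr.(*zapLogger).Error`,
		`2021-09-02T17:34:38.407Z    INFO    controllers.NifiClusterTask Nifi cluster task is still running  {"actionStep": "CONNECTED"}`,
	}
	for msg, n := range countErrors(lines) {
		fmt.Printf("%d x %s\n", n, msg)
	}
}
```

In practice the lines would come from the operator pod's log output rather than a hard-coded slice; the slice here only keeps the sketch self-contained.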

riccardo-salamanna commented 3 years ago

Actually, even after a plain deployment of a simple cluster example, the CPU usage spikes and the logs start filling with these errors in a constant loop:

2021-09-02T17:34:38.407Z    INFO    controllers.NifiClusterTask Nifi cluster task is still running  {"actionStep": "CONNECTED"}
2021-09-02T17:34:38.407Z    INFO    controllers.NifiClusterTask nc action step: CONNECTED: Nifi cluster task is still running
2021-09-02T17:34:38.407Z    ERROR   controller-runtime.manager.controller.nificluster   Reconciler error    {"reconciler group": "nifi.orange.com", "reconciler kind": "NifiCluster", "name": "nifikop-dev", "namespace": "nifi", "error": "nc action step: CONNECTED: Nifi cluster task is still running", "errorVerbose": "Nifi cluster task is still running\ngithub.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).checkNCActionStep\n\t/workspace/controllers/nificlustertask_controller.go:389\ngithub.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).handlePodRunningTask\n\t/workspace/controllers/nificlustertask_controller.go:251\ngithub.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).Reconcile\n\t/workspace/controllers/nificlustertask_controller.go:89\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:198\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nnc action step: CONNECTED"}
github.com/go-logr/zapr.(*zapLogger).Error
    /go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:267
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99

bruckwubete commented 3 years ago

I am also seeing this on an EKS deployment.

2021-10-18T04:58:10.955Z    DEBUG   controllers.NifiCluster resource is in sync {"component": "nifi", "clusterName": "simplenifi", "clusterNamespace": "nifi", "kind": "*v1.Pod"}
2021-10-18T04:58:10.956Z    ERROR   nifi_client Error during preparing the request  {"error": "The target node id doesn't exist in the cluster", "errorVerbose": "The target node id doesn't exist in the cluster\ngithub.com/Orange-OpenSource/nifikop/pkg/nificlient.init\n\t/workspace/pkg/nificlient/common.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5652\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:5647\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:191\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"}
github.com/go-logr/zapr.(*zapLogger).Error
    /go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
github.com/Orange-OpenSource/nifikop/pkg/nificlient.(*nifiClient).GetClusterNode
    /workspace/pkg/nificlient/system.go:65
github.com/Orange-OpenSource/nifikop/pkg/clientwrappers/scale.CheckIfNCActionStepFinished
    /workspace/pkg/clientwrappers/scale/scale.go:166
github.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).checkNCActionStep
    /workspace/controllers/nificlustertask_controller.go:367
github.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).handlePodRunningTask
    /workspace/controllers/nificlustertask_controller.go:280
github.com/Orange-OpenSource/nifikop/controllers.(*NifiClusterTaskReconciler).Reconcile
    /workspace/controllers/nificlustertask_controller.go:91
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.2/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99
2021-10-18T04:58:10.956Z    INFO    controllers.NifiClusterTask Nifi cluster task is still running  {"actionStep": "CONNECTED"}
2021-10-18T04:58:10.956Z    INFO    controllers.NifiClusterTask nc action step: CONNECTED: Nifi cluster task is still running