Closed — aluneau closed this issue 4 years ago
Same issue for me as well. I increased the node count to 6, and scaling down is not working; all of my PVCs are still present in my cluster.
I tried to modify the preStop hook, without success. I don't know how to deal with this problem.
Do you have any ideas on how to investigate?
No idea yet, but I need to look into this to remove them.
@bloudman Did you find any solution for this issue?
@bloudman Are your NiFi cluster nodes set up with TLS? I'm having a problem scaling past 1 node. As soon as the 2nd node is running, I get errors like this: [apache-nifi-1 app-log] 2019-12-04 12:49:53,846 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors.
My multi-node cluster with TLS enabled is now working correctly. I do have an issue related to the OP: when we patch our Kubernetes cluster, one of the NiFi nodes is shut down during the patching, which causes the flow to be locked until that NiFi node is started back up and reconnected to the cluster.
Any solution for this bug?
Hi, to perform a clean scale-down, I suggest the following steps:
$ cat << EOF | kubectl apply -n default -f -
kind: Pod
apiVersion: v1
metadata:
  name: marks-dummy-pod
spec:
  containers:
    - name: marks-dummy-pod
      image: ubuntu
      command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5; done"]
  restartPolicy: Never
EOF
$ kubectl -n default exec -it marks-dummy-pod -- /bin/bash
(pod)$ apt update && apt install -y git python  # the stock ubuntu image ships without git and python
(pod)$ git clone https://github.com/erdrix/nifi-api-client-python.git
(pod)$ cd nifi-api-client-python
$ kubectl get service -n <Nifi namespace>  # run outside the pod, to find the NiFi service IP and port
(pod)$ python nifi-client-python.py --url http://<nifi-service-external-ip>:<nifi-service-port>/nifi-api --action cluster
Note: the node name should follow the pattern <pod_name>.nifi-headless.<namespace_name>.svc.cluster.local.
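To make the pattern above concrete, here is a tiny Python sketch that builds the node name from the pod name and namespace. The helper name is mine and is not part of the linked client; the headless service name defaults to `nifi-headless` as in the note above.

```python
def nifi_node_name(pod_name: str, namespace: str,
                   headless_service: str = "nifi-headless") -> str:
    """Build the cluster node name NiFi reports for a given pod,
    following the <pod_name>.<headless_service>.<namespace>.svc.cluster.local pattern."""
    return f"{pod_name}.{headless_service}.{namespace}.svc.cluster.local"

# Example: the third pod of the statefulset in the "default" namespace
print(nifi_node_name("apache-nifi-2", "default"))
# apache-nifi-2.nifi-headless.default.svc.cluster.local
```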
(pod)$ python nifi-client-python.py --url http://<nifi-service-external-ip>:<nifi-service-port>/nifi-api --action decommission --node <node_name> --nodePort <pod_port | by default : 8080>
(pod)$ python nifi-client-python.py --url http://<nifi-service-external-ip>:<nifi-service-port>/nifi-api --action remove --node <node_name>
$ kubectl scale -n <nifi_namespace> sts <nifi_statefulset_name> --replicas=<current_replica_minus_1>
The first call disconnects the NiFi node from the cluster, stops all input processors, and waits until all queues are drained. The second one removes the node from the NiFi cluster configuration (which is stored in ZooKeeper). The last command performs the scale-down.
Note: this should be performed by an operator (Cloudera has one in closed beta).
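For reference, the decommission flow described above maps onto NiFi's cluster REST endpoints roughly as follows. This is only a sketch of the request sequence under the assumption that the client uses the `/nifi-api/controller/cluster/nodes/{id}` endpoints; the linked script may differ in detail, and no requests are actually performed here.

```python
def decommission_plan(base_url, node_id):
    """Return the (method, url, body) sequence for decommissioning a cluster node.
    Assumes NiFi's /controller/cluster/nodes endpoints; purely illustrative."""
    node_url = f"{base_url}/controller/cluster/nodes/{node_id}"
    return [
        # 1. disconnect the node from the cluster
        ("PUT", node_url, {"node": {"nodeId": node_id, "status": "DISCONNECTING"}}),
        # 2. offload: stop processors and drain the node's queues
        ("PUT", node_url, {"node": {"nodeId": node_id, "status": "OFFLOADING"}}),
        # 3. remove the node from the cluster state (kept in ZooKeeper)
        ("DELETE", node_url, None),
    ]

for method, url, _body in decommission_plan("http://nifi.example:8080/nifi-api", "1234"):
    print(method, url)
```

A real client would issue each request in turn and poll the node status between steps, since offloading only completes once the queues are drained.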
@alexnuttinck is there a reason the commit with the fix for this was not yet merged into master?
Asking because I need a solution for this issue and need to know if that is the way to go.
Edit: I figured it out and posted a new comment with my idea for a solution.
He is currently on personal leave, but will be back in the office very soon.
I tried to come up with a clean decommissioning of a node upon shutdown, as proposed in the NiFi documentation here (https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#decommission-nodes).
To better understand why my solution looks the way it does, here is a short list of the steps that led to it:
I created pull request #57 to start a discussion about the solution, any suggestions that make it better and more robust are welcome.
Hello @stoetti,
@alexnuttinck is there a reason the commit with the fix for this was not yet merged into master?
I think I still had some issues to correct before merging.
I tried to come up with a clean decommissioning of a node upon shutdown, as proposed in the NiFi documentation here (https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#decommission-nodes).
Good, it's the documentation I followed too.
as soon as NiFi is shut down, the main process of the "server" container is stopped, and therefore a preStop hook for the server container itself does not work
Yes, it's the issue I had, thanks for the report!
the output of the lifecycle hooks is not available through the k8s API, so I redirected the output to a temporary file and "tail" that file
It seems to be an option.
for easier development I extended the main template to support "extraContainers" declared via the values file
Interesting, but I think this extraContainer should be directly in the StatefulSet definition and not in the values.yaml; that said, supporting extraContainers in the values.yaml is a good idea.
the terminationGracePeriod of the pod is configurable via values
Great.
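Putting the ideas from this list together, a hedged values.yaml sketch might look like the fragment below. The field names `terminationGracePeriodSeconds` and `extraContainers` follow the discussion above, but the sidecar image, log path, and commands are illustrative assumptions, not the actual chart's contents.

```yaml
# Illustrative values.yaml fragment (field names assumed, not the real chart):
# give the preStop-driven decommission enough time to drain queues
terminationGracePeriodSeconds: 300

extraContainers:
  # sidecar that surfaces the preStop hook's output, since lifecycle-hook
  # output is not available through the k8s API
  - name: prestop-log
    image: busybox
    command: ["/bin/sh", "-c", "touch /tmp/prestop.log && tail -F /tmp/prestop.log"]
```

With this pattern, the preStop hook on the server container redirects its output to /tmp/prestop.log on a shared volume, and `kubectl logs <pod> prestop-log` shows the decommissioning progress.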
I created pull request #57 to start a discussion about the solution, any suggestions that make it better and more robust are welcome.
Ok, I will have a look asap. Thanks for your investigations @stoetti !
Thanks to @stoetti, this bug is now fixed. Please, reopen this issue if you still have a problem.
Fix is merged in v0.4.2 of the Helm Chart.
The fix still has an issue: when the pod nifi-0 needs to restart, the decommissioning of that node does not work properly.
Ok @stoetti, thanks for the report. I propose to reopen this issue to keep that in mind.
Hi,
I'm having trouble scaling my Apache NiFi instance up and down on Kubernetes. On the Kubernetes side everything looks fine, and scaling up works, but scaling down creates a shadow node that I then have to remove manually in the NiFi interface. Do you know a way to do this properly?
Thanks in advance!