cetic / helm-nifi

Helm Chart for Apache Nifi
Apache License 2.0

[cetic/nifi] CONNECTION_REQUEST marshal problems in 2 NiFi instances cluster #262

Open forry2 opened 2 years ago

forry2 commented 2 years ago

Hi, I tried my best to run a two-instance cluster, but the second NiFi instance does not join the cluster. The app log shows a HEARTBEAT marshalling problem. Please find my values.yaml (renamed as values.txt) attached to this issue.

Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchor

values.txt

forry2 commented 2 years ago

The first NiFi node correctly creates the cluster, but the second node keeps logging messages like these:

2022-07-01 19:31:11,102 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:16,436 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:21,773 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:27,090 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:32,403 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:37,720 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:43,035 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:48,349 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:53,663 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:31:58,980 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:32:04,291 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
2022-07-01 19:32:09,607 WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message
202
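When a node keeps failing like this, it can help to confirm that the warnings are a steady retry loop rather than a one-off error. The following Python sketch scans log text in the format shown above (one entry per line) and returns the time between consecutive heartbeat marshalling failures. The regular expression and the function name `heartbeat_failure_gaps` are assumptions written for this illustration; they are not part of NiFi or the chart.

```python
import re
from datetime import datetime

# Assumed pattern, derived from the sample log lines in this issue (one entry per line);
# it is not an official NiFi log format specification.
HEARTBEAT_FAILURE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN .*?"
    r"Failed to send heartbeat due to: .*?Failed marshalling 'HEARTBEAT'"
)


def heartbeat_failure_gaps(log_text: str) -> list[float]:
    """Return the seconds elapsed between consecutive heartbeat marshalling failures."""
    stamps = [
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
        for ts in HEARTBEAT_FAILURE.findall(log_text)
    ]
    # Pair each timestamp with the next one and measure the gap.
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
```

On the log above every gap is roughly 5.3 seconds, i.e. the second node retries its heartbeat on a regular cadence and fails the same way each time.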

wknickless commented 2 years ago

Hi @forry2, since NiFi 1.14.0 security is turned on by default (see https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.14.0), which means NiFi cluster nodes expect to mutually authenticate using TLS. The chart supports setting up the certificate authority and client certificates using either (A) the built-in NiFi Toolkit or (B) cert-manager (see https://cert-manager.io). Your values.yaml file has neither of those options enabled.
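To make "mutually authenticate using TLS" concrete: each node must present its own certificate and also verify its peer's certificate against a trusted CA. NiFi does this in Java with keystores and truststores; the Python `ssl` sketch below only illustrates the same idea, and the function name `build_node_context` and its file-path parameters are hypothetical.

```python
import ssl
from typing import Optional


def build_node_context(
    server_side: bool,
    ca_file: Optional[str] = None,    # hypothetical: shared CA certificate (truststore role)
    cert_file: Optional[str] = None,  # hypothetical: this node's certificate (keystore role)
    key_file: Optional[str] = None,   # hypothetical: this node's private key
) -> ssl.SSLContext:
    """Build a TLS context in which both peers must present a verifiable certificate."""
    protocol = ssl.PROTOCOL_TLS_SERVER if server_side else ssl.PROTOCOL_TLS_CLIENT
    ctx = ssl.SSLContext(protocol)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A client context already requires a server certificate; a server context must be
    # told explicitly to demand and verify a client certificate too (the "mutual" part).
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Peers whose certificates do not chain to this CA are rejected during the handshake.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

In NiFi terms, `cert_file`/`key_file` correspond to the keystore and `ca_file` to the truststore. If a node's truststore does not contain the CA that issued its peer's certificate, the handshake fails much like the PKIX error in the original report.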

forry2 commented 2 years ago

@wknickless thank you so much for your reply. I turned ca.enabled to true, but no better luck, still plenty of "Failed marshalling 'HEARTBEAT' protocol message" messages there on the second node. What am I missing?

ca:
  # If true, enable the nifi-toolkit certificate authority
  enabled: true
  persistence:
    enabled: true
  server: ""
  service:
    port: 9090
  token: sixteenCharacters
  admin:
    cn: admin
  serviceAccount:
    create: false
    name: nifi-ca
  openshift:
    scc:
      enabled: false
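Before digging into the templates, a quick sanity check of the values can rule out simple configuration mistakes, given that the chart needs either the toolkit CA or cert-manager enabled. This Python sketch works on values already loaded into a dict (for example with a YAML parser). `ca.enabled` and `ca.token` come from the snippet above; the `certManager.enabled` key and the 16-character token minimum are assumptions (the latter suggested by the placeholder value `sixteenCharacters`) that should be checked against the chart's own values.yaml and the NiFi Toolkit documentation. The function name `check_tls_values` is hypothetical.

```python
from typing import Any

# Assumption: minimum token length, suggested by the chart's placeholder "sixteenCharacters".
MIN_TOKEN_LENGTH = 16


def check_tls_values(values: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed chart values dict."""
    problems = []
    ca = values.get("ca", {})
    cert_manager = values.get("certManager", {})  # assumed key name for the cert-manager option
    ca_on = bool(ca.get("enabled"))
    cm_on = bool(cert_manager.get("enabled"))
    if not (ca_on or cm_on):
        problems.append("neither ca.enabled nor certManager.enabled is true")
    if ca_on and len(str(ca.get("token", ""))) < MIN_TOKEN_LENGTH:
        problems.append(f"ca.token is shorter than {MIN_TOKEN_LENGTH} characters")
    return problems
```

The values above pass this check once `ca.enabled` is true, yet the second node still could not join, which is why the problem had to lie in how the chart uses the CA rather than in the values themselves.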

wknickless commented 2 years ago

@forry2 unfortunately I don't know how that part of the chart works.

forry2 commented 2 years ago

Does anybody know how to make the ca part of the chart work? @Subv and @alexnuttinck, I saw you worked on this part of the code.

forry2 commented 2 years ago

It looks like either the second instance is not presenting itself to the coordinator with the correct certificate name, or the second instance's public key is not present in the coordinator's truststore. How does the ca section of the chart work?

forry2 commented 2 years ago

Hi, does anybody have a clue? :( The chart does not work as it is. :(

forry2 commented 2 years ago

It turned out that the NiFi pods were actually generating certificates but not using them. We have patched the bash script that manages this part, but I'd like to get in touch with someone here who worked on it to understand whether we were not using the chart the right way or whether this is actually a bug.

wknickless commented 2 years ago

@forry2 thanks for debugging the problem!

It looks like @alexnuttinck and @makeacode added CA support in https://github.com/cetic/helm-nifi/pull/76 back in September 2020. The chart (generally) and the Helm templating are so complicated that I wrote a bunch of tests (see https://github.com/cetic/helm-nifi/tree/master/tests and https://github.com/cetic/helm-nifi/tree/master/.github/workflows) to ensure that additions and changes don't break anything. Unfortunately, the CA support predates that strategy, so we don't have any test coverage for it.

Would you be willing to share the changes you made to get it to work?

Also, would you be up for adapting one or more of the existing tests to cover your use case? For example:

https://github.com/cetic/helm-nifi/blob/b1c476f6b389be7c6d1fb9f6046ffa1f2af5e79e/tests/07-oidc-cluster-login-test.js#L82-L103

...confirms that a 3-way NiFi cluster actually comes up with mutual connections and authentication.
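A similar end-to-end check could also be expressed against NiFi's REST API. The sketch below assumes the response shape of `GET /nifi-api/controller/cluster` (a `cluster.nodes` list whose entries carry `address` and `status` fields), which should be verified against the REST API documentation for the NiFi version in use. It only parses the JSON; fetching it with authenticated, TLS-verified requests is left out, and the function name `all_nodes_connected` is hypothetical.

```python
import json


def all_nodes_connected(cluster_json: str, expected_nodes: int) -> bool:
    """True if the (assumed) cluster response lists exactly expected_nodes nodes, all CONNECTED."""
    nodes = json.loads(cluster_json).get("cluster", {}).get("nodes", [])
    return len(nodes) == expected_nodes and all(
        node.get("status") == "CONNECTED" for node in nodes
    )
```

Checking both the node count and each node's status matters here: in this issue the first node forms a cluster on its own, so a check that only looks at "is there a connected cluster" would pass while the second node is still locked out.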

Subv commented 2 years ago

It seems the cert-request initContainer was removed from the StatefulSet in #169, so the nodes each generate their own self-signed certificate and cannot form a cluster when using ca.enabled=true instead of CertManager.
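Why per-node self-signed certificates prevent clustering can be shown with a toy model of certificate path validation. In this Python sketch a certificate is reduced to a subject and an issuer, and a peer is accepted only if following issuer links leads to a subject in the local truststore. Real PKIX validation also checks signatures, validity periods and extensions, all omitted here, and every name (`Cert`, `chains_to_anchor`, the node and CA names) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # equal to subject for a self-signed certificate


def chains_to_anchor(leaf: Cert, known: dict[str, Cert], anchors: set[str]) -> bool:
    """Follow issuer links from leaf until a trust anchor is found (toy model, no signatures)."""
    current = leaf
    seen: set[str] = set()
    while current.subject not in seen:
        if current.subject in anchors:
            return True
        seen.add(current.subject)
        issuer = known.get(current.issuer)
        if issuer is None:
            return False  # unknown issuer: the path does not chain to any trust anchor
        current = issuer
    return False  # reached an untrusted self-signed certificate (issuer loop)
```

Presumably, with the cert-request initContainer in place, every node's certificate is issued by the shared toolkit CA, so a peer chains to that CA and is accepted. Without it, each node's truststore only knows its own self-signed certificate, so a peer's certificate never reaches an anchor, which matches the "Path does not chain with any of the trust anchor" error in the original report.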

forry2 commented 2 years ago

Does anybody know how to proceed with that part of the chart?
