cetic / helm-nifi

Helm Chart for Apache Nifi
Apache License 2.0

[cetic/nifi] multi-node Cluster nifi web ui shows up intermittently (BadJOSEException: Signed JWT rejected) #271

Open eunjeh opened 1 year ago

eunjeh commented 1 year ago

Describe the bug:

The output of "kubectl logs [pod name] user-log" shows:

ERROR [NiFi Web Server-377] o.a.nifi.web.api.config.ThrowableMapper An unexpected error has occurred: org.springframework.security.oauth2.server.resource.InvalidBearerTokenException: An error occurred while attempting to decode the Jwt: Signed JWT rejected: Another algorithm expected, or no matching key(s) found. Returning Internal Server Error response.
org.springframework.security.oauth2.server.resource.InvalidBearerTokenException: An error occurred while attempting to decode the Jwt: Signed JWT rejected: Another algorithm expected, or no matching key(s) found

Caused by: org.springframework.security.oauth2.jwt.BadJwtException: An error occurred while attempting to decode the Jwt: Signed JWT rejected: Another algorithm expected, or no matching key(s) found

Caused by: com.nimbusds.jose.proc.BadJOSEException: Signed JWT rejected: Another algorithm expected, or no matching key(s) found

Version of Helm, Kubernetes and the Nifi chart:

Helm: v3.9.3
Kubernetes: v1.24.2 (Major: 1, Minor: 24)
NiFi chart: 1.1.1

What happened:

When I access the NiFi web UI, sometimes I get the normal page (screenshot omitted).

But after a few minutes, a disconnection occurs (screenshot omitted).

This keeps repeating.

What you expected to happen: To be able to access the NiFi web UI normally and use it without disconnection interruptions.

How to reproduce it (as minimally and precisely as possible):

  1. create a cluster with 4 replicas
  2. set cert-manager enabled: true in the values.yaml file
  3. install cert-manager with "helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.6.1 --set installCRDs=true"
  4. install the CRDs with cert-manager.crds.yaml
  5. helm install [release name] -n [namespace] cetic/nifi -f values.yaml (see the values.yaml sketch below)
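
For reference, a minimal values.yaml sketch matching the steps above. The key names (replicaCount, certManager.enabled) are assumptions based on the chart's documented values and may differ between chart versions; verify them with "helm show values cetic/nifi":

# Sketch of the values used for the reproduction above (not a complete values.yaml).
# Key names are assumptions; check `helm show values cetic/nifi` for your chart version.
replicaCount: 4          # 4-node NiFi cluster

certManager:
  enabled: true          # issue node certificates with cert-manager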

Anything else we need to know:

  1. There are no errors in the app-log, cert-manager, or server containers for any of the pods.
  2. Pods are running fine.
  3. cert-manager pods are running fine.

Here is some information to help with troubleshooting:

Check if a pod is in error:

kubectl get pod
NAME                         READY   STATUS    RESTARTS   AGE
myrelease-nifi-0             3/4     Failed    1          56m
myrelease-nifi-registry-0    1/1     Running   0          56m
myrelease-nifi-zookeeper-0   1/1     Running   0          56m
myrelease-nifi-zookeeper-1   1/1     Running   0          56m
myrelease-nifi-zookeeper-2   1/1     Running   0          56m

Inspect the pod, check the "Events" section at the end for anything suspicious.

kubectl describe pod myrelease-nifi-0

Events:

Get logs on a failed container inside the pod (here the server one):

kubectl logs myrelease-nifi-0 server

Bootstrap Config File: /opt/nifi/nifi-current/conf/bootstrap.conf

2022-09-21 05:23:03,553 INFO [main] org.apache.nifi.bootstrap.Command Starting Apache NiFi...
2022-09-21 05:23:03,554 INFO [main] org.apache.nifi.bootstrap.Command Working Directory: /opt/nifi/nifi-current
2022-09-21 05:23:03,554 INFO [main] org.apache.nifi.bootstrap.Command Command: /usr/local/openjdk-8/bin/java -classpath /opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.36.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-5.2.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.36.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.36.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.11.jar:/opt/nifi/nifi-current/./lib/logback-core-1.2.11.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-property-utils-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-server-api-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-stateless-api-1.16.3.jar:/opt/nifi/nifi-current/./lib/nifi-stateless-bootstrap-1.16.3.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.36.jar -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx2g -Xms2g -Djava.security.egd=file:/dev/urandom -Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol -Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties -Dnifi.bootstrap.listen.port=35063 -Dapp=NiFi -Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/nifi-current/logs org.apache.nifi.NiFi
2022-09-21 05:23:03,578 INFO [main] org.apache.nifi.bootstrap.Command Launched Apache NiFi with Process ID 11

eunjeh commented 1 year ago

Additional log from cert-manager-webhook

kubectl logs -n cert-manager pod/cert-manager-webhook-7d4b5b8c56-brthk

(screenshot of the cert-manager-webhook log omitted)

eunjeh commented 1 year ago

Can someone provide an actual values.yaml file for a multi-node cluster, or at least point out which properties need to be changed?

ggerla commented 1 year ago

I've been battling with the same problem for several months. The cause lies in the cluster deployment; with a single node the problem does not exist. I also verified that with Firefox everything works fine, while with Chrome the problem is systematic. I've read in many places that the solution is to configure the nginx ingress with sticky sessions (also described in the official NiFi guide), but in my case I'm not using nginx but Istio, and I can't find an equivalent solution.
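
For what it's worth, the usual Istio equivalent of nginx sticky sessions is a DestinationRule with consistent-hash load balancing on a cookie. This is only a sketch; the host, namespace, and cookie name below are placeholders and must point at whatever Service your VirtualService/Gateway routes to:

---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: nifi-sticky
  namespace: nifi
spec:
  host: myrelease-nifi.nifi.svc.cluster.local   # placeholder: the NiFi Service behind your VirtualService
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpCookie:
          name: nifi-sticky                     # cookie used to pin each client to one NiFi node
          ttl: 3600s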

ilgi4130 commented 1 year ago

Hello, I'm facing the same problem. In my case I can't use the nginx ingress, only Traefik. Personally I can never connect (I get the error every time). With port-forward access the connection is fine. Is there a workaround for this with Traefik?
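
One possible workaround with Traefik v2 (not verified against this chart) is to enable sticky sessions on the route's backing service, for example through an IngressRoute. A sketch only; the host, namespace, and service name are placeholders:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nifi
  namespace: nifi
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`nifi.example.com`)           # placeholder host
      kind: Rule
      services:
        - name: myrelease-nifi                  # placeholder: the Service exposed by the chart
          port: 8443
          sticky:
            cookie:
              name: nifi-sticky                 # keeps each client on the same NiFi node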

cf250024 commented 1 year ago

I had the same issue with GKE until I added the sessionAffinity for the BackendConfig as follows:

---
# Backend config for GKE internal LB to perform healthcheck on container port (e.g., HTTPS on 8443)
# More details please refer to: https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#health_checks
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: nifi-https-ingress-ilb-backendconfig
  namespace: nifi
spec:
  healthCheck:
    checkIntervalSec: 5
    unhealthyThreshold: 2
    healthyThreshold: 2
    timeoutSec: 3
    type: HTTPS
    port: 8443
    requestPath: /
  sessionAffinity:
    affinityType: "CLIENT_IP"
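
For the BackendConfig above to take effect, the Service targeted by the GKE Ingress also has to reference it through the cloud.google.com/backend-config annotation. A sketch with placeholder names; since the chart creates its own Service, in practice the annotation would normally be added through the chart's values if it exposes service annotations:

---
apiVersion: v1
kind: Service
metadata:
  name: myrelease-nifi                          # placeholder: the Service behind the GKE Ingress
  namespace: nifi
  annotations:
    cloud.google.com/backend-config: '{"default": "nifi-https-ingress-ilb-backendconfig"}'
spec:
  selector:
    app: nifi                                   # placeholder selector; match the chart's pod labels
  ports:
    - name: https
      port: 8443
      targetPort: 8443
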
banzo commented 1 year ago

@cf250024 thank you very much. Before I close the issue I'd like to update the doc; would you mind doing a PR with a new FAQ item? https://github.com/cetic/helm-nifi/blob/master/doc/FAQ.md

ThanosKarousos commented 11 months ago

I am experiencing a similar issue when trying to configure an ingress path other than /; however, the above configuration doesn't help. The error message from the user-log container is the same. I have tried setting up sessionAffinity as shown in my values.yaml, but with no luck.

What am I missing? Any help would be greatly appreciated.

cf250024 commented 11 months ago

@ThanosKarousos I'm assuming you're using GKE. Why isn't the request path "/"?

ThanosKarousos commented 11 months ago

@cf250024 this is on AKS. The request path is not "/", since I want to configure a different path (in this case, I want my cluster to be accessible under my.host.com/mynifi)

cf250024 commented 11 months ago

I believe you can use your own domain name, but not a sub-path like "mynifi", with this Helm chart or with the NiFi application itself.

ThanosKarousos commented 11 months ago

Based on this issue, it should be possible to change the path to "/mynifi". However, when trying the suggested configuration, I end up with the error message mentioned in the current issue.

This is the case only when trying to have a cluster with 2 or more nodes. When there is only 1 node, I am able to connect properly via ingress.
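
A general note for the non-root path case (independent of this chart): NiFi itself has to be told about the proxy host and context path, otherwise requests arriving under /mynifi are rejected. The entries below are standard NiFi properties; the host and path are the hypothetical values from the comments above, and how to inject them through the chart's values depends on the chart version:

# nifi.properties fragment (sketch): serve NiFi behind a reverse proxy at my.host.com/mynifi
nifi.web.proxy.host=my.host.com
nifi.web.proxy.context.path=/mynifi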