cetic / helm-nifi

Helm Chart for Apache Nifi
Apache License 2.0

[cetic/nifi] Nifi cluster not working #268

Closed: shayki5 closed this issue 2 years ago

shayki5 commented 2 years ago

Describe the bug: When the replica count is 1, NiFi works: I can log in and even connect to the nifi-registry. When I increase the replicas to 2 (or more) I can't log in to the NiFi UI and get several errors. I searched all the issues here without success. If `ca` is enabled in my values file, I can see this error in the server pod:

WARN [Process Cluster Protocol Request-32] o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from nifi-1.nifi-headless.data-tools.svc.cluster.local due to Received fatal alert: certificate_unknown

or:

WARN [Clustering Tasks Thread-1] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'HEARTBEAT' protocol message

If `certManager` is enabled, I can see this error in the server pod:

Failed to create socket to nifi-0.nifi-headless.data-tools.svc.******.com:6007 due to: java.net.ConnectException: Connection refused (Connection refused)

or:

Caused by: org.springframework.security.oauth2.jwt.BadJwtException: An error occurred while attempting to decode the Jwt: Signed JWT rejected: Another algorithm expected, or no matching key(s) found
        at org.springframework.security.oauth2.jwt.NimbusJwtDecoder.createJwt(NimbusJwtDecoder.java:180)
        at org.springframework.security.oauth2.jwt.NimbusJwtDecoder.decode(NimbusJwtDecoder.java:137)
        at org.springframework.security.oauth2.server.resource.authentication.JwtAuthenticationProvider.getJwt(JwtAuthenticationProvider.java:97)
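
The `certificate_unknown` errors suggest the nodes don't trust each other's certificates. One way to see which certificate a node actually presents on the cluster port is a TLS probe from inside a pod; a sketch, assuming `openssl` is available in the image and that the main container is named `server`:

```bash
# Print the subject and SANs of the certificate nifi-1 presents on the
# cluster protocol port (6007); the peer hostname must appear in the SANs.
kubectl exec nifi-0 -c server -- bash -c \
  'echo | openssl s_client -connect nifi-1.nifi-headless.data-tools.svc.cluster.local:6007 2>/dev/null \
     | openssl x509 -noout -subject -ext subjectAltName'
```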

Version of Helm, Kubernetes and the NiFi chart:
- Helm: 13.9.0
- Kubernetes: 1.21.7
- NiFi chart: 1.1.1

What happened: With 1 replica NiFi works: I can log in and even connect to the nifi-registry. After increasing the replicas to 2 (or more) I can't log in to the NiFi UI and get the errors above.

What you expected to happen: To be able to access the NiFi UI.

How to reproduce it (as minimally and precisely as possible): Increase the replicas to 2 (or more).

Anything else we need to know:

Here is some information to help with troubleshooting:

```yaml
image:
  repository: apache/nifi
  tag: "1.16.3"
  pullPolicy: "IfNotPresent"

properties:
  # https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#nifi_sensitive_props_key
  sensitiveKey: test123456789 # Must have at least 12 characters
  isNode: true

auth:
  admin: CN=admin, OU=NIFI
  SSL:
    keystorePasswd: changeMe
    truststorePasswd: changeMe
  # Automatically disabled if OIDC or LDAP enabled
  singleUser:
    username: admin
    password: TestingNifi123 # Must have at least 12 characters

## Configure Ingress based on the documentation here: https://kubernetes.io/docs/concepts/services-networking/ingress/
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/upstream-vhost: "localhost:8443"
    nginx.ingress.kubernetes.io/proxy-redirect-from: "https://localhost:8443"
    nginx.ingress.kubernetes.io/proxy-redirect-to: "https://nifi.tools.dev.com"
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  tls: []
  hosts:

## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
persistence:
  enabled: true

# ca server details
# Setting this true will create a nifi-toolkit based ca server
# The ca server will be used to generate the self-signed certificates required to set up a secured cluster
ca:
  ## If true, enable the nifi-toolkit certificate authority
  enabled: false # or true -- tried with either ca or certManager enabled
  persistence:
    enabled: true
  server: ""
  service:
    port: 9090
  token: sixteenCharacters
  admin:
    cn: admin
  serviceAccount:
    create: false
    #name: nifi-ca
  openshift:
    scc:
      enabled: false

# cert-manager support
# Setting this true will have cert-manager create a private CA for the cluster
# as well as the certificates for each cluster node.
certManager:
  enabled: false # or true -- tried with either ca or certManager enabled
  clusterDomain: tools.dev.com
  keystorePasswd: test123
  truststorePasswd: test123
  replaceDefaultTrustStore: false
  additionalDnsNames:

# ------------------------------------------------------------------------------
# Zookeeper:
# ------------------------------------------------------------------------------
zookeeper:
  ## If true, install the Zookeeper chart
  ## ref: https://github.com/bitnami/charts/blob/master/bitnami/zookeeper/values.yaml
  enabled: true
  ## If the Zookeeper Chart is disabled a URL and port are required to connect
  url: ""
  port: 2181
  replicaCount: 3

# ------------------------------------------------------------------------------
# Nifi registry:
# ------------------------------------------------------------------------------
registry:
  ## If true, install the Nifi registry
  enabled: true
  url: ""
  port: 80
  extraEnvs:
```
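
A values file like this can be sanity-checked before installing with a client-side render and dry run; a minimal sketch, with the release name and values file name assumed:

```bash
# Render the chart locally and validate the manifests without touching the cluster.
helm template nifi cetic/nifi -f values.yaml | kubectl apply --dry-run=client -f -
```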

* The output of the following commands:

Check if a pod is in error: 
```bash
kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
cert-manager-1662895698-cainjector-85946b945f-x4pxb   1/1     Running   0          26h
cert-manager-1662895698-controller-6849b8f569-l6jmg   1/1     Running   0          26h
cert-manager-1662895698-webhook-6595cb448b-4q8td      1/1     Running   0          26h
nifi-0                                                4/4     Running   0          9m40s
nifi-1                                                4/4     Running   0          9m40s
nifi-ca-65c89cd6b7-pc6bf                              1/1     Running   0          9m40s
nifi-registry-0                                       1/1     Running   0          9m40s
nifi-zookeeper-0                                      1/1     Running   0          9m40s
nifi-zookeeper-1                                      1/1     Running   0          9m40s
nifi-zookeeper-2                                      1/1     Running   0          9m40s
```
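
The node errors quoted above come from the main NiFi container; a sketch of pulling just the cluster/TLS-related lines (container name `server` assumed from this chart's pod spec):

```bash
kubectl logs nifi-0 -c server | grep -iE 'cluster|certificate|heartbeat'
```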

I'm not familiar with the cert stuff; I guess it's related somehow. I'd really appreciate your help. Thank you!

shayki5 commented 2 years ago

So... I just installed it again in a fresh namespace, with certManager enabled, and it's working 🤷‍♂️ Thanks :)
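
For anyone landing here, a minimal sketch of that kind of clean reinstall (release, namespace, and values file names assumed); the point of the fresh namespace is that no stale per-node certificate secrets from earlier releases get reused:

```bash
kubectl create namespace nifi-fresh
helm install nifi cetic/nifi -n nifi-fresh -f values.yaml
```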

neron-traice commented 1 year ago

@shayki5 Can you share the working values.yaml file?

shayki5 commented 1 year ago

@neron-traice still want it?

lucasfcnunes commented 1 year ago

> @neron-traice still want it?

I do!

shayki5 commented 1 year ago

@lucasfcnunes

```yaml
# Number of nifi nodes
replicaCount: 3

## Set default image, imageTag, and imagePullPolicy.
## ref: https://hub.docker.com/r/apache/nifi/
##
image:
  repository: apache/nifi
  tag: "1.16.3"
  pullPolicy: "IfNotPresent"

  ## Optionally specify an imagePullSecret.
  ## Secret must be manually created in the namespace.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  ##
  # pullSecret: myRegistryKeySecretName

securityContext:
  runAsUser: 1000
  fsGroup: 1000

## @param useHostNetwork - boolean - optional
## Bind ports on the hostNetwork. Useful for CNI networking where hostPort might
## not be supported. The ports need to be available on all hosts. It can be
## used for custom metrics instead of a service endpoint.
##
## WARNING: Make sure that hosts using this are properly firewalled otherwise
## metrics and traces are accepted from any host able to connect to this host.
#

sts:
  # Parallel podManagementPolicy for faster bootstrap and teardown. Default is OrderedReady.
  podManagementPolicy: Parallel
  AntiAffinity: soft
  useHostNetwork: null
  hostPort: null
  pod:
    annotations:
      security.alpha.kubernetes.io/sysctls: net.ipv4.ip_local_port_range=10000 65000
      #prometheus.io/scrape: "true"      
  serviceAccount:
    create: false
    #name: nifi
    annotations: {}
  hostAliases: []
#    - ip: "1.2.3.4"
#      hostnames:
#        - example.com
#        - example

  startupProbe:
    enabled: false
    failureThreshold: 60
    periodSeconds: 10

## Useful if using any custom secrets
## Pass in some secrets to use (if required)
# secrets:
# - name: myNifiSecret
#   keys:
#     - key1
#     - key2
#   mountPath: /opt/nifi/secret

## Useful if using any custom configmaps
## Pass in some configmaps to use (if required)
# configmaps:
#   - name: myNifiConf
#     keys:
#       - myconf.conf
#     mountPath: /opt/nifi/custom-config

properties:
  # https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#nifi_sensitive_props_key
  sensitiveKey: test # Must have at least 12 characters
  # NiFi assumes conf/nifi.properties is persistent but this helm chart
  # recreates it every time.  Setting the Sensitive Properties Key
  # (nifi.sensitive.props.key) is supposed to happen at the same time
  # /opt/nifi/data/flow.xml.gz sensitive properties are encrypted.  If that
  # doesn't happen then NiFi won't start because decryption fails.
  # So if sensitiveKeySetFile is configured but doesn't exist, assume
  # /opt/nifi/flow.xml.gz hasn't been encrypted and follow the procedure
  # https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#updating-the-sensitive-properties-key
  # to simultaneously encrypt it and set nifi.sensitive.props.key.
  # sensitiveKeySetFile: /opt/nifi/data/sensitive-props-key-applied
  # If sensitiveKey was already set, then pass in sensitiveKeyPrior with the old key.
  # sensitiveKeyPrior: OldPasswordToChangeFrom
  algorithm: NIFI_PBKDF2_AES_GCM_256
  # use externalSecure for when inbound SSL is provided by nginx-ingress or other external mechanism
  externalSecure: true
  isNode: true
  httpsPort: 8443 
  #webProxyHost: nifi.test.com/nifi:8443 # <clusterIP>:<NodePort> (If Nifi service is NodePort or LoadBalancer)
  clusterPort: 6007
  provenanceStorage: "8 GB"
  siteToSite:
    port: 10000
  # use properties.safetyValve to pass explicit 'key: value' pairs that overwrite other configuration
  safetyValve:
    #nifi.variable.registry.properties: "${NIFI_HOME}/example1.properties, ${NIFI_HOME}/example2.properties"
    nifi.web.http.network.interface.default: eth0
    # listen to loopback interface so "kubectl port-forward ..." works
    nifi.web.http.network.interface.lo: lo
  namespace: nifi

  ## Include additional processors
  # customLibPath: "/opt/configuration_resources/custom_lib"

## Include additional libraries in the Nifi containers by using the postStart handler
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
# postStart: /opt/nifi/psql; wget -P /opt/nifi/psql https://jdbc.postgresql.org/download/postgresql-42.2.6.jar

# Nifi User Authentication
auth:
  admin: CN=admin, OU=NIFI
  SSL:
    keystorePasswd: changeme
    truststorePasswd: changeme

  # Automatically disabled if OIDC or LDAP enabled
  singleUser:
    username: test@test.com
    password: test # Must have at least 12 characters

  clientAuth:
    enabled: false

  ldap:
    enabled: true
    host: ldap://test.test:389
    searchBase: ou=test,dc=test,dc=local
    ldapConnectUser: CN=test,OU=Users,OU=test,OU=test,DC=test,DC=local
    ldapConnectPass: _password_ 
    admin: cn=data_eng,ou=ServiceAccounts,ou=test,dc=test,dc=local
    searchFilter: (sAMAccountName={0})
    userIdentityAttribute: cn
    authStrategy: SIMPLE # How the connection to the LDAP server is authenticated. Possible values are ANONYMOUS, SIMPLE, LDAPS, or START_TLS.
    identityStrategy: USE_DN
    authExpiration: 12 hours

  oidc:
    enabled: false

  usersOidc:
    enabled: true

openldap:
  enabled: false
  persistence:
    enabled: true
  env:
    LDAP_ORGANISATION: # name of your organization e.g. "Example"
    LDAP_DOMAIN: # your domain e.g. "ldap.example.be"
    LDAP_BACKEND: "hdb"
    LDAP_TLS: "true"
    LDAP_TLS_ENFORCE: "false"
    LDAP_REMOVE_CONFIG_AFTER_SETUP: "false"
  adminPassword: #ChangeMe
  configPassword: #ChangeMe
  customLdifFiles:
    1-default-users.ldif: |-
      # You can find an example ldif file at https://github.com/cetic/fadi/blob/master/examples/basic/example.ldif
## Expose the nifi service to be accessed from outside the cluster (LoadBalancer service).
## or access it from within the cluster (ClusterIP service). Set the service type and the port to serve it.
## ref: http://kubernetes.io/docs/user-guide/services/
##

# headless service
headless:
  type: ClusterIP
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"

# ui service
service:
  type: ClusterIP
  httpsPort: 8443
  # nodePort: 30236
  annotations: {}
    # loadBalancerIP:
    ## Load Balancer sources
    ## https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/#restrict-access-for-loadbalancer-service
    ##
    # loadBalancerSourceRanges:
    # - 10.10.10.0/24
    ## OIDC authentication requires "sticky" session on the LoadBalancer for JWT to work properly...but AWS doesn't like it on creation
    # sessionAffinity: ClientIP
    # sessionAffinityConfig:
    #   clientIP:
  #     timeoutSeconds: 10800

  # Enables additional port/ports to nifi service for internal processors
  processors:
    enabled: false
    ports:
      - name: processor01
        port: 7001
        targetPort: 7001
        #nodePort: 30701
      - name: processor02
        port: 7002
        targetPort: 7002
        #nodePort: 30702

## Configure Ingress based on the documentation here: https://kubernetes.io/docs/concepts/services-networking/ingress/
##
ingress:
  enabled: true
  # className: nginx
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/upstream-vhost: "localhost:8443"
    nginx.ingress.kubernetes.io/proxy-redirect-from: "https://localhost:8443"
    nginx.ingress.kubernetes.io/proxy-redirect-to: "https://nifi.test.com"
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "hello-cookie"
    nginx.ingress.kubernetes.io/session-cookie-expires: "17280000"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "17280000"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/affinity-mode: persistent
    nginx.ingress.kubernetes.io/session-cookie-hash: sha1
    nginx.ingress.kubernetes.io/proxy-body-size: 50m
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header 'X-ProxyScheme' 'https';
      proxy_set_header 'X-ProxyPort' '443';
  hosts:
    - nifi.test.com 
  path: /
  # If you want to change the default path, see this issue https://github.com/cetic/helm-nifi/issues/22

# Amount of memory to give the NiFi java heap
jvmMemory: 4g

# Separate image for tailing each log separately and checking zookeeper connectivity
sidecar:
  image: busybox
  tag: "1.32.0"
  imagePullPolicy: "IfNotPresent"

## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
##
persistence:
  enabled: true

  # When creating persistent storage, the NiFi helm chart can either reference an already-defined
  # storage class by name, such as "standard" or can define a custom storage class by specifying
  # customStorageClass: true and providing the "storageClass", "storageProvisioner" and "storageType".
  # For example, to use SSD storage on Google Compute Engine see values-gcp.yaml
  #
  # To use a storage class that already exists on the Kubernetes cluster, we can simply reference it by name.
  # For example:
  # storageClass: standard
  #
  # The default storage class is used if this variable is not set.

  accessModes:  [ReadWriteOnce]
  ## Storage Capacities for persistent volumes
  configStorage:
    size: 1Gi
  authconfStorage:
    size: 1Gi
  # Storage capacity for the 'data' directory, which is used to hold things such as the flow.xml.gz, configuration, state, etc.
  dataStorage:
    size: 10Gi
  # Storage capacity for the FlowFile repository
  flowfileRepoStorage:
    size: 50Gi
  # Storage capacity for the Content repository
  contentRepoStorage:
    size: 50Gi
  # Storage capacity for the Provenance repository. When changing this, one should also change the properties.provenanceStorage value above, also.
  provenanceRepoStorage:
    size: 50Gi
  # Storage capacity for nifi logs
  logStorage:
    size: 50Gi

## Configure resource requests and limits
## ref: http://kubernetes.io/docs/user-guide/compute-resources/
##
resources: 
  requests:
    cpu: 100m
    memory: 4Gi
  limits:
    cpu: 2
    memory: 6Gi
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #  cpu: 100m
  #  memory: 128Mi
  # requests:
  #  cpu: 100m
  #  memory: 128Mi

logresources:
  requests:
    cpu: 10m
    memory: 10Mi
  limits:
    cpu: 50m
    memory: 50Mi

## Enables setting your own affinity. Mutually exclusive with sts.AntiAffinity.
## To use this, set sts.AntiAffinity to a value other than "soft" or "hard".
affinity: {}

nodeSelector:
  app: criticalinfra

tolerations: 
  - key: "app"
    operator: "Equal"
    value: "criticalinfra"

initContainers: {}
  # foo-init:  # <- will be used as container name
  #   image: "busybox:1.30.1"
  #   imagePullPolicy: "IfNotPresent"
  #   command: ['sh', '-c', 'echo this is an initContainer']
  #   volumeMounts:
  #     - mountPath: /tmp/foo
  #       name: foo

extraVolumeMounts: []

extraVolumes: []

## Extra containers
extraContainers: []

terminationGracePeriodSeconds: 30

## Extra environment variables that will be pass onto deployment pods
env: []

## Extra environment variables from secrets and config maps
envFrom: []

# envFrom:
#   - configMapRef:
#       name: config-name
#   - secretRef:
#       name: mysecret

## Openshift support
## Use the following variables to enable Route and Security Context Constraint creation
openshift:
  scc:
    enabled: false
  route:
    enabled: false
    #host: www.test.com
    #path: /nifi

# ca server details
# Setting this true will create a nifi-toolkit based ca server
# The ca server will be used to generate the self-signed certificates required to set up a secured cluster
ca:
  ## If true, enable the nifi-toolkit certificate authority
  enabled: false
  persistence:
    enabled: true
  server: ""
  service:
    port: 9090
  token: sixteenCharacters
  admin:
    cn: admin
  serviceAccount:
    create: false
    #name: nifi-ca
  openshift:
    scc:
      enabled: false

# cert-manager support
# Setting this true will have cert-manager create a private CA for the cluster
# as well as the certificates for each cluster node.
certManager:
  enabled: true
  clusterDomain: cluster.local
  keystorePasswd: changeme
  truststorePasswd: changeme
  replaceDefaultTrustStore: false
  additionalDnsNames:
    - localhost
    - nifi.test.com
  refreshSeconds: 300
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 128Mi
  # cert-manager takes care of rotating the node certificates, so default
  # their lifetime to 90 days.  But when the CA expires you may need to 
  # 'helm delete' the cluster, delete all the node certificates and secrets, 
  # and then 'helm install' the NiFi cluster again.  If a site-to-site trusted
  # CA or a NiFi Registry CA certificate expires, you'll need to restart all 
  # pods to pick up the new version of the CA certificate.  So default the CA 
  # lifetime to 10 years to avoid that happening very often.
  # c.f. https://github.com/cert-manager/cert-manager/issues/2478#issuecomment-1095545529
  certDuration: 2160h
  caDuration: 87660h

# ------------------------------------------------------------------------------
# Zookeeper:
# ------------------------------------------------------------------------------
zookeeper:
  ## If true, install the Zookeeper chart
  ## ref: https://github.com/bitnami/charts/blob/master/bitnami/zookeeper/values.yaml
  enabled: true
  ## If the Zookeeper Chart is disabled a URL and port are required to connect
  url: ""
  port: 2181
  replicaCount: 3
  persistence:
    enabled: true
  nodeSelector:
    app: criticalinfra

  tolerations: 
    - key: "app"
      operator: "Equal"
      value: "criticalinfra"

# ------------------------------------------------------------------------------
# Nifi registry:
# ------------------------------------------------------------------------------
nifiregistry:
  ## If true, install the Nifi registry
  enabled: true
  url: ""
  port: 80
  persistence:
    enabled: true
  extraEnvs:
  - name: NIFI_REGISTRY_WEB_HTTP_HOST
    value: "0.0.0.0"
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/upstream-vhost: "localhost:18080"
      nginx.ingress.kubernetes.io/proxy-redirect-from: "http://localhost:18080"
      nginx.ingress.kubernetes.io/proxy-redirect-to: "http://nifi-registry.test.com"
      kubernetes.io/tls-acme: "true"
      nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
    hosts:
      - host: nifi-registry.test.com
        paths: 
        - path: /
          pathType: Prefix
  security:
    # Disabled by default (following the principle of least astonishment)
    enabled: false
    needClientAuth: true
    httpsHost: "0.0.0.0"
    httpsPort: 18443
    admin: "Initial Administrator"
    persistence:
    # storageClass: "-"
      accessMode: ReadWriteOnce
      size: 1Gi
    # ConfigMap with users.xml and authorizations.xml keys; note that these
    # settings will override the admin: key above if present
    authConf:

  certManager:
    # If true, use cert-manager to create and rotate intra-NiFi-Registry-cluster
    # TLS keys (note that cert-manager is a Kubernetes cluster-wide resource, so
    # is not installed automatically by this chart); c.f. https://cert-manager.io
    enabled: false
    # TLS Common Name of a client, suitable for using as an initial administrator.
    # The client certificate (including private key) will be in a Kubernetes
    # TLS secret of the name {{ template "nifi-registry.fullname"}}-client
    clientCommonName: "Initial Administrator"
    # Kubernetes cluster top level domain, to generate fully qualified domain names
    # for certificate Common Names
    clusterDomain: cluster.local
    # Java Key Store (JKS) password for NiFi Registry keystore
    keystorePasswd: changeme
    # Java Key Store (JKS) password for NiFi Registry truststore
    truststorePasswd: changeme
    # Additional DNS names to incorporate into TLS certificates (e.g. where users
    # point browsers to access the NiFi Registry UI)
    additionalDnsNames:
      - localhost
      - nifi-registry.test.com
    # Names of Kubernetes secrets containing ca.crt keys to add to the
    # NiFi Registry truststore (e.g. CAs of NiFi Registry clients)
    caSecrets:
    # If your (e.g.) OIDC server is using TLS with a private CA, then set this
    # to true so that Java will use the cert-manager-derived TrustStore:
    replaceDefaultTrustStore: false
    # How often the sidecar refreshes the NiFi keystore and truststore from
    # the cert-manager Kubernetes secrets (and other caSecrets)
    refreshSeconds: 300
    certDuration: 2160h
    caDuration: 87660h

  resources:
    requests:
      memory: "128Mi"
      cpu: "500m"
    limits:
      memory: "4Gi"
      cpu: "500m"

# Configure metrics
metrics:
  prometheus:
    # Enable Prometheus metrics
    enabled: false
    # Port used to expose Prometheus metrics
    port: 9092
    serviceMonitor:
      # Enable deployment of Prometheus Operator ServiceMonitor resource
      enabled: false
      # namespace: monitoring
      # Additional labels for the ServiceMonitor
      labels: {}
```
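
Once the pods are up, cluster membership can be confirmed through the NiFi REST API; a sketch, assuming `curl` is present in the image (when LDAP/OIDC auth is enabled the call also needs a bearer token, elided here):

```bash
# List cluster nodes and their connection state.
kubectl exec nifi-0 -c server -- curl -sk https://localhost:8443/nifi-api/controller/cluster
```
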
karthikeya0502 commented 12 months ago

How are you accessing the UI after this? Can you please share?

shayki5 commented 11 months ago

Yes, I can access it. What do you want me to share? I shared my values file above.
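
For completeness, the two access paths these values enable; a sketch (the UI service name is assumed to match the release name, and the loopback interface is listed under `properties.safetyValve` above precisely so port-forwarding works):

```bash
# Through the ingress configured above: https://nifi.test.com/nifi
# Or directly, bypassing the ingress:
kubectl port-forward svc/nifi 8443:8443
# then browse to https://localhost:8443/nifi
```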