confluentinc / confluent-kubernetes-examples

Example scenario workflows for Confluent for Kubernetes
Apache License 2.0

Connect startup error with confluent-platform-zookeeper-7.6.0.yaml #302

Closed knut-bw closed 4 months ago

knut-bw commented 5 months ago

I am trying to install Confluent Platform using the confluent-platform-zookeeper-7.6.0.yaml configuration file. Here is my YAML configuration:

apiVersion: platform.confluent.io/v1beta1
kind: Zookeeper
metadata:
  name: zookeeper
spec:
  replicas: 1
  image:
    application: confluentinc/cp-zookeeper:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  dataVolumeCapacity: 10Gi
  logVolumeCapacity: 10Gi
  storageClass:
    name: zookeeper-sc
  podTemplate:
    envVars:
    - name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
      value: "60000"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values: 
                  - 10.20.1.232
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
---
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  replicas: 1
  image:
    application: confluentinc/cp-server:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  dataVolumeCapacity: 10Gi
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/hostname"
                  operator: In
                  values: 
                  - 10.20.1.232
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
---
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
spec:
  replicas: 2
  image:
    application: confluentinc/cp-server-connect:latest
    init: confluentinc/confluent-init-container:2.8.0
  dependencies:
    kafka:
      bootstrapEndpoint: kafka:9071
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/hostname"
                  operator: In
                  values: 
                  - 10.20.1.232
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true  
---
apiVersion: platform.confluent.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb
spec:
  replicas: 2
  image:
    application: confluentinc/cp-ksqldb-server:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  dataVolumeCapacity: 10Gi
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/hostname"
                  operator: In
                  values: 
                  - 10.20.1.232
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
---
apiVersion: platform.confluent.io/v1beta1
kind: ControlCenter
metadata:
  name: controlcenter
spec:
  replicas: 1
  image:
    application: confluentinc/cp-enterprise-control-center:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  dataVolumeCapacity: 10Gi
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/hostname"
                  operator: In
                  values: 
                  - 10.20.1.232
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
  dependencies:
    schemaRegistry:
      url: http://schemaregistry.confluent.svc.cluster.local:8081
    ksqldb:
    - name: ksql-dev
      url: http://ksqldb.confluent.svc.cluster.local:8088
    connect:
    - name: connect-dev
      url:  http://connect.confluent.svc.cluster.local:8083
---
apiVersion: platform.confluent.io/v1beta1
kind: SchemaRegistry
metadata:
  name: schemaregistry
spec:
  replicas: 2
  image:
    application: confluentinc/cp-schema-registry:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/hostname"
                  operator: In
                  values: 
                  - 10.20.1.232
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
---
apiVersion: platform.confluent.io/v1beta1
kind: KafkaRestProxy
metadata:
  name: kafkarestproxy
spec:
  dependencies:
    schemaRegistry:
      url: http://schemaregistry.confluent.svc.cluster.local:8081
  image:
    application: confluentinc/cp-kafka-rest:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  replicas: 2
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
              - matchExpressions:
                - key: "kubernetes.io/hostname"
                  operator: In
                  values: 
                  - 10.20.1.232
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsNonRoot: true
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-zookeeper-0
spec:
  claimRef:
    name: data-zookeeper-0
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /mnt/data/data-zookeeper-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.232
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: txnlog-zookeeper-0
spec:
  claimRef:
    name: txnlog-zookeeper-0
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /mnt/data/txnlog-zookeeper-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.232
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-zookeeper-1
spec:
  claimRef:
    name: data-zookeeper-1
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /mnt/data/data-zookeeper-1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.231
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: txnlog-zookeeper-1
spec:
  claimRef:
    name: txnlog-zookeeper-1
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /mnt/data/txnlog-zookeeper-1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.232
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data0-controlcenter-0
spec:
  claimRef:
    name: data0-controlcenter-0
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zookeeper-sc
  local:
    path: /mnt/data/data0-controlcenter-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.232
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-ksqldb-0
spec:
  claimRef:
    name: data-ksqldb-0
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zookeeper-sc
  local:
    path: /mnt/data/data-ksqldb-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.232
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data0-kafka-0
spec:
  claimRef:
    name: data0-kafka-0
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zookeeper-sc
  local:
    path: /mnt/data/data0-kafka-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.232
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data0-kafka-1
spec:
  claimRef:
    name: data0-kafka-1
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zookeeper-sc
  local:
    path: /mnt/data/data0-kafka-1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 10.20.1.232
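
Side note: the manifest above references a `zookeeper-sc` storage class but does not define it. With statically provisioned `local:` volumes like these, the class is typically a no-provisioner StorageClass; a minimal sketch, assuming local static provisioning (the name `zookeeper-sc` is taken from the manifest above):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zookeeper-sc
# No dynamic provisioner: the PersistentVolumes are created manually,
# as in the manifest above.
provisioner: kubernetes.io/no-provisioner
# Delay binding until a pod is scheduled so the PV's nodeAffinity
# can be taken into account.
volumeBindingMode: WaitForFirstConsumer
```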

When I start Connect, I encounter the following error:

[ERROR] 2024-05-28 08:27:46,321 [main] org.apache.kafka.connect.runtime.isolation.ReflectionScanner getPluginDesc - Failed to discover Converter in /usr/share/java/confluent-metadata-service: Unable to instantiate JsonConverter: Failed to invoke plugin constructor
java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at org.apache.kafka.connect.runtime.isolation.ReflectionScanner.versionFor(ReflectionScanner.java:73)
    at org.apache.kafka.connect.runtime.isolation.ReflectionScanner.getPluginDesc(ReflectionScanner.java:136)
    at org.apache.kafka.connect.runtime.isolation.ReflectionScanner.scanPlugins(ReflectionScanner.java:89)
    at org.apache.kafka.connect.runtime.isolation.PluginScanner.scanUrlsAndAddPlugins(PluginScanner.java:79)
    at org.apache.kafka.connect.runtime.isolation.PluginScanner.discoverPlugins(PluginScanner.java:67)
    at org.apache.kafka.connect.runtime.isolation.Plugins.initLoaders(Plugins.java:91)
    at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:75)
    at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:64)
    at org.apache.kafka.connect.cli.AbstractConnectCli.startConnect(AbstractConnectCli.java:128)
    at org.apache.kafka.connect.cli.AbstractConnectCli.run(AbstractConnectCli.java:101)
    at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:116)
Caused by: java.lang.LinkageError: loader constraint violation: when resolving method 'com.fasterxml.jackson.databind.ObjectMapper com.fasterxml.jackson.databind.ObjectMapper.enable(com.fasterxml.jackson.core.JsonParser$Feature[])' the class loader org.apache.kafka.connect.runtime.isolation.PluginClassLoader @1827a871 of the current class, org/apache/kafka/connect/json/JsonDeserializer, and the class loader 'app' for the method's defining class, com/fasterxml/jackson/databind/ObjectMapper, have different Class objects for the type [Lcom/fasterxml/jackson/core/JsonParser$Feature; used in the signature (org.apache.kafka.connect.json.JsonDeserializer is in unnamed module of loader org.apache.kafka.connect.runtime.isolation.PluginClassLoader @1827a871, parent loader org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader @5d8445d7; com.fasterxml.jackson.databind.ObjectMapper is in unnamed module of loader 'app')
    at org.apache.kafka.connect.json.JsonDeserializer.<init>(JsonDeserializer.java:55)
    at org.apache.kafka.connect.json.JsonConverter.<init>(JsonConverter.java:244)
    ... 15 more
[ERROR] 2024-05-28 08:27:46,321 [main] org.apache.kafka.connect.runtime.isolation.ReflectionScanner getPluginDesc - Failed to discover HeaderConverter in /usr/share/java/confluent-metadata-service: Unable to instantiate JsonConverter: Failed to invoke plugin constructor (same InvocationTargetException and LinkageError stack trace as above)

I haven't found any similar issues online, so I'm at a loss for how to proceed. Could you please help me troubleshoot this issue?

MedAzizTousli commented 5 months ago

I am having the exact same issue. The funny thing is that it was working fine yesterday. I added a few load balancers here and there (for Kafka, Schema Registry, Connect, and Control Center), and now it no longer works.

MedAzizTousli commented 5 months ago

OK, I think I figured it out. If you set Kafka replicas to 1 in your Kafka CRD deployment, the Kafka Connect CRD deployment does not adapt its default configuration values to the smaller cluster, which causes the error. Try adding the following configuration overrides to your Kafka Connect CRD deployment:

config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1

Your final Kafka Connect CRD deployment should look similar to this:

apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect
  namespace: staging-kafka
spec:
  replicas: 1
  image:
    application: confluentinc/cp-server-connect:7.6.1
    init: confluentinc/confluent-init-container:2.8.2
  configOverrides:
    server:
      - config.storage.replication.factor=1
      - offset.storage.replication.factor=1
      - status.storage.replication.factor=1

Additional Note

Make sure to do the same for the Control Center CRD deployment, since it suffers from the same issue.

apiVersion: platform.confluent.io/v1beta1
kind: ControlCenter
metadata:
  name: controlcenter
  namespace: staging-kafka
spec:
  dataVolumeCapacity: 1Gi
  replicas: 1
  image:
    application: confluentinc/cp-enterprise-control-center:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  dependencies:
    schemaRegistry:
      url: http://schemaregistry:8081
    ksqldb:
    - name: ksqldb
      url: http://ksqldb:8088
    connect:
    - name: connect
      url: http://connect:8083
  configOverrides:
    server:
      - confluent.controlcenter.internal.topics.replication=1
      - confluent.controlcenter.command.topic.replication=1
      - confluent.monitoring.interceptor.topic.replication=1
      - confluent.metrics.topic.replication=1
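
The constraint behind these replication-factor overrides can be illustrated with a small sketch (hypothetical Python, not CFK or Kafka code): a topic's replication factor cannot exceed the number of available brokers, and Connect's internal config/offset/status storage topics default to a replication factor of 3, which a single-broker cluster cannot satisfy.

```python
def validate_replication_factor(replication_factor: int, broker_count: int) -> None:
    """Raise if the requested replication factor exceeds the broker count,
    mirroring the broker-side check that rejects topic creation
    (InvalidReplicationFactorException in Kafka)."""
    if replication_factor > broker_count:
        raise ValueError(
            f"Replication factor {replication_factor} larger than "
            f"available brokers: {broker_count}"
        )

# Connect's default of 3 fails against a single broker:
#   validate_replication_factor(3, 1)  -> raises ValueError
# With the overrides above (replication factor 1), creation succeeds:
#   validate_replication_factor(1, 1)  -> no error
```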
knut-bw commented 4 months ago

Thank you, this solution was very helpful for me.