apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] Pulsar cluster: pulsar-proxy crashes and bookies-recovery stays in Init when created with serviceRefs to a ZooKeeper cluster #7040

Open JashBook opened 5 months ago

JashBook commented 5 months ago

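Describe the bug When a Pulsar cluster is created with serviceRefs pointing to a separately created ZooKeeper cluster, the pulsar-proxy pod goes into CrashLoopBackOff and the bookies-recovery pod stays in Init:0/1. The serviceRefs do not take effect for the proxy: it is configured with the zk endpoint "pulsar-cluster-zookeeper.default.svc:2181", which cannot be resolved.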

To Reproduce Steps to reproduce the behavior:

  1. create zk cluster
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      name: zookeeperp-cluster
      namespace: default
    spec:
      clusterDefinitionRef: pulsar-zookeeper
      clusterVersionRef: pulsar-3.0.2
      terminationPolicy: Delete
      affinity:
        podAntiAffinity: Preferred
        topologyKeys:
          - kubernetes.io/hostname
        tenancy: SharedNode
      tolerations:
        - key: kb-data
          operator: Equal
          value: "true"
          effect: NoSchedule
      componentSpecs:
        - name: zookeeper
          componentDefRef: zookeeper
          monitor: false
          replicas: 3
          resources:
            limits:
              cpu: "0.5"
              memory: "0.5Gi"
            requests:
              cpu: "0.5"
              memory: "0.5Gi"
          volumeClaimTemplates:
            - name: data
              spec:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 20Gi
  2. create pulsar cluster
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      labels:
        clusterdefinition.kubeblocks.io/name: pulsar
        clusterversion.kubeblocks.io/name: pulsar-3.0.2
      name: pulsar-cluster
      namespace: default
    spec:
      clusterDefinitionRef: pulsar
      clusterVersionRef: pulsar-3.0.2
      componentSpecs:
      - componentDefRef: pulsar-broker
        monitor: false
        name: pulsar-broker
        replicas: 3
        resources:
          limits:
            cpu: "0.5"
            memory: 0.5Gi
          requests:
            cpu: "0.5"
            memory: 0.5Gi
        serviceAccountName: kb-pulsar-cluster
        serviceRefs:
        - cluster: zookeeperp-cluster
          name: pulsarZookeeper
          namespace: default
        volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
      - componentDefRef: pulsar-proxy
        monitor: true
        name: pulsar-proxy
        replicas: 1
        resources:
          limits:
            cpu: "0.5"
            memory: 0.5Gi
          requests:
            cpu: "0.5"
            memory: 0.5Gi
        serviceRefs:
        - cluster: zookeeperp-cluster
          name: pulsarZookeeper
          namespace: default
        volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
      - componentDefRef: bookies
        monitor: true
        name: bookies
        replicas: 3
        resources:
          limits:
            cpu: "0.5"
            memory: 0.5Gi
          requests:
            cpu: "0.5"
            memory: 0.5Gi
        serviceRefs:
        - cluster: zookeeperp-cluster
          name: pulsarZookeeper
          namespace: default
        volumeClaimTemplates:
        - name: journal
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
        - name: ledgers
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
      - componentDefRef: bookies-recovery
        monitor: true
        name: bookies-recovery
        replicas: 1
        resources:
          limits:
            cpu: "0.5"
            memory: 0.5Gi
          requests:
            cpu: "0.5"
            memory: 0.5Gi
        serviceRefs:
        - cluster: zookeeperp-cluster
          name: pulsarZookeeper
          namespace: default
        volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
      services:
      - componentSelector: proxy
        name: proxy
        serviceName: proxy
        spec:
          ports:
          - name: pulsar
            port: 6650
            protocol: TCP
            targetPort: 6650
          - name: http
            port: 80
            protocol: TCP
            targetPort: 8080
          type: ClusterIP
      - componentSelector: broker
        name: broker-bootstrap
        serviceName: broker-bootstrap
        spec:
          ports:
          - name: pulsar
            port: 6650
            protocol: TCP
            targetPort: 6650
          - name: http
            port: 80
            protocol: TCP
            targetPort: 8080
          - name: kafka-client
            port: 9092
            protocol: TCP
            targetPort: 9092
          type: ClusterIP
      terminationPolicy: Delete
      tolerations:
      - effect: NoSchedule
        key: kb-data
        operator: Equal
        value: "true"
  3. See error
    kubectl get pod 
    NAME                                READY   STATUS             RESTARTS       AGE
    pulsar-cluster-bookies-0            2/2     Running            0              9m47s
    pulsar-cluster-bookies-1            2/2     Running            0              9m47s
    pulsar-cluster-bookies-2            2/2     Running            0              9m47s
    pulsar-cluster-bookies-recovery-0   0/2     Init:0/1           0              9m51s
    pulsar-cluster-pulsar-broker-0      3/3     Running            0              9m46s
    pulsar-cluster-pulsar-broker-1      3/3     Running            0              9m46s
    pulsar-cluster-pulsar-broker-2      3/3     Running            0              9m46s
    pulsar-cluster-pulsar-proxy-0       1/2     CrashLoopBackOff   5 (108s ago)   9m50s
    zookeeperp-cluster-zookeeper-0      2/2     Running            0              9m52s
    zookeeperp-cluster-zookeeper-1      2/2     Running            0              9m52s
    zookeeperp-cluster-zookeeper-2      2/2     Running            0              9m52s

    Logs of the pulsar-proxy pod in CrashLoopBackOff: the serviceRefs are not effective, and the proxy uses the zk endpoint "pulsar-cluster-zookeeper.default.svc:2181".

    
    kubectl logs pulsar-cluster-pulsar-proxy-0 proxy --tail 30
    [conf/proxy.conf] Updating config statusFilePath=/pulsar/status
    [conf/proxy.conf] Adding config: maxMessageSize=5242880
    [conf/proxy.conf] Applying config brokerServiceURL = pulsar://pulsar-cluster-pulsar-broker:6650
    [conf/proxy.conf] Applying config brokerWebServiceURL = http://pulsar-cluster-pulsar-broker:80
    [conf/proxy.conf] Applying config clusterName = default-pulsar-cluster-pulsar-proxy
    [conf/proxy.conf] Applying config metadataStoreUrl = pulsar-cluster-zookeeper.default.svc:2181
    [conf/proxy.conf] Applying config webServicePort = 8080
    VM settings:
    Max. Heap Size (Estimated): 154.00M
    Using VM: OpenJDK 64-Bit Server VM

    2024-04-12T07:25:10,862+0000 [main] INFO org.apache.pulsar.broker.authentication.AuthenticationService - Authentication is disabled
    2024-04-12T07:25:11,360+0000 [main] INFO org.apache.pulsar.proxy.extensions.ProxyExtensionsUtils - Searching for extensions in /pulsar/./proxyextensions
    2024-04-12T07:25:11,360+0000 [main] WARN org.apache.pulsar.proxy.extensions.ProxyExtensionsUtils - extension directory not found
    2024-04-12T07:25:11,456+0000 [main] INFO org.eclipse.jetty.util.log - Logging initialized @4392ms to org.eclipse.jetty.util.log.Slf4jLog
    2024-04-12T07:25:11,761+0000 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.8.3-6ad6d364c7c0bcf0de452d54ebefa3058098ab56, built on 2023-10-05 10:34 UTC
    2024-04-12T07:25:11,761+0000 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name=pulsar-cluster-pulsar-proxy-0.pulsar-cluster-pulsar-proxy-headless.default.svc.cluster.local
    2024-04-12T07:25:11,761+0000 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.version=17.0.7
    2024-04-12T07:25:11,761+0000 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Debian
    2024-04-12T07:25:11,761+0000 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-17-openjdk-arm64
    ...
        at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]
        at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
        at java.net.InetAddress.getAllByName(InetAddress.java:1385) ~[?:?]
        at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
        at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1204) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
    2024-04-12T07:22:13,061+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server pulsar-cluster-zookeeper.default.svc/:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
    java.lang.IllegalArgumentException: Unable to canonicalize address pulsar-cluster-zookeeper.default.svc/:2181 because it's not resolvable
        at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1157) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
    2024-04-12T07:22:14,163+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] ERROR org.apache.zookeeper.client.StaticHostProvider - Unable to resolve address: pulsar-cluster-zookeeper.default.svc/:2181
    java.net.UnknownHostException: pulsar-cluster-zookeeper.default.svc
        at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]
        at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
        at java.net.InetAddress.getAllByName(InetAddress.java:1385) ~[?:?]
        at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
        at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1204) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
    2024-04-12T07:22:14,163+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server pulsar-cluster-zookeeper.default.svc/:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
    java.lang.IllegalArgumentException: Unable to canonicalize address pulsar-cluster-zookeeper.default.svc/:2181 because it's not resolvable
        at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1157) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]


    Logs of bookies-recovery:

    kubectl logs pulsar-cluster-bookies-recovery-0 check-bookies --tail 30
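The mismatch in the proxy log (the configured metadataStoreUrl does not point at the cluster named in serviceRefs) can be checked mechanically. The following is only a minimal sketch, assuming the `[conf/proxy.conf] Applying config ...` log format shown in this report and assuming that a correctly applied serviceRef would yield an endpoint whose host name starts with the referenced cluster name. Both function names are hypothetical helpers, not part of KubeBlocks or Pulsar.

```shell
#!/bin/sh
# Hypothetical helper: read the proxy startup log on stdin and print the
# last metadataStoreUrl value that was applied to conf/proxy.conf.
extract_zk_endpoint() {
  sed -n 's/^.*Applying config metadataStoreUrl = //p' | tail -n 1
}

# Hypothetical helper: succeed only if the endpoint ($1) starts with the
# name of the referenced cluster ($2) followed by "-".
# Assumption: service names are prefixed with the cluster name.
endpoint_uses_cluster() {
  case "$1" in
    "$2"-*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With these defined, something like `kubectl logs pulsar-cluster-pulsar-proxy-0 proxy | extract_zk_endpoint` would print the endpoint in use, and passing that value together with `zookeeperp-cluster` to `endpoint_uses_cluster` would fail for the log in this report.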

Expected behavior The serviceRefs should take effect, so that pulsar-proxy and bookies-recovery connect to the ZooKeeper endpoint of zookeeperp-cluster and all pods reach Running.


github-actions[bot] commented 4 months ago

This issue has been marked as stale because it has been open for 30 days with no activity