didi / KnowStreaming

A one-stop cloud-native real-time streaming data platform that builds enterprise-grade Kafka services with zero intrusion and a plugin-based architecture, greatly lowering the barrier to operating, storing, and managing real-time streaming data.
https://knowstreaming.com
GNU Affero General Public License v3.0

How to troubleshoot that the cluster connected to logikm cannot collect topic and other data #506

Closed moluzhui closed 1 year ago

moluzhui commented 2 years ago

I used LogiKM to connect to a cluster. The cluster appears in the cluster list under operations control, but when I open the cluster details, no cluster metrics are displayed on any of its pages.

At the same time, clicking the controller information button pops up a "zookeeper connect failed" error, and the console shows the following output:

GET ...../api/v1/rd/clusters/3/controller-preferred-candidates
response: 
{"data":null,"message":"zookeeper connect failed","tips":null,"code":8020}

Part of the LogiKM error log file log_error_2022-07-18.0.log is shown below:

2022-07-18 23:58:45.025 [pool-12-thread-9] ERROR c.x.k.m.t.s.metadata.FlushBKConsumerGroupMetadata - collect consumerGroup failed, clusterId:3.
java.lang.RuntimeException: Request METADATA failed on brokers List(xx.xxx.xx.xx:9092 (id: -2 rack: null), xx.xxx.xx.xxx:9092 (id: -1 rack: null))
        at kafka.admin.AdminClient.sendAnyNode(AdminClient.scala:66)
        at kafka.admin.AdminClient.findAllBrokers(AdminClient.scala:90)
        at kafka.admin.AdminClient.listAllGroups(AdminClient.scala:98)
        at com.xiaojukeji.kafka.manager.task.schedule.metadata.FlushBKConsumerGroupMetadata.collectAndSaveConsumerGroup(FlushBKConsumerGroupMetadata.java:80)
        at com.xiaojukeji.kafka.manager.task.schedule.metadata.FlushBKConsumerGroupMetadata.flush(FlushBKConsumerGroupMetadata.java:55)
        at com.xiaojukeji.kafka.manager.task.schedule.metadata.FlushBKConsumerGroupMetadata.schedule(FlushBKConsumerGroupMetadata.java:43)
        at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
        at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2022-07-18 23:59:25.349 [pool-12-thread-12] ERROR c.x.k.m.t.schedule.metadata.FlushTopicProperties - flush topic properties, get zk config failed, clusterId:8.
2022-07-18 23:59:35.002 [TaskThreadPool-1-179] ERROR c.x.k.m.t.s.metadata.FlushZKConsumerGroupMetadata - collect topicName and consumerGroup failed, clusterId:1 consumerGroup:monitor.metric.analyze.
com.xiaojukeji.kafka.manager.common.exception.ConfigException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/monitor.metric.analyze/offsets
        at com.xiaojukeji.kafka.manager.common.zookeeper.ZkConfigImpl.getChildren(ZkConfigImpl.java:362)
        at com.xiaojukeji.kafka.manager.task.schedule.metadata.FlushZKConsumerGroupMetadata$1.call(FlushZKConsumerGroupMetadata.java:95)
        at com.xiaojukeji.kafka.manager.task.schedule.metadata.FlushZKConsumerGroupMetadata$1.call(FlushZKConsumerGroupMetadata.java:91)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/monitor.metric.analyze/offsets
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:200)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)
        at com.xiaojukeji.kafka.manager.common.zookeeper.ZkConfigImpl.getChildren(ZkConfigImpl.java:360)
        ... 8 common frames omitted

I would like to ask why this exception occurs, and I look forward to your reply.

LogiKM version: v2.6.0. This exception has persisted for a few days, and I can still access the Kafka cluster by other means.

ZQKC commented 2 years ago

{"data":null,"message":"zookeeper connect failed","tips":null,"code":8020}

  1. LogiKM reads most Kafka metadata from ZooKeeper. Given this error message, first check whether LogiKM can reach the Kafka cluster's ZooKeeper.
  2. Regarding the "Request METADATA failed" error log: LogiKM only supports Kafka versions >= 0.10.2, so please check your Kafka version.
  3. Finally, regarding the "KeeperErrorCode = NoNode for /consumers/monitor.metric.analyze/offsets" error log: check whether the ZooKeeper node used to record that consumer client's consumption progress exists.

ZQKC commented 1 year ago

No further feedback, so I am closing the issue.