Closed: menghe999 closed this issue 1 year ago
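Are there any other error logs? The error log you posted is not the main problem. It only appears while a cluster is being connected, and the issue has already been fixed on the master branch.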
# tailf log_error.log | grep 'clusterPhyId=1'
2023-05-08 12:08:06.464 [MetricCollect-Shard-1-9-thread-6] ERROR class=c.x.k.s.k.c.s.h.c.c.HealthCheckClusterService||method=checkClusterNoController||param=ClusterPhyParam(clusterPhyId=1)||config=HealthCompareValueConfig(value=1.0)||errMsg=get metrics from es failed, activeControllerCount is null
2023-05-08 12:08:06.511 [MetadataTaskTP-6-thread-11] ERROR class=c.x.k.s.k.t.k.metadata.SyncBrokerConfigDiffTask||method=processSubTask||clusterPhyId=1||data=BrokerConfigPO(clusterPhyId=1, brokerId=66, configName=listeners, configValue=SASL_PLAINTEXT://pa3:6868,, diffType=1)||errMsg=exception!
Apart from the UnknownHost error, are there any logs showing a failed JMX connection?
tail -5000f log_error.log | grep 'clusterPhyId=1'
....
2023-05-11 15:48:34.479 [MetricCollect-Shard-2-10-thread-48] ERROR class=c.x.k.s.k.c.s.h.c.topic.HealthCheckTopicService||method=checkTopicUnderReplicatedPartition||param=TopicParam{clusterPhyId=1, topicName='producer-test-DefaultPartitioner-10'}||config=HealthDetectedInLatestMinutesConfig(latestMinutes=10, detectedTimes=8)||result=Result{message='失败', code=1, data=null}||errMsg=search metrics from es failed
2023-05-11 15:48:34.479 [MetricCollect-Shard-2-10-thread-22] ERROR class=c.x.k.s.k.c.s.h.c.topic.HealthCheckTopicService||method=checkTopicUnderReplicatedPartition||param=TopicParam{clusterPhyId=1, topicName='__consumer_offsets'}||config=HealthDetectedInLatestMinutesConfig(latestMinutes=10, detectedTimes=8)||result=Result{message='失败', code=1, data=null}||errMsg=search metrics from es failed
2023-05-11 15:49:04.636 [MetricCollect-Shard-1-9-thread-47] ERROR class=c.x.k.s.k.c.s.h.c.c.HealthCheckClusterService||method=checkClusterNoController||param=ClusterPhyParam(clusterPhyId=1)||config=HealthCompareValueConfig(value=1.0)||errMsg=get metrics from es failed, activeControllerCount is null
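Almost all of them are exception logs printed by method=checkxxx.
I also registered this cluster with logikm v2.6, and topic traffic information displays normally there. Does that confirm the Kafka cluster itself is configured correctly?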
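To check the observation that nearly every entry comes from a method=check... call, the error log can be tallied by its method field. This is a minimal sketch, assuming the `||`-separated `method=` field seen in the log lines quoted above; the function name `count_methods` is made up for illustration.

```shell
# Count ERROR lines per "method=" field, most frequent first.
# Assumes fields are separated by "||" as in the quoted log lines.
count_methods() {
  # $1: path to the error log, e.g. log_error.log
  grep 'ERROR' "$1" \
    | grep -o 'method=[^|]*' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

For example, `count_methods log_error.log` prints one line per method with its count. Piping the log through `grep 'clusterPhyId=1'` first would restrict the tally to a single cluster.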
By the way, one more question: when configuring the brokers I filled in the JMX port, and the value in the table is correct as well, but the metrics still cannot be fetched. What could be the cause? When fetched individually, it always shows port 9099, so the configured port is not taking effect.
2023-05-12 09:33:38.990 ERROR 2380 --- [kTP-5-thread-13] c.x.k.s.km.common.jmx.JmxConnectorWrap : JMX connect exception, clientLogIdent:clusterPhyId: 1 brokerId: 2 host:b-2.pre-spot-market.xvod6s.c4.kafka.ap-southeast-1.amazonaws.com port:9099.
    at com.xiaojukeji.know.streaming.km.common.jmx.JmxConnectorWrap.createJmxConnector(JmxConnectorWrap.java:176)
    at com.xiaojukeji.know.streaming.km.common.jmx.JmxConnectorWrap.checkJmxConnectionAndInitIfNeed(JmxConnectorWrap.java:74)
2023-05-12 09:33:38.990 ERROR 2380 --- [kTP-5-thread-13] c.x.k.s.k.p.kafka.KafkaJMXClient : method=getClientWithCheck||clusterPhyId=1||brokerId=2||msg=get jmx connector failed!
2023-05-12 09:33:48.974 ERROR 2380 --- [d-1-9-thread-26] c.x.k.s.km.common.jmx.JmxConnectorWrap : JMX connect exception, clientLogIdent:clusterPhyId: 1 brokerId: 1 host:b-1.pre-spot-market.xvod6s.c4.kafka.ap-southeast-1.amazonaws.com port:9099.
    at com.xiaojukeji.know.streaming.km.common.jmx.JmxConnectorWrap.createJmxConnector(JmxConnectorWrap.java:176)
    at com.xiaojukeji.know.streaming.km.common.jmx.JmxConnectorWrap.checkJmxConnectionAndInitIfNeed(JmxConnectorWrap.java:74)
2023-05-12 09:33:48.974 ERROR 2380 --- [d-1-9-thread-26] c.x.k.s.k.p.kafka.KafkaJMXClient : method=getClientWithCheck||clusterPhyId=1||brokerId=1||msg=get jmx connector failed!
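To see which host and port the JMX client is actually dialing for each broker, the "JMX connect exception" lines can be reduced to distinct broker/host/port triples and compared against the configured port. This is a rough sketch, assuming the `brokerId: N host:H port:P` layout shown in the log above; `jmx_targets` is a hypothetical helper name.

```shell
# List distinct "brokerId host port" triples from JMX connect failures.
# Assumes the "brokerId: N host:H port:P" layout of the quoted log lines.
jmx_targets() {
  # $1: path to the application log containing the JMX errors
  grep 'JMX connect exception' "$1" \
    | sed -n 's/.*brokerId: \([0-9][0-9]*\) host:\([^ ]*\) port:\([0-9][0-9]*\).*/\1 \2 \3/p' \
    | sort -u
}
```

If every broker shows the same port (9099 in the log above) regardless of what was entered, that suggests the value being read is not the one that was configured.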
Since the configuration is fine, take a look at es/es.log and check whether the ES queries are hitting any exceptions.
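One quick way to do that check is to pull the most recent error-looking lines out of the log. This is only a sketch: the format of es/es.log is not shown in this thread, so matching on `ERROR` or `Exception` is an assumption, and `recent_es_errors` is a made-up helper name.

```shell
# Show the last N lines that look like errors, prefixed with line numbers.
# The 'ERROR|Exception' pattern is a guess at what failures look like.
recent_es_errors() {
  # $1: path to the log file, $2: number of matches to show (default 20)
  grep -n -E 'ERROR|Exception' "$1" | tail -n "${2:-20}"
}
```

For example, `recent_es_errors es/es.log 50` shows the last 50 matching lines. The line numbers make it easier to open the file at the right place and read the surrounding context.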
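[ ] I have searched the issues for related problems and found no duplicates.
Would you like to claim this bug?
「 Y / N 」
Environment information
Steps to reproduce
Add a Kafka cluster (with kerberos jmx already enabled)
The cluster's information in the database
Expected result
I expect the platform to be able to monitor the detailed topic information of the Kafka cluster
Actual result
The JMX port is working normally
Topic details cannot be retrieved
If there is an exception, please attach the exception trace: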