Morningstar / kafka-offset-monitor

A small web app to monitor the progress of kafka consumers and their lag wrt the log.
Apache License 2.0

Committed offset stops updating after some period of time #28

Closed: dunamu-maru closed this issue 6 years ago

dunamu-maru commented 6 years ago

I'm using Apache Kafka 1.0.0 and use KafkaStreams to consume from and produce to topics. When I launch kafka-offset-monitor, it tracks the committed offset and log size well, but after some period of time (not fixed, around 3-4 hours later) it stops tracking the committed offset and the lag increases indefinitely.

When I restart the offset monitor, it tracks the committed offsets of the topics normally again. This did not happen when I used a Kafka consumer with Apache Kafka 0.8.2. Is it a matter of the Kafka version I am using? Or should I manually bump the versions that kafka-offset-monitor uses?

I couldn't find any suspicious logs that may have information about this situation.

rcasey212 commented 6 years ago

Hi @dunamu-maru,

I have run this application for many months without restarting it and have not seen this issue with Kafka versions 0.9.0.0 - 0.10.0.1. I cannot speak to Kafka version 1.0. It is currently untested.

Just to be clear, the webpage does not auto-update. You will have to refresh it to see new data. If that is not the problem, please proceed to the following:

To better understand what is happening, can you please:

  1. Tell me the JRE you are using
  2. Provide log files that cover the time when the application stops displaying committed offsets.
  3. Provide all of the command-line options you are using when starting the program
  4. Tell me which version of Zookeeper you are using

Laxman-SM commented 6 years ago

Hi @rcasey212, I used an older version of this tool for Kafka monitoring, and at that time I did not use any authentication mechanism. Was this functionality introduced recently? I saw this article today about setting up authorization. Is this functionality a prerequisite for the new version? https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/

dunamu-maru commented 6 years ago

Oh, I found that I didn't specify all of the brokers in the kafkaOffsetMonitoring execution parameters. I'm so sorry for bothering you because of my carelessness. Thanks for your response!

dunamu-maru commented 6 years ago

Wait, I found the same issue even after updating the parameters ;) I will attach the information you asked for below.

  1. JRE version

    openjdk version "1.8.0_111"
    OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2~bpo8+1-b14)
    OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
  2. Unfortunately, I didn't save the logs. I started collecting logs about an hour ago and will update this answer later (when the same issue occurs). Update: I finally got logs, but only these ;(

...
2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:0: 931345
2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:1: 925623
2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:4: 930451
2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:0: 931358
2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:2: 923093

Around that time, some of the brokers had logs like this:

[2018-04-02 08:27:30,435] INFO Rolled new log segment for '__transaction_state-23' in 1 ms. (kafka.log.Log)
[2018-04-02 08:27:36,709] INFO Rolled new log segment for '__consumer_offsets-27' in 1 ms. (kafka.log.Log)

I thought this might be related to this issue, but I'm not sure.

  3. command-line options
    java -cp /usr/src/app/app.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --offsetStorage kafka --port 8080 --kafkaBrokers kafka-0.broker:9092,kafka-1.broker:9092,kafka-2.broker:9092,kafka-3.broker:9092,kafka-4.broker:9092 --zk zk-0.zookeeper:2181,zk-1.zookeeper:2181,zk-2.zookeeper:2181 --refresh 30.seconds --retain 2.days
  4. zookeeper-3.4.10

Additionally, when I log in to the servers where Kafka is running and run the kafka-consumer-groups script, it shows that the consumer groups are consuming normally and that the committed offset matches the latest offset.
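One way to pin down when the monitor stops recording commits is to scan its log for the last "Updating committed offset" line per partition. The following is a minimal sketch in Python, assuming the log format matches the sample lines quoted above; the function name `last_updates` and the regular expression are illustrative and not part of kafka-offset-monitor.

```python
import re
from datetime import datetime

# Pattern inferred only from the KafkaOffsetGetter lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) INFO\s+KafkaOffsetGetter\$:\d+ - "
    r"Updating committed offset: "
    r"g:(?P<group>[^,]+),t:(?P<topic>[^,]+),p:(?P<part>\d+): (?P<offset>\d+)$"
)


def last_updates(lines):
    """Map (group, topic, partition) to the (timestamp, offset) of its latest log line."""
    latest = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a committed-offset update line
        key = (match["group"], match["topic"], int(match["part"]))
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        # On equal timestamps the later line in the file wins.
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, int(match["offset"]))
    return latest


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:0: 931345",
    "2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:1: 925623",
    "2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:4: 930451",
    "2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:0: 931358",
    "2018-04-02 08:27:36 INFO  KafkaOffsetGetter$:68 - Updating committed offset: g:my_private_consumer_group,t:MyTopic1,p:2: 923093",
]

if __name__ == "__main__":
    for key, (ts, offset) in sorted(last_updates(SAMPLE).items()):
        print(key, ts.isoformat(), offset)
```

Comparing the newest timestamp per partition against the wall clock would show whether the monitor stopped receiving commits for every partition at once or only for some of them.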

rcasey212 commented 6 years ago

Unfortunately I'm unable to understand the issue based on the information you gave. I was hoping to see some kind of exception from the log. The info you posted all looks good. If you get more log information, please post it.

rcasey212 commented 6 years ago

Hi @Laxman-SM,

Authentication/authorization is not a prerequisite to use newer versions of Kafka, or this offset monitoring tool.

Laxman-SM commented 6 years ago

Hi @rcasey212, now I am able to get offsets with the tool. In the previous version only the Zookeeper node was required to start the Kafka offset monitor. In the current version I see that the Kafka brokers are also required.

IPs replaced with xx

java -cp KafkaOffsetMonitor-assembly-0.4.6-SNAPSHOT.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --offsetStorage kafka --port 8181 --kafkaBrokers ip-xx-xx-xx-112.ec2.internal:9092,ip-xx-xxx-xx-227.ec2.internal:9092 --zk ip-xx-xxx-xx-227.ec2.internal:2181,ip-xx-xxx-xx-112.ec2.internal:2181 --refresh 30.seconds --retain 2.days

[Image: chart 2]

Thank you all for your support.

dunamu-maru commented 6 years ago

Hi @rcasey212, I finally found that my application using KafkaStreams didn't specify the addresses of two of the five brokers in its configuration file. I fixed that and restarted my application, and it works fine. I should have inspected my application first. Anyway, I really appreciate your support!
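A quick way to catch this kind of mismatch is to diff the broker list given to the monitor against the one in the application config. This is a minimal sketch in Python; the monitor list is taken from the command line posted earlier in this thread, while the application list is hypothetical (the thread does not say which two brokers were missing), and the helper names are illustrative.

```python
def parse_brokers(broker_list):
    """Split a comma-separated host:port list into a set of addresses."""
    return {entry.strip() for entry in broker_list.split(",") if entry.strip()}


def missing_brokers(expected, configured):
    """Return the brokers present in `expected` but absent from `configured`."""
    return sorted(parse_brokers(expected) - parse_brokers(configured))


# Value of --kafkaBrokers from the command line posted above.
MONITOR_BROKERS = (
    "kafka-0.broker:9092,kafka-1.broker:9092,kafka-2.broker:9092,"
    "kafka-3.broker:9092,kafka-4.broker:9092"
)

# Hypothetical application setting with two brokers left out.
APP_BROKERS = "kafka-0.broker:9092,kafka-1.broker:9092,kafka-2.broker:9092"

if __name__ == "__main__":
    print(missing_brokers(MONITOR_BROKERS, APP_BROKERS))
```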

dunamu-maru commented 6 years ago

I found an abnormal situation like this. After all the consumers' configs were fixed, it seemed to work properly, but the same issue occurred some time later.

I did some inspection and found that the consumer-id was not refreshed on the offset monitor web page.

This is the output when I run this script (provided by Kafka):

./kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group my-cosumer-group
TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                                                                                      HOST              CLIENT-ID
TopicA  1          3789451         3789466         15   my-consumer-4f3d986a-95e3-47ac-8711-57d1b97bd889-StreamThread-46-consumer-54c3d477-1c25-4570-a912-5e9d5db8b3bc  /xxx.xxx.xxx.xxx  my-consumer-4f3d986a-95e3-47ac-8711-57d1b97bd889-StreamThread-46-consumer

And this is what the website displays when I search for the same topic and partition:

Topic | Partition | CommittedOffset | LogEndOffset | Lag | Owner | Created | Last Seen
topicA | 1  | 12908249 | 19685529 | 6777280 |  my-consumer-c35771be-8941-4240-ab0f-60cd345b7df7-StreamThread-46-consumer/yyy.yyy.yyy.yyy | 19 hours ago | in 5 hours

The owner and its IP address have changed, but the offset monitor does not seem to detect it. I didn't restart my consumer at all, and even if I had restarted it, the offset monitor should have detected that (I think).

Could this be a clue to the cause of this issue?

The monitor process died because of an OOM error. It might be related to this issue. I will investigate it further.
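To make the discrepancy between the two views concrete, lag can be recomputed as log end offset minus committed offset, and the two rows compared field by field. This is a minimal sketch in Python using the numbers quoted above; the `PartitionView` class and `differences` helper are illustrative names, not part of Kafka or kafka-offset-monitor.

```python
from dataclasses import dataclass


@dataclass
class PartitionView:
    """One source's view of a single topic partition."""
    source: str
    committed: int
    log_end: int
    client_id: str

    @property
    def lag(self):
        # Lag is how far the committed offset trails the end of the log.
        return self.log_end - self.committed


def differences(a, b):
    """Return the names of the fields on which two views disagree."""
    fields = ("committed", "log_end", "lag", "client_id")
    return [name for name in fields if getattr(a, name) != getattr(b, name)]


# Values copied from the script output and the web page row quoted above.
cli = PartitionView(
    "kafka-consumer-groups", 3789451, 3789466,
    "my-consumer-4f3d986a-95e3-47ac-8711-57d1b97bd889-StreamThread-46-consumer",
)
web = PartitionView(
    "offset monitor page", 12908249, 19685529,
    "my-consumer-c35771be-8941-4240-ab0f-60cd345b7df7-StreamThread-46-consumer",
)

if __name__ == "__main__":
    print(cli.source, "lag:", cli.lag)
    print(web.source, "lag:", web.lag)
    print("fields that differ:", differences(cli, web))
```

Both recomputed lags agree with the LAG values each source reports, so the two views are internally consistent but disagree with each other on every field, including the log end offset and the owning client.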

rcasey212 commented 6 years ago

Hi @dunamu-maru,

After thinking about this for a while, my best guess is that your consumers, while reading from the same topic, are using different consumer groups and therefore report committed offsets independently of each other. I'm afraid that, based on the information given here, that is the only possibility I have been able to come up with, and I have been unable to find an issue with how this monitor works. Please open a new ticket with more information if you continue to see this behavior.