dpkp / kafka-python

Python client for Apache Kafka
http://kafka-python.readthedocs.io/
Apache License 2.0

Is there any minimal limit for consumer_timeout_ms? #1702

Open WuBingzheng opened 5 years ago

WuBingzheng commented 5 years ago

version: kafka-1.3.5

KafkaConsumer does not return any messages if consumer_timeout_ms is set to 50 or less. It works fine if consumer_timeout_ms is set higher than 50.

Is there any minimal limit for consumer_timeout_ms?

My topic is very busy. The code is simple:

#!/usr/bin/env python

FLUSH_DELAY = 50 # in ms

import time
from kafka import KafkaConsumer

consumer = KafkaConsumer('my_topic',
               bootstrap_servers=['prod-kafka-1:9092', 'prod-kafka-2:9092'],
               consumer_timeout_ms=FLUSH_DELAY)

while True:
    print "wait:", time.time()
    for message in consumer:
        print "message:", message.value

Output:

wait 1548042155.78
wait 1548042155.83
wait 1548042155.88
wait 1548042155.93
wait 1548042155.98
wait 1548042156.03
wait 1548042156.08

tvoinarovskyi commented 5 years ago

Setting a low consumer_timeout_ms only means that the system can't load a full batch fast enough. Why would you want it that low? Do you have a use case for it? Just so you know, KafkaConsumer does not fetch in a background thread; every time you ask for a message, it may go to the broker. It does lazily load one page, but you are usually better off assuming that it goes to the broker every time you ask for a message.

WuBingzheng commented 5 years ago

  1. My use case is to read messages from Kafka and store them in MySQL, in batches and in real time. So I want to write to MySQL when either a) 100 messages have been read, or b) no message has arrived for 20 ms.

  2. 50 ms does not work while 51 ms does. So I do not think the problem is "not fast enough"; there must be some limit, and that limit should be documented.
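
For readers with the same use case, the "flush after 100 messages or after 20 ms of silence" rule can be sketched with KafkaConsumer.poll() instead of a very small consumer_timeout_ms on the iterator. This is only a sketch and not something proposed in the thread: drain_in_batches, kafka_poll_fn, flush_fn and max_polls are hypothetical names, an empty poll is treated as "no message for idle_ms", and it assumes the installed kafka-python version accepts the max_records argument to poll().

```python
def drain_in_batches(poll_fn, flush_fn, max_batch=100, idle_ms=20, max_polls=None):
    """Flush when max_batch records are buffered or when a poll comes back empty.

    poll_fn(timeout_ms, max_records) must return a list of records and wait
    at most timeout_ms. max_polls bounds the loop (None means run forever).
    """
    batch = []
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        # Never ask for more than the remaining room in the current batch.
        records = poll_fn(idle_ms, max_batch - len(batch))
        batch.extend(records)
        # Condition a): batch is full. Condition b): idle poll with data pending.
        if len(batch) >= max_batch or (batch and not records):
            flush_fn(batch)
            batch = []
    # Loop was bounded by max_polls: hand over whatever is left.
    if batch:
        flush_fn(batch)


def kafka_poll_fn(consumer):
    """Adapt a KafkaConsumer-like object to the poll_fn shape used above."""
    def poll_fn(timeout_ms, max_records):
        # poll() returns a dict mapping partitions to lists of records.
        by_partition = consumer.poll(timeout_ms=timeout_ms, max_records=max_records)
        return [record for records in by_partition.values() for record in records]
    return poll_fn
```

With a real consumer created without consumer_timeout_ms, this would be called as drain_in_batches(kafka_poll_fn(consumer), write_rows_to_mysql), where write_rows_to_mysql is a placeholder for whatever performs the batched insert.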