dpkp / kafka-python

Python client for Apache Kafka
http://kafka-python.readthedocs.io/
Apache License 2.0

Idle Producer Socket Disconnected Errors #2274

Open mubeta06 opened 2 years ago

mubeta06 commented 2 years ago

We are running Python Kafka producer clients in a serverless environment (AWS Lambda python3.8 runtime) where, as per recommended best practices, we share long-lived producer connections across invocations. We use a producer configuration similar to the following:

import json
import os
import socket

import kafka

# Serialise keys and values as UTF-8 encoded JSON.
serializer = lambda m: json.dumps(m).encode('utf-8')
self._producer = kafka.KafkaProducer(
    bootstrap_servers=os.environ['BOOTSTRAP_SERVERS'],
    client_id='kafka-python-producer-optimus-prime',
    security_protocol='SASL_SSL',
    sasl_mechanism='PLAIN',
    sasl_plain_username=os.environ['SASL_USERNAME'],
    sasl_plain_password=os.environ['BROKER_ACCESS_KEY'],
    # Disable Nagle's algorithm and enable TCP keepalive on broker sockets.
    socket_options=[(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1),
                    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)],
    acks='all',
    compression_type='gzip',
    retries=0,
    max_request_size=33554432,  # 32MB
    value_serializer=serializer,
    key_serializer=serializer)

Everything is working as expected with the exception that every so often we see a number of errors similar to the following:

[ERROR] 2021-11-08T02:25:29.644Z dbd0e301.....-2f86be94e839 <BrokerConnection node_id=bootstrap-0 host=....cloud:9092 <connected> [IPv4 ('52.....31', 9092)]>: socket disconnected

The errors appear to be associated with producer connections that we have yet to utilise for sending messages (we conditionally send messages upon invocation). In other words we have instantiated the producer = kafka.KafkaProducer object but are yet to call producer.send(...). We see this error message approximately 10 minutes after instantiating the producer which aligns with the Kafka broker cluster connections.max.idle.ms configuration of 600000 (i.e. 10 minutes). Increasing the verbosity of logging does not seem to provide any further insight beyond the aforementioned error log message.
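To make the relationship between the two timeouts explicit, here is a minimal sketch of how a client-side idle limit could be derived from the broker's. `connections_max_idle_ms` is a KafkaProducer keyword argument; the helper name `client_idle_ms`, the 60-second margin, and the `producer_overrides` dict are hypothetical and only for illustration. This is not a confirmed fix for the error above.

```python
# Broker-side connections.max.idle.ms reported for our cluster (10 minutes).
BROKER_CONNECTIONS_MAX_IDLE_MS = 600000


def client_idle_ms(broker_idle_ms, margin_ms=60000):
    """Return a client-side idle limit that expires before the broker's.

    Hypothetical helper: the margin is an assumption, not a recommended value.
    """
    if margin_ms <= 0 or margin_ms >= broker_idle_ms:
        raise ValueError("margin_ms must be greater than 0 and less than broker_idle_ms")
    return broker_idle_ms - margin_ms


# Extra keyword arguments that could be passed alongside the configuration above,
# e.g. kafka.KafkaProducer(..., **producer_overrides).
producer_overrides = {
    'connections_max_idle_ms': client_idle_ms(BROKER_CONNECTIONS_MAX_IDLE_MS),
}
```

The intent is only that the client should consider a connection idle before the broker does, so that the client closes it first instead of logging an unexpected disconnect.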

My understanding was that the IdleConnectionManager (https://github.com/dpkp/kafka-python/blob/f0a57a6a20a3049dc43fbf7ad9eab9635bd2c0b0/kafka/client_async.py#L974) is responsible for the client-side handling of broker connections that are deemed idle, so that such server-side disconnections do not happen. Is my understanding correct, or is this an issue with the kafka-python client implementation?
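To make the question concrete, the behaviour I expected looks roughly like the sketch below. `IdleTracker` is a hypothetical class written for illustration and is not the kafka-python implementation; the 540000 ms limit and the injected fake clock are also assumptions chosen to keep the example deterministic.

```python
import time


class IdleTracker:
    """Track the last activity per connection and report idle ones (illustrative only)."""

    def __init__(self, max_idle_ms, clock=time.monotonic):
        self.max_idle_s = max_idle_ms / 1000.0
        self._clock = clock
        self._last_active = {}

    def touch(self, conn_id):
        # Record activity (connect, send, or receive) for this connection.
        self._last_active[conn_id] = self._clock()

    def expired(self):
        # Connections whose idle time has reached the limit.
        now = self._clock()
        return [conn_id for conn_id, last in self._last_active.items()
                if now - last >= self.max_idle_s]


# Drive the tracker with a fake clock (in seconds) instead of waiting for real time.
fake_now = [0.0]
tracker = IdleTracker(540000, clock=lambda: fake_now[0])
tracker.touch('bootstrap-0')

fake_now[0] = 539.0
still_active = tracker.expired()   # idle for 539 s, under the 540 s limit

fake_now[0] = 541.0
now_expired = tracker.expired()    # idle for 541 s, over the limit
```

In this sketch a connection is only reported as idle when something calls `expired()`, so the check depends on the client regularly getting a chance to run it.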

mubeta06 commented 2 years ago

🤦 we are using version 2.0.2 of kafka-python

GRAWS commented 6 months ago

Hi, did you fix the issue? We are facing the same issue.