Open dgoldenberg-ias opened 3 weeks ago
Enabled logging and increased ulimit to 1024 locally.
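For anyone reproducing this, here is a minimal sketch of both steps done from Python. It assumes that "ulimit" refers to the open-file limit (`ulimit -n`), which is not stated above. The helper names `enable_kafka_debug_logging` and `raise_open_file_limit` are hypothetical, and the `resource` module is Unix-only.

```python
import logging
import resource  # Unix-only standard library module


def enable_kafka_debug_logging(level=logging.DEBUG):
    """Send log records, including those from the "kafka" logger, to stderr."""
    logging.basicConfig(level=level, format="%(levelname)s - %(message)s")
    logger = logging.getLogger("kafka")  # kafka-python logs under this name
    logger.setLevel(level)
    return logger


def raise_open_file_limit(target=1024):
    """Raise the soft open-file limit to `target` (assumed meaning of "ulimit")."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    enable_kafka_debug_logging()
    print("soft open-file limit:", raise_open_file_limit(1024))
```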
What I'm seeing in the log:
INFO - <BrokerConnection node_id=bootstrap-0 host=srv3.us-east-1.amazonaws.com:9092 <connecting> [IPv4 ('XX.XX.X.XXX', 9092)]>: connecting to srv3.us-east-1.amazonaws.com:9092 [('XX.XX.X.XXX', 9092) IPv4]
INFO - Probing node bootstrap-0 broker version
INFO - <BrokerConnection node_id=bootstrap-0 host=srv3.us-east-1.amazonaws.com:9092 <connecting> [IPv4 ('XX.XX.X.XXX', 9092)]>: Connection complete.
ERROR - Error sending request data to <BrokerConnection node_id=bootstrap-0 host=srv3.us-east-1.amazonaws.com:9092 <connected> [IPv4 ('XX.XX.X.XXX', 9092)]>
Traceback (most recent call last):
  File "/msk-proj/.venv310/lib/python3.10/site-packages/kafka/conn.py", line 998, in send_pending_requests
    total_bytes = self._send_bytes_blocking(data)
  File "/msk-proj/.venv310/lib/python3.10/site-packages/kafka/conn.py", line 601, in _send_bytes_blocking
    sent_bytes = self._sock.send(data[total_sent:])
BrokenPipeError: [Errno 32] Broken pipe
It seems to me the issue comes down to this code:
if not self.connect_blocking(timeout_at - time.time()):
    reset_override_configs()
    raise Errors.NodeNotReadyError()
f = self.send(request)
# HACK: sleeping to wait for socket to send bytes
time.sleep(0.1)
# when broker receives an unrecognized request API
# it abruptly closes our socket.
# so we attempt to send a second request immediately
# that we believe it will definitely recognize (metadata)
# the attempt to write to a disconnected socket should
# immediately fail and allow us to infer that the prior
# request was unrecognized
mr = self.send(MetadataRequest[0](topics))
Even increasing this sleep to 1 or even 10 seconds does not help in my local setup. Any ideas on how to fix this or work around it?
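The mechanism described in the quoted comments (a write to a socket that the peer has already closed fails) can be reproduced with plain standard-library sockets. The sketch below is not kafka-python code. The function name `send_after_peer_close` is hypothetical, and a local throwaway server stands in for a broker that drops the connection.

```python
import socket
import threading
import time


def send_after_peer_close(payload=b"probe", retries=40, pause=0.05):
    """Return the exception raised when writing to a peer-closed socket, or None."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def accept_and_close():
        conn, _ = server.accept()
        conn.close()  # stand-in for a broker that drops the connection

    worker = threading.Thread(target=accept_and_close, daemon=True)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    worker.join(timeout=5)
    try:
        # The first write usually succeeds because it is only buffered locally.
        client.sendall(payload)
        for _ in range(retries):
            time.sleep(pause)
            client.sendall(payload)  # a later write fails once the close is seen
    except ConnectionError as exc:  # BrokenPipeError, ConnectionResetError, ...
        return exc
    finally:
        client.close()
        server.close()
    return None


if __name__ == "__main__":
    print(repr(send_after_peer_close()))
```

In this sketch the failure is caused by the peer closing the connection, not by timing, so a longer pause between writes would not be expected to avoid it.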
Environment
kafka-python = "^2.0.2"
python = Python 3.10.15
We're using AWS MSK - Kafka version 3.5.1.
Code
Observations
When I run this code locally, I get the error below. Curiously, when I run it from a Databricks notebook, I do not get the error. The Python version there is 3.10.12.
Error stack trace
Specifying the Kafka API version explicitly
Specifying the Kafka API version explicitly:
Same error.
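For reference, pinning the version in kafka-python is normally done through the `api_version` constructor argument, which takes a tuple. The following is an illustrative sketch, not the code from this report. The helper names, the topic, the bootstrap address and the `(2, 5, 0)` tuple are all placeholder assumptions.

```python
def consumer_config(bootstrap_servers, api_version):
    """Build keyword arguments for KafkaConsumer with a pinned API version."""
    return {
        "bootstrap_servers": bootstrap_servers,
        # A tuple such as (2, 5, 0); the value used here is a placeholder.
        "api_version": api_version,
    }


def make_consumer(topic, bootstrap_servers, api_version):
    """Create a consumer; requires kafka-python and a reachable broker."""
    from kafka import KafkaConsumer  # imported lazily so the helper above works without it

    return KafkaConsumer(topic, **consumer_config(bootstrap_servers, api_version))


if __name__ == "__main__":
    # Placeholder address and version, shown only to illustrate the call shape.
    print(consumer_config(["localhost:9092"], (2, 5, 0)))
```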
Network?
Seems OK.
Basic socket operations
Used the below code to verify sockets are working fine.
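A check of this kind might look like the sketch below. It is an illustration using only the standard library, not the code from this report, and the function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


if __name__ == "__main__":
    # Placeholder address; substitute the broker host and port being tested.
    print(can_connect("localhost", 9092))
```

A successful connect only shows that the port is reachable. It does not show that the remote side keeps the connection open once data is written, which is the step that fails in the log above.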
The exception occurs here: https://github.com/dpkp/kafka-python/blob/5bb126bf20bbb5baeb4e9afc48008dbe411631bc/kafka/conn.py#L1254 in the check_version method.
Any idea what the cause is, or is there a workaround? Thanks.