dpkp / kafka-python

Python client for Apache Kafka
http://kafka-python.readthedocs.io/
Apache License 2.0

producer raises FailedPayloadsError #362

Closed: sunchen009 closed this issue 9 years ago

sunchen009 commented 9 years ago

Most of the time kafka-python works well, but it occasionally throws exceptions like the ones below. My Kafka server version is 0.8.2.

2015-04-01 03:02:36,056 - kafka - ERROR - Unable to receive data from Kafka
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/conn.py", line 102, in _read_bytes
    raise socket.error("Not enough data to read message -- did server kill socket?")
error: Not enough data to read message -- did server kill socket?
2015-04-01 03:02:36,056 - kafka - DEBUG - Closing socket connection for 172.31.13.68:9092
2015-04-01 03:02:36,057 - kafka - WARNING - Could not receive response to request [-------my data-------here] from server <KafkaConnection host=172.31.13.68 port=9092>: Kafka @ 172.31.13.68:9092 went away
2015-04-01 03:02:36,057 - kafka - ERROR - Unable to send messages
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/producer/base.py", line 199, in _send_messages
    timeout=self.ack_timeout)
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/client.py", line 417, in send_produce_request
    resps = self._send_broker_aware_request(payloads, encoder, decoder)
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/client.py", line 208, in _send_broker_aware_request
    raise FailedPayloadsError(failed_payloads)
FailedPayloadsError
2015-04-01 03:02:36,058 - api.apis.public.app - ERROR - Traceback (most recent call last):
  File "./api/apis/public/app.py", line 103, in on_post
    self.producer.send_messages('eyespage.applogs_json', *applogs_json)
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/producer/simple.py", line 77, in send_messages
    topic, partition, *msg
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/producer/base.py", line 173, in send_messages
    return self._send_messages(topic, partition, *msg)
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/producer/base.py", line 199, in _send_messages
    timeout=self.ack_timeout)
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/client.py", line 417, in send_produce_request
    resps = self._send_broker_aware_request(payloads, encoder, decoder)
  File "/usr/local/lib/python2.7/dist-packages/kafka_python-0.9.4_dev-py2.7.egg/kafka/client.py", line 208, in _send_broker_aware_request
    raise FailedPayloadsError(failed_payloads)
FailedPayloadsError
dpkp commented 9 years ago

Thanks for posting the logs. Can you post the server logs as well?

sunchen009 commented 9 years ago

My Kafka server's log level is INFO, and I can't find any higher-level log entries corresponding to these exceptions in the server logs.

dpkp commented 9 years ago

OK, thanks. It could just be flaky network connectivity. In any event, right now the SimpleProducer requires the user to try/except on exceptions like this and manage retries themselves. I'm hopeful that #331 will address some of these issues for async producers in 0.9.4; I haven't looked at the sync producers closely. You might also take a glance at PR #333.
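For reference, a minimal sketch of that try/except approach. The broker address and topic name are placeholders, and the FailedPayloadsError import path follows the 0.9.x layout shown in the tracebacks above:

from kafka import KafkaClient, SimpleProducer
from kafka.common import FailedPayloadsError

kafka = KafkaClient('172.31.13.68:9092')   # placeholder broker address
producer = SimpleProducer(kafka)

try:
    producer.send_messages('my-topic', 'my message')
except FailedPayloadsError:
    # The payload was not acknowledged: either a transient network blip or the
    # broker closed the socket. Decide here whether to retry, log, or drop it.
    pass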

alvinchow86 commented 9 years ago

I'm also occasionally seeing issues with the SimpleProducer raising errors and failing to send messages (the producer client is being called within Django web servers). It's intermittent and hard to reproduce consistently, but it happens often enough to be concerning. I wonder if this is related?

[2015-04-16 14:46:26] [DEBUG] [kafka:126] About to send 152 bytes to Kafka, request 28
[2015-04-16 14:46:26] [DEBUG] [kafka:148] Reading response 28 from Kafka
[2015-04-16 14:46:26] [DEBUG] [kafka:84] About to read 4 bytes from Kafka
[2015-04-16 14:46:26] [ERROR] [kafka:102] Unable to receive data from Kafka
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/kafka/conn.py", line 99, in _read_bytes
    raise socket.error("Not enough data to read message -- did server kill socket?")
error: Not enough data to read message -- did server kill socket?
[2015-04-16 14:46:27] [DEBUG] [kafka:176] Closing socket connection for kafka1.com:9092
[2015-04-16 14:46:27] [WARNING] [kafka:186] Could not receive response to request [MY_DATA] from server <KafkaConnection host=kafka1.com port=9092>: Kafka @ kafka1.com:9092 went away
[2015-04-16 14:46:27] [ERROR] [kafka:200] Unable to send messages
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/kafka/producer/base.py", line 198, in _send_messages
    timeout=self.ack_timeout)
  File "/usr/local/lib/python2.7/dist-packages/kafka/client.py", line 411, in send_produce_request
    resps = self._send_broker_aware_request(payloads, encoder, decoder)
  File "/usr/local/lib/python2.7/dist-packages/kafka/client.py", line 202, in _send_broker_aware_request
    raise FailedPayloadsError(failed_payloads)
FailedPayloadsError: [ProduceRequest(topic='mytopic', partition=0, messages=[Message(magic=0, attributes=0, key=None, value='MY_DATA')])]

I'm using it with mostly default settings:

from kafka import KafkaClient, SimpleProducer

kafka = KafkaClient(KAFKA_HOSTS)    # KAFKA_HOSTS is our broker list
producer = SimpleProducer(kafka)    # synchronous producer, default options
...
producer.send_messages(..)

I'm holding a long-lived client/connection and using it to send Kafka events as needed; could there be an issue with this?

Any help would be greatly appreciated! I'm using kafka-python 0.9.3 and Kafka server 0.8.2.

dpkp commented 9 years ago

FailedPayloadsError can mean a simple network connection error, or it can mean that the server threw an exception and closed the socket. Check your server logs and look for anything out of the ordinary. If it is just a network issue, then you should be able to try/except on FailedPayloadsError and retry. Because you can't know whether it is an intermittent network error or an unhandled server failure, you probably should limit the number of retries.
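A rough sketch of that bounded-retry idea; the helper name, retry count, and backoff here are arbitrary and not part of kafka-python:

import time
from kafka.common import FailedPayloadsError

def send_with_retries(producer, topic, messages, retries=3, backoff_seconds=1.0):
    # Retry a limited number of times on FailedPayloadsError, then re-raise so
    # a persistent broker failure is not silently swallowed.
    for attempt in range(retries):
        try:
            return producer.send_messages(topic, *messages)
        except FailedPayloadsError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_seconds)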

Managing all of this yourself is not ideal. I've been trying to improve the internal error handling of the consumers (see KafkaConsumer), but unfortunately the producers are still not great. There are a few open issues for 0.9.4 that I'd like to get cleaned up and hopefully we can stamp this out asap!

dpkp commented 9 years ago

The async producer should now properly handle FailedPayloadsError and other exceptions. See #331, #366, and #388. Using the synchronous producer will still require you to handle exceptions yourself.
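Switching to the async producer is roughly just a constructor flag; the broker address and topic below are placeholders, and the exact retry-related options added in #331 may vary by version:

from kafka import KafkaClient, SimpleProducer

kafka = KafkaClient('kafka1.com:9092')   # placeholder broker address
# async=True queues messages and sends them from a background thread; per the
# fixes referenced above, failed payloads are retried there rather than
# raising FailedPayloadsError in the caller.
producer = SimpleProducer(kafka, async=True)
producer.send_messages('my-topic', 'my message')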