dpkp / kafka-python

Python client for Apache Kafka
http://kafka-python.readthedocs.io/
Apache License 2.0

Support for encoding/decoding non-string messages #210

Closed: jakekdodd closed this issue 8 years ago

jakekdodd commented 9 years ago

I've been trying to figure out how to send Avro-encoded messages with kafka-python. As best as I can tell, this isn't possible, since the write_int_string method called by _encode_message checks to make sure that the message is of type String, and errors out if this is not the case.

I don't see any reason why the kafka-python Producer should require that the message be a string, since Kafka messages are just bytes. Why was this design decision made, and is this something that's easily changeable to support non-string messages?

Or, am I just missing something, and this is possible with the current release of kafka-python?

dpkp commented 9 years ago

That is not intended. Do you mind providing a short test case for your particular problem? An inline comment here is fine.

Also, are you testing against the current master, or a previous release / commit? My initial thought is that this could be an unintended side effect of PR #204 (merged recently). One fix would be to permit any message that can be string-ified via str(). Another would be to refactor the encoding system to use a Message object, which would need to expose len() and bytes() functions.

In either case the message just has to get encoded as a "Variable Length Primitive" per the kafka protocol: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ProtocolPrimitiveTypes
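
For reference, here is a minimal sketch of that primitive, making no assumptions about kafka-python's internals: a value is written as a big-endian int32 length followed by the raw payload, with length -1 for a null value. The helper name below is illustrative only.

# Sketch of the Kafka "bytes" protocol primitive: int32 length prefix,
# then the raw payload; a null value is encoded as length -1.
# The function name is illustrative, not kafka-python's actual internals.
import struct

def encode_bytes(payload):
    if payload is None:
        return struct.pack('>i', -1)
    return struct.pack('>i', len(payload)) + payload

print(repr(encode_bytes(b'\x00\x01avro-bytes')))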

wizzat commented 9 years ago

I've replicated this issue.

jakekdodd commented 9 years ago

Sure thing. Testing against version 0.9.1.


from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer
from kafka.producer import SimpleProducer, KeyedProducer

kafka = KafkaClient("localhost:9092")

producer = SimpleProducer(kafka)
producer.send_messages("my-topic", 123)

The producer freaks out (TypeError: object of type 'int' has no len()) because the message (an int) is not a string. If you try a byte array instead, you get struct.error: argument for 's' must be a string. I guess this makes sense, since

return struct.pack('>i%ds' % len(s), len(s), s)

requires a string. It was just confusing for somebody who has used the Java Kafka API in the past, and expected that a byte array would work with a Producer message.

Accepting anything that's stringifiable may be a good solution.
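
A hypothetical sketch of that str()-coercion idea (this is not the fix that was ultimately merged); the function name deliberately mirrors the write_int_string helper mentioned above:

# Hypothetical variant that coerces the value before length-prefixing it.
# On Python 2, str(123) -> '123' and byte strings pass through unchanged.
import struct

def write_int_string_lenient(s):
    if not isinstance(s, str):
        s = str(s)
    return struct.pack('>i%ds' % len(s), len(s), s)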

dpkp commented 9 years ago

Thanks -- discussing a fix in IRC (#kafka-python on freenode.net). I think this needs to be fixed before releasing 0.9.2.

dpkp commented 9 years ago

Thinking about this a bit more, I think fixing this is actually a larger task that involves support for custom encoders / decoders. For now you will need to encode your avro objects to python strings before calling kafka.producer.send(). We should update the docs to clarify that the producer interface requires messages to be of type str -- meaning that byte encoding needs to happen before calling producer.send().

Support for custom encoders / decoders will probably have to wait for a future release.
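
For anyone following along, a rough sketch of this workaround, assuming the avro package, a hypothetical schema file named user.avsc, and a Python 2 environment where the Avro-encoded bytes are already a str:

# Serialize the Avro record to a byte string yourself, then hand that
# string to the producer. Schema file and record contents are assumptions.
import io
import avro.schema
from avro.io import DatumWriter, BinaryEncoder
from kafka.client import KafkaClient
from kafka.producer import SimpleProducer

# some avro releases spell this avro.schema.Parse
schema = avro.schema.parse(open('user.avsc').read())
record = {'name': 'jake', 'id': 123}  # must match the schema

buf = io.BytesIO()
DatumWriter(schema).write(record, BinaryEncoder(buf))

kafka = KafkaClient("localhost:9092")
producer = SimpleProducer(kafka)
producer.send_messages("my-topic", buf.getvalue())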

jakekdodd commented 9 years ago

As far as Avro objects go, agreed. Right now I'm converting them to strings and it's working well.

Updating the docs should be a good enough fix for now. Byte encoding the messages is simple enough for users to do on their own; the main difficulty was digging through source code to figure out whether this was a requirement.

I may get around to writing some Avro support for the Producer itself. If I do so, I'll let you guys know--serializing Kafka payloads with Avro is a common use case, and I suspect a lot of people would use the enhancement.

dpkp commented 9 years ago

#211 makes the type requirement more explicit in the code, docstring, and README.

The next improvement here would be to implement something like the serializer.class producer config available in the Java producer API; possibly also a deserializer on the consumer side, similar to the keyDecoder and valueDecoder available in the high-level consumer API.
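
To illustrate the idea, here is a purely hypothetical wrapper showing what such a serializer hook around SimpleProducer could look like; nothing like this class existed in kafka-python at the time of this discussion:

# Hypothetical: apply a user-supplied encoder before delegating to the
# wrapped producer, analogous to serializer.class in the Java producer.
class SerializingProducer(object):
    def __init__(self, producer, value_serializer):
        self.producer = producer
        self.value_serializer = value_serializer  # callable: object -> str

    def send_messages(self, topic, *values):
        encoded = [self.value_serializer(v) for v in values]
        return self.producer.send_messages(topic, *encoded)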

malonej7 commented 9 years ago

Hi, late to the party here, but I've been looking all over the internet for an example of how to use kafka-python with Avro encoding, and amazingly I can't find anything. Would you be able to point me to any code samples for this? It would be greatly appreciated. Thanks.

jakekdodd commented 9 years ago

Heh, blast from the past. I'd forgotten that I worked on this issue.

I was able to dig up this script and, miraculously, it worked. Here's a gist that should get you started:

https://gist.github.com/jakekdodd/e7ee38fd945818d86eb4

malonej7 commented 9 years ago

Nice thank you! That's very helpful. I appreciate it.

dpkp commented 8 years ago

The KafkaProducer and KafkaConsumer classes support value_serializer and value_deserializer, respectively, which means you can use any arbitrary function to convert messages to bytes and back.
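
A short example of that newer API, using JSON as a stand-in for an Avro encode/decode pair; the broker address and topic name are assumptions:

# Any callable can serialize/deserialize message values.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'))
producer.send('my-topic', {'id': 123, 'name': 'jake'})
producer.flush()

consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda b: json.loads(b.decode('utf-8')))
for message in consumer:
    print(message.value)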