Parsely / pykafka

Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance.
http://pykafka.readthedocs.org/
Apache License 2.0

Suggest to document Partitioner Behavior in more details in Producer #1009

Open Shadowsong27 opened 4 years ago

Shadowsong27 commented 4 years ago

PyKafka version: 2.8.0

I'm not sure I'm doing this correctly, but based on what I have read in the docs and source code, to produce messages into a specific partition while having many consumers (so that only one consumer consumes this set of messages), you need to not only provide a partition_key when calling produce, but also configure the producer with a HashingPartitioner, since the default is RandomPartitioner.

I found this solution based on the source code and this docstring, which took me a while:

:param partition_key: The key to use when deciding which partition to send this
            message to. This key is passed to the `partitioner`, which may or may not
            use it in deciding the partition. The default `RandomPartitioner` does not
            use this key, but the optional `HashingPartitioner` does.
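
To illustrate the distinction the docstring draws, here is a small self-contained sketch. The two functions are hypothetical stand-ins written for this example, not pykafka's implementations. They assume only that a partitioner is a callable receiving the available partitions and the key, and the CRC32 hash is an arbitrary choice made here for stability.

```python
import random
import zlib


def toy_random_partitioner(partitions, key):
    """Stand-in for a random partitioner: the key is accepted but ignored."""
    return random.choice(partitions)


def toy_hashing_partitioner(partitions, key):
    """Stand-in for a hashing partitioner: the same key always maps to the same partition."""
    if key is None:
        raise ValueError("a hashing partitioner needs a partition_key")
    ordered = sorted(partitions)
    # CRC32 is used only because it is stable across runs; this is an assumption
    # of the sketch, not a statement about pykafka's hash function.
    return ordered[zlib.crc32(key) % len(ordered)]


partitions = [0, 1, 2, 3]
key = b"customer-42"

# Route the same key 100 times through each stand-in and collect the partitions chosen.
hashed = {toy_hashing_partitioner(partitions, key) for _ in range(100)}
randomized = {toy_random_partitioner(partitions, key) for _ in range(100)}

print("hashing stand-in chose:", hashed)     # a single partition
print("random stand-in chose:", randomized)  # typically several partitions
```

With the random stand-in, supplying a key has no effect on where a message lands, which matches the behavior I ran into.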

If this is the intended approach, I would suggest adding the usage of HashingPartitioner to the docstring of produce as well, since when I first started I assumed that providing partition_key alone was sufficient for my use case.
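
For reference, a minimal sketch of the setup I ended up with. It assumes that `hashing_partitioner` can be imported from `pykafka.partitioners` and passed to `get_producer` via the `partitioner` argument. The helper name `produce_with_key`, the broker address, the topic name, and the key are placeholders invented for this example.

```python
def produce_with_key(hosts, topic_name, messages, partition_key):
    """Send every message with the same partition_key through a hashing partitioner.

    Hypothetical helper for illustration; all arguments are placeholders.
    """
    # Imported lazily so the sketch can be loaded without pykafka installed.
    from pykafka import KafkaClient
    from pykafka.partitioners import hashing_partitioner

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name]

    # Without partitioner=..., the producer uses the default RandomPartitioner,
    # which does not look at partition_key.
    with topic.get_producer(partitioner=hashing_partitioner) as producer:
        for message in messages:
            producer.produce(message, partition_key=partition_key)


# Example call (requires a reachable Kafka broker, so it is left commented out):
# produce_with_key("127.0.0.1:9092", b"test.topic", [b"a", b"b"], b"customer-42")
```

Both the payloads and the key are passed as bytes here. An example along these lines in the produce docstring would have saved me the detour through the source.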