Open mrocklin opened 7 years ago
On the producer side is there any back pressure? There doesn't seem to be any block= option when calling the produce(...) method. Is there a way to emit a message in a robustly non-blocking way and err if blocking would be inevitable?
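To make the question concrete, here is a minimal sketch of the "emit without blocking, or raise" behavior being asked about. It uses only the standard library; `BoundedOutbox`, `try_emit`, and `drain_one` are hypothetical names for illustration and are not pykafka API.

```python
import queue


class BoundedOutbox:
    """Hypothetical stand-in for a producer's bounded send buffer (not pykafka API)."""

    def __init__(self, max_queued):
        self._q = queue.Queue(maxsize=max_queued)

    def try_emit(self, message):
        """Enqueue without blocking; raises queue.Full if the buffer is full."""
        self._q.put_nowait(message)

    def drain_one(self):
        """Simulate the network side sending the oldest queued message."""
        return self._q.get_nowait()


outbox = BoundedOutbox(max_queued=2)
outbox.try_emit(b"a")
outbox.try_emit(b"b")

try:
    outbox.try_emit(b"c")  # buffer is full, so this raises instead of blocking
    overflowed = False
except queue.Full:
    overflowed = True
```

The caller sees the error immediately and can decide whether to drop, retry later, or slow down upstream, which is the back pressure signal the question is after.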
In general I'd like to avoid adding additional dependencies where possible, but it will be helpful to understand exactly what a Tornado dependency would enable. If there's a good tradeoff between tornado-specific code in pykafka and ease of use for users of both, it could be worth the added dependency. Without knowing what this will look like, though, I'm wary of adding even more to our already somewhat bloated test requirements.
Some work was started a while ago by @mikepk on a callback mechanism similar to the one you're describing but for produced messages in https://github.com/Parsely/pykafka/pull/506. If it seems necessary, we can either adapt that work or start from scratch on a callback interface that would meet your needs. I think when we chat tomorrow we'll have a better understanding of the specific requirements around this.
You might be noticing that there's no balanced_consumer.py in the rdkafka directory. You can still use rdkafka with balanced consumers through the use_rdkafka kwarg on BalancedConsumer.
I noticed that there is no librdkafka solution for balanced consumers
That's because those should go the way of the dodo bird. Much better to use Kafka's native Consumer Group APIs, which are supported by librdkafka and also exposed by pykafka as ManagedBalancedConsumers. Pykafka's BalancedConsumer was built before these native Kafka APIs existed, and (IMHO based on our production usage) has fairly serious problems such as https://github.com/Parsely/pykafka/issues/354
To clarify, the callback that would solve this issue would be called here when a message becomes available on one or more partitions.
I would like to consume and produce messages using PyKafka from within a Tornado application. I am potentially willing to contribute code for this. I have a few questions.
None we'll want to be triggered when data does arrive. Sometimes frameworks like this provide a mechanism for a callback function, which in our case could be setting an Event object. I'm particularly interested in the librdkafka consumer. As far as I can tell looking at the code there is no such mechanism exposed at the Python level, but I thought I would check. If forced we can always use a separate thread for this.

Separately, I noticed that there is no librdkafka solution for balanced consumers. Is this a limitation of librdkafka or is it that pykafka has not wrapped this functionality yet? If the latter then is there any near-term plan to support this?
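For reference, the separate-thread fallback mentioned above might look roughly like the sketch below. It assumes only that the consumer offers a blocking call that returns None on timeout. `fake_consume` is a hypothetical stand-in for that call, and the event loop is plain asyncio (which recent Tornado versions run on) rather than anything pykafka provides.

```python
import asyncio
import queue
import threading


def pump(blocking_consume, loop, on_message, stop):
    """Worker thread: poll the blocking consumer and hand each message
    to the event loop in a thread-safe way."""
    while not stop.is_set():
        msg = blocking_consume()  # assumed to return None on timeout
        if msg is not None:
            loop.call_soon_threadsafe(on_message, msg)


async def main():
    loop = asyncio.get_running_loop()
    inbox = asyncio.Queue()
    stop = threading.Event()

    # Hypothetical stand-in for a Kafka consumer: three queued messages.
    source = queue.Queue()
    for i in range(3):
        source.put(i)

    def fake_consume():
        try:
            return source.get(timeout=0.05)
        except queue.Empty:
            return None

    worker = threading.Thread(
        target=pump,
        args=(fake_consume, loop, inbox.put_nowait, stop),
        daemon=True,
    )
    worker.start()

    # The coroutine awaits messages without ever blocking the event loop.
    received = [await inbox.get() for _ in range(3)]

    stop.set()
    worker.join()
    return received


received = asyncio.run(main())
```

A native callback from the consumer would remove the polling thread entirely; this version only shows what the workaround costs.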