Open andrewthad opened 4 years ago
I'm still interested in implementing a user-facing `rd_kafka_wait_msg_queue`. @edenhill Would you accept a PR implementing this?
This is essentially what produce() does when given `RD_KAFKA_MSG_F_BLOCK`, with the caveat that delivery reports need to be polled from another thread. Would that be usable in your setup?
Agreed, the example `while` loop I gave is basically just `rd_kafka_produce` with `RD_KAFKA_MSG_F_BLOCK`. In languages without relocating garbage collectors (like C or golang), the two have identical behavior. However, in GHC Haskell, splitting the work into `rd_kafka_produce` (nonblocking) and `rd_kafka_wait_msg_queue` (blocking) would allow me to pass unpinned (relocatable) memory to `rd_kafka_produce`. As things are now, I have to resort to pinned memory, which means that I have to make an extra copy of some of the fields before passing them over to librdkafka. I have to use pinned memory because calling `rd_kafka_produce` with `RD_KAFKA_MSG_F_BLOCK` requires a foreign function call during which the GC might run while `rd_kafka_produce` is still executing. When it is called without `RD_KAFKA_MSG_F_BLOCK`, I can instead call `rd_kafka_produce` through a foreign function call that prevents GC from running concurrently. (GHC confusingly refers to these two types of FFI calls as `safe` and `unsafe` for historical reasons.)
Feature Request Description
It would make librdkafka easier for me to incorporate into a library I'm working on if it provided a function for waiting until the message queue has space available:
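As a rough sketch of the idea (this is not librdkafka code), the C below models the queue's byte accounting with a toy struct and shows a hypothetical wait function together with the wait-then-produce loop it would enable. Every `toy_*` name, the return codes, and the mutex/condition-variable design are assumptions made for illustration; only the name `rd_kafka_wait_msg_queue` and its `len` parameter come from the request itself.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a producer's queue accounting. NOT librdkafka's
 * internal structure: it only tracks bytes in use against a limit. */
typedef struct toy_queue {
    pthread_mutex_t lock;
    pthread_cond_t  space_freed;
    size_t          curr_size;  /* bytes currently enqueued */
    size_t          max_size;   /* capacity in bytes */
} toy_queue_t;

void toy_queue_init(toy_queue_t *q, size_t max_size) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->space_freed, NULL);
    q->curr_size = 0;
    q->max_size  = max_size;
}

/* Hypothetical analogue of the proposed rd_kafka_wait_msg_queue.
 * Blocks until len bytes would fit. Returns 0 when space was seen,
 * -1 if len can never fit. The result is only a hint: another
 * producer may claim the space before the caller enqueues. */
int toy_wait_msg_queue(toy_queue_t *q, size_t len) {
    if (len > q->max_size)
        return -1;
    pthread_mutex_lock(&q->lock);
    while (len > q->max_size - q->curr_size)
        pthread_cond_wait(&q->space_freed, &q->lock);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Nonblocking enqueue: true on success, false if the queue is full. */
bool toy_produce_nonblocking(toy_queue_t *q, size_t len) {
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (len <= q->max_size && len <= q->max_size - q->curr_size) {
        q->curr_size += len;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Stands in for a message leaving the queue: release its bytes
 * and wake any thread blocked in toy_wait_msg_queue. */
void toy_release(toy_queue_t *q, size_t len) {
    pthread_mutex_lock(&q->lock);
    q->curr_size -= len;
    pthread_cond_broadcast(&q->space_freed);
    pthread_mutex_unlock(&q->lock);
}

/* Simulated synchronous producer: wait (blocking), then try to
 * enqueue (nonblocking), and retry if another thread won the race. */
int toy_produce_blocking(toy_queue_t *q, size_t len) {
    for (;;) {
        if (toy_wait_msg_queue(q, len) != 0)
            return -1;
        if (toy_produce_nonblocking(q, len))
            return 0;
    }
}
```

The retry in `toy_produce_blocking` is what makes it safe to treat the wait as a hint: if another thread takes the space first, the nonblocking enqueue fails and the loop simply waits again.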
The behavior of this function would be to block until `rk->rk_curr_msgs.size` becomes low enough that `len` bytes are available on the queue. This is really just a hint, since another producer thread could get scheduled to run right after `rd_kafka_wait_msg_queue`. That is, correct use would involve running `rd_kafka_wait_msg_queue` and `rd_kafka_produce*` (with message flags for asynchronous behavior) in a loop. The implementation of this function could be almost entirely cribbed from `rd_kafka_curr_msgs_add`. With such a function, users could simulate the synchronous producer with something like:
Motivation
For message production, users currently have the option of asynchronous production or synchronous production. In the applications I've been working on, the blocking producer tends to meet my needs a little better. However, now I'm running into a tricky situation. I've started incorporating the work from https://github.com/edenhill/librdkafka/pull/2902 into rdkafka-api, a Haskell library that provides bindings to librdkafka.
The GHC Haskell runtime includes a relocating stop-the-world garbage collector. The C foreign function interface (FFI) offers two ways to call C code (both named poorly). Here are their relevant behaviors:
- `unsafe` FFI: prevents GC. Consequently, C code is allowed to access GC-managed byte arrays. Intended for nonblocking C functions.
- `safe` FFI: does not prevent GC. That is, GC might relocate byte arrays while the C function is running. C code may only access byte arrays that were explicitly pinned (never relocated by GC). Intended for blocking C functions.
I cannot speak in any detail about other language runtimes. In GHC Haskell, this ends up meaning that it is preferable to use the `unsafe` FFI when possible, since then you don't have to care about whether or not byte arrays have been explicitly pinned. Nonblocking functions are easy to work with since they require the caller to uphold fewer invariants.
So, how does this relate to librdkafka? I'd like to be able to get the behavior that the blocking producer offers but without having to use the `safe` FFI to call `rd_kafka_produce*`. Having the extra function to wait until space is probably available means that I could use the `safe` FFI just for `rd_kafka_wait_msg_queue`.