This deadlock occurred in the consumer's commit method and the
producer's send method for the same reason: calling
lparallel.queue:pop-queue[1][2] on an empty queue blocks
indefinitely. For both consumers and producers, these queues are
filled by a call to enqueue-payload. However, because the calls to
lparallel.queue:pop-queue and enqueue-payload each attempt to
acquire the +address->queue-lock+ mutex, once
lparallel.queue:pop-queue blocks on an empty queue while the mutex
is held, enqueue-payload can never acquire the mutex to refill the
queue, and we end up in a deadlock.

Reaching an empty queue during a call to lparallel.queue:pop-queue
is not a valid state. We reached it because the mutex was not being
acquired early enough by the consumer and producer:

- In the consumer's case, it should have been acquired before the
  call to cl-rdkafka/ll:rd-kafka-commit-queue.
- In the producer's case, it should have been acquired before the
  call to %send.

For each consumer and producer, there are two queues that are supposed
to have the same size and corresponding elements at all times:

- An rd-kafka-queue, which is a pointer to an rd_kafka_queue_t C
  struct. This rd-kafka-queue is filled by calls to
  cl-rdkafka/ll:rd-kafka-commit-queue and %send.
- A queue, which is the result of lparallel.queue:make-queue.
  This queue is filled by calls to enqueue-payload.

Every time librdkafka enqueues a commit or send event to
rd-kafka-queue, a corresponding lparallel promise should be enqueued
to queue.

process-events, which poll-loop calls in a background thread after
acquiring +address->queue-lock+, loops over rd-kafka-queue until it
is empty. For each commit/send event that it pops off, it calls
process-commit-event or process-send-event, which in turn pops a
promise off of queue and fulfills it with the commit/send event
details.
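
The pairing described above can be sketched with a small
single-threaded model. This is only an illustration, not the
cl-rdkafka code: plain lists stand in for both queues, keywords stand
in for events and promises, and every name prefixed with paired- is
hypothetical.

```lisp
;; Hypothetical model: plain lists stand in for both queues.
(defstruct paired-queues
  (events '())     ; stands in for rd-kafka-queue
  (promises '()))  ; stands in for the lparallel queue

(defun paired-enqueue (q event promise)
  "Append EVENT and PROMISE together so both queues stay the same size."
  (setf (paired-queues-events q)
        (append (paired-queues-events q) (list event)))
  (setf (paired-queues-promises q)
        (append (paired-queues-promises q) (list promise)))
  q)

(defun paired-process-events (q)
  "Drain the event queue, returning (promise . event) pairs in FIFO order."
  (loop while (paired-queues-events q)
        collect (cons (pop (paired-queues-promises q))
                      (pop (paired-queues-events q)))))

(defun paired-demo ()
  "Enqueue one commit and one send, then process both."
  (let ((q (make-paired-queues)))
    (paired-enqueue q :commit-event :commit-promise)
    (paired-enqueue q :send-event :send-promise)
    (paired-process-events q)))
```

As long as every event is enqueued together with its promise, each
event popped in the loop finds the promise that belongs to it.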

Because cl-rdkafka/ll:rd-kafka-commit-queue and %send were being
called without acquiring the mutex, commit/send events continued to be
enqueued onto rd-kafka-queue while poll-loop held the mutex. This
caused process-events to continue looping, which caused
process-commit-event and process-send-event to continue popping
promises off of queue. However, because enqueue-payload had to
acquire the mutex held by poll-loop before enqueuing promises onto
queue, no new promises could arrive and this queue would eventually
become empty, causing lparallel.queue:pop-queue to block indefinitely
and leaving us in a deadlock.
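
The two orderings can be compared in a deterministic, single-threaded
sketch. It assumes the same simplifications as any toy model: lists
instead of real queues, no actual threads or locks, and a
:would-block return value marking the point where
lparallel.queue:pop-queue would hang. All race- names are
hypothetical.

```lisp
;; Hypothetical model of the interleavings before and after the fix.
(defstruct race-model
  (events '())     ; stands in for rd-kafka-queue
  (promises '()))  ; stands in for the lparallel queue

(defun race-enqueue-event (m event)
  "Models %send or rd-kafka-commit-queue adding an event."
  (setf (race-model-events m)
        (append (race-model-events m) (list event))))

(defun race-enqueue-promise (m promise)
  "Models enqueue-payload adding a promise."
  (setf (race-model-promises m)
        (append (race-model-promises m) (list promise))))

(defun race-process-events (m)
  "Models process-events; returns :would-block where pop-queue would hang."
  (loop while (race-model-events m)
        do (pop (race-model-events m))
           (if (null (race-model-promises m))
               (return :would-block)
               (pop (race-model-promises m)))
        finally (return :ok)))

(defun race-before-fix ()
  "Event enqueued outside the lock; poll-loop runs before enqueue-payload."
  (let ((m (make-race-model)))
    (race-enqueue-event m :send-event)
    (race-process-events m)))

(defun race-after-fix ()
  "Lock held across both enqueues, so poll-loop only runs afterwards."
  (let ((m (make-race-model)))
    (race-enqueue-event m :send-event)
    (race-enqueue-promise m :send-promise)
    (race-process-events m)))
```

In this model, race-before-fix reaches the empty promise queue, while
race-after-fix drains both queues normally because the event and its
promise become visible to the processing loop together.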

Signed-off-by: Sahil Kang sahil.kang@asilaycomputing.com