SahilKang / cl-rdkafka

Common Lisp library for Kafka

[#58] Fix +address->queue-lock+ deadlock #60

Closed SahilKang closed 4 years ago

SahilKang commented 4 years ago

This deadlock occurred in both the consumer's commit method and the producer's send method for the same reason: calling lparallel.queue:pop-queue[1][2] on an empty queue blocks indefinitely. For both consumers and producers, these queues are filled by calls to enqueue-payload. However, because the code paths that call lparallel.queue:pop-queue and enqueue-payload each acquire the +address->queue-lock+ mutex, once lparallel.queue:pop-queue blocks on an empty queue while the mutex is held, enqueue-payload can never acquire the mutex to refill the queue, and we end up in a deadlock.

An empty queue during a call to lparallel.queue:pop-queue is not a valid state. We reached it because the consumer and producer were not acquiring the mutex early enough:

For each consumer and producer, there are two queues, rd-kafka-queue and queue, that are supposed to have the same size and corresponding elements at all times:

Every time librdkafka enqueues a commit or send event onto rd-kafka-queue, a corresponding lparallel promise should be enqueued onto queue.

process-events, which is called by poll-loop in a background thread after acquiring +address->queue-lock+, loops over rd-kafka-queue until it is empty. For each commit/send event that it pops off, it calls process-commit-event or process-send-event, which in turn pops a promise off of queue and fulfills it with the commit/send event details.
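The pairing described above can be sketched in a few lines. This is only an illustration, not the library's code: a plain lparallel queue stands in for the librdkafka-owned rd-kafka-queue, and the sketch-* names are hypothetical. It assumes Quicklisp is available to load lparallel.

```lisp
;; Illustrative sketch only; not cl-rdkafka's implementation.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :lparallel :silent t))

(defun sketch-enqueue-pair (events promises event)
  "Enqueue EVENT and a matching promise so both queues stay the same size."
  (lparallel.queue:push-queue event events)
  (lparallel.queue:push-queue (lparallel:promise) promises))

(defun sketch-process-events (events promises)
  "Drain EVENTS. For each event, pop the matching promise and fulfill it.
Returns the fulfilled promises in order."
  (loop for (event foundp) = (multiple-value-list
                              (lparallel.queue:try-pop-queue events))
        while foundp
        collect (let ((promise (lparallel.queue:pop-queue promises)))
                  ;; pop-queue would block here if PROMISES were empty,
                  ;; which is exactly the invalid state described above.
                  (lparallel:fulfill promise event)
                  promise)))

(defun sketch-paired-demo ()
  "Enqueue three event/promise pairs, drain them, and report the results."
  (let ((events (lparallel.queue:make-queue))
        (promises (lparallel.queue:make-queue)))
    (dolist (event '(:commit-1 :send-1 :commit-2))
      (sketch-enqueue-pair events promises event))
    (let ((fulfilled (sketch-process-events events promises)))
      (values (mapcar #'lparallel:force fulfilled)
              (lparallel.queue:queue-empty-p events)
              (lparallel.queue:queue-empty-p promises)))))
```

As long as every event has a promise enqueued alongside it, both queues drain together and pop-queue never has to wait.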

Because cl-rdkafka/ll:rd-kafka-commit-queue and %send were being called without acquiring the mutex, commit/send events continued to be enqueued onto rd-kafka-queue. This caused process-events to keep looping, which caused process-commit-event and process-send-event to keep popping promises off of queue. However, because enqueue-payload had to acquire the mutex held by poll-loop before it could enqueue promises onto queue, queue would eventually become empty. At that point lparallel.queue:pop-queue blocked indefinitely while poll-loop still held the mutex, leaving us in a deadlock.
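One way to read the fix is that the caller takes the mutex before triggering the event and keeps it until the promise is enqueued, so the poller can never see an event without its promise. The sketch below illustrates that idea under stated assumptions: the sketch-* names and *sketch-lock* are hypothetical stand-ins for the library's internals, a plain lparallel queue simulates rd-kafka-queue, and Quicklisp is assumed for loading lparallel and bordeaux-threads.

```lisp
;; Illustrative sketch only; not cl-rdkafka's implementation.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '(:lparallel :bordeaux-threads) :silent t))

(defvar *sketch-lock* (bordeaux-threads:make-lock "sketch-queue-lock")
  "Hypothetical stand-in for +address->queue-lock+.")

(defun sketch-send (events promises event)
  "Hold the lock across BOTH enqueues and return the new promise."
  (bordeaux-threads:with-lock-held (*sketch-lock*)
    (lparallel.queue:push-queue event events)
    (let ((promise (lparallel:promise)))
      (lparallel.queue:push-queue promise promises)
      promise)))

(defun sketch-poll-once (events promises)
  "Under the lock, drain EVENTS and fulfill one promise per event.
Returns the number of events handled."
  (bordeaux-threads:with-lock-held (*sketch-lock*)
    (loop for (event foundp) = (multiple-value-list
                                (lparallel.queue:try-pop-queue events))
          while foundp
          do (lparallel:fulfill (lparallel.queue:pop-queue promises) event)
          count t)))

(defun sketch-locked-demo (n)
  "Send N events while a background poller fulfills their promises."
  (let* ((events (lparallel.queue:make-queue))
         (promises (lparallel.queue:make-queue))
         (poller (bordeaux-threads:make-thread
                  (lambda ()
                    ;; Stop once all N events have been handled.
                    (loop with handled = 0
                          while (< handled n)
                          do (incf handled (sketch-poll-once events promises))
                             (sleep 0.001)))))
         (results (loop for i below n
                        collect (sketch-send events promises i))))
    (prog1 (mapcar #'lparallel:force results)
      (bordeaux-threads:join-thread poller))))
```

If sketch-send released the lock between the two push-queue calls, or pushed the event before taking the lock, the poller could pop an event and then block in pop-queue while holding the lock, which reproduces the deadlock described above.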

Signed-off-by: Sahil Kang <sahil.kang@asilaycomputing.com>