maxtsu opened this issue 8 months ago
While experimenting with closing the consumer, I compared the debug logs. The lines pasted below are the ones missing when consumer.close() hangs. When the problem happens, the internal main thread termination never completes.
%7|1698511119.679|TERMINATE|rdkafka#consumer-1| [thrd:main]: Internal main thread termination done
%7|1698511119.679|TERMINATE|rdkafka#consumer-1| [thrd:app]: Destroying op queues
%7|1698511119.679|TERMINATE|rdkafka#consumer-1| [thrd:app]: Destroying cgrp
%7|1698511119.679|MEMBERID|rdkafka#consumer-1| [thrd:app]: Group "device-group-Cisco-hardware-ingest1": updating member id "" -> "(not-set)"
%7|1698511119.679|TERMINATE|rdkafka#consumer-1| [thrd:app]: Termination done: freeing resources
2023-10-28 16:38:39,679 CRITICAL Program terminated for restart
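For anyone trying to reproduce this: the `%7|...` lines above are librdkafka debug output (level 7). A minimal sketch of a confluent-kafka-python consumer configuration that turns it on, assuming the `confluent_kafka` package. The broker address is a placeholder, and `"all"` is simply the broadest debug setting, not necessarily the one used in this issue:

```python
# Sketch of a consumer config with librdkafka debug logging enabled.
# "bootstrap.servers" is a placeholder; only the group id comes from the log above.
conf = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "device-group-Cisco-hardware-ingest1",
    # "all" enables every librdkafka debug context (very verbose); it can be
    # narrowed to a comma-separated subset such as "cgrp,consumer".
    "debug": "all",
}

# With confluent-kafka-python installed:
#   from confluent_kafka import Consumer
#   consumer = Consumer(conf)
```

By default librdkafka writes these debug lines to stderr.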
Was there any progress resolving this issue?
My temporary workaround is to call consumer.close() from a Python Thread with daemon=True and use a timeout on the join() call.
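A minimal sketch of that workaround, assuming the confluent-kafka-python `Consumer` (any object with a blocking `close()` behaves the same way). `close_with_timeout` is a hypothetical helper name, and the approach only helps if `close()` blocks without holding the Python GIL, so that the main thread can return from `join()`:

```python
import threading


def close_with_timeout(consumer, timeout: float) -> bool:
    """Call consumer.close() in a daemon thread and wait at most `timeout` seconds.

    Returns True if close() finished in time, False if it is still running.
    """
    # daemon=True: a stuck close() thread will not keep the interpreter alive at exit.
    closer = threading.Thread(target=consumer.close, name="kafka-close", daemon=True)
    closer.start()
    closer.join(timeout)          # wait up to `timeout` seconds
    return not closer.is_alive()  # still alive means close() is hung
```

A `False` result tells the caller that the close is stuck, and the program can log it and exit or restart instead of hanging. The consumer is not actually closed in that case, so its group membership is only dropped once the group's session timeout expires.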
Quick answer: no. It turned out that the Kafka broker cluster had stability issues and was crashing, which precipitated the termination above that never completed. Once the Kafka cluster issue was resolved, this Python consumer issue did not recur. That is not a real resolution, though; I know the issue can return under those circumstances, so I am currently porting the code to Go (golang).
Description
Issue with termination of the Kafka consumer. When the consumer has been consuming messages for a number of hours, it fails to terminate fully and hangs. consumer.close() is called and the shutdown starts, but it never completes and hangs forever. Even signal.alarm does not terminate the script. The Python script runs in an Alpine container.
librdkafka version: 2.2.0
Image: python:3.9-alpine
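Python runs signal handlers in the main thread only between bytecode instructions, so a handler installed for `signal.alarm` cannot run while the main thread is blocked inside a C call such as a hung `close()`. One alternative, sketched here under the assumption that killing the process is an acceptable outcome, is the standard library's `faulthandler` watchdog: it runs in a C-level thread, dumps every thread's traceback, then calls `_exit(1)`. `close_or_exit` is a hypothetical helper name:

```python
import faulthandler
import sys


def close_or_exit(consumer, deadline: float) -> None:
    """Call consumer.close(); hard-exit the process if it takes longer than `deadline` seconds."""
    # Arm the faulthandler watchdog. It runs in a C-level thread, so it can fire
    # even while the main thread is blocked in C code. On timeout it writes all
    # Python thread tracebacks to the original stderr, then calls _exit(1).
    faulthandler.dump_traceback_later(deadline, exit=True, file=sys.__stderr__)
    try:
        consumer.close()
    finally:
        # close() returned (or raised) in time, so disarm the watchdog.
        faulthandler.cancel_dump_traceback_later()
```

`_exit` skips normal Python cleanup (no `finally` blocks or `atexit` handlers run). If the container is configured to restart on failure, the non-zero exit status brings the consumer back up, and the dumped tracebacks show where `close()` was stuck.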