aio-libs / aiokafka

asyncio client for kafka
http://aiokafka.readthedocs.io/
Apache License 2.0

kafka consumer never reconnects #876

Open cmflynn opened 1 year ago

cmflynn commented 1 year ago

Describe the bug

To the best of my knowledge, the consumer is somehow dropping its connection to the Kafka brokers, and when it tries to reconnect it also seems to completely block the active event loop :(

Here's how the logs look on the API:

2023-02-04 22:31:05,694 level=ERROR    [client._get_conn:460] Unable connect to node with id 4: 
2023-02-04 22:31:18,166 level=ERROR    [client._get_conn:460] Unable connect to node with id 1: 
2023-02-04 22:32:55,445 level=ERROR    [client._get_conn:460] Unable connect to node with id 5: 
2023-02-04 22:33:35,451 level=ERROR    [client._get_conn:460] Unable connect to node with id 6: 
2023-02-04 22:34:15,457 level=ERROR    [client._get_conn:460] Unable connect to node with id 4: 
2023-02-04 22:34:55,462 level=ERROR    [client._get_conn:460] Unable connect to node with id 3: 
2023-02-04 22:35:35,468 level=ERROR    [client._get_conn:460] Unable connect to node with id 2: 
2023-02-04 22:36:58,228 level=ERROR    [client._get_conn:460] Unable connect to node with id 2: 
2023-02-04 22:37:38,237 level=ERROR    [client._get_conn:460] Unable connect to node with id 1: 
2023-02-04 22:38:18,242 level=ERROR    [client._get_conn:460] Unable connect to node with id 6: 
2023-02-04 22:38:58,248 level=ERROR    [client._get_conn:460] Unable connect to node with id 4: 
2023-02-04 22:39:38,254 level=ERROR    [client._get_conn:460] Unable connect to node with id 3: 

Expected behaviour

The consumer should be able to recover and reconnect to the brokers, but it does not appear to be doing so.

Environment (please complete the following information):

Corfucinas commented 11 months ago

I'm having exactly the same problem

vmaurin commented 11 months ago

Do you have minimal code to reproduce the error? It could be network/DNS related, as the exchange between a Kafka client and the brokers is not as straightforward as a classic TCP/HTTP connection.

A common cause, then, could be that the DNS names or IP addresses returned as part of the cluster metadata response have become obsolete or are not reachable.
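
One way to test that theory from the affected host is to check whether each broker address resolves and accepts a plain TCP connection. The following is only a rough sketch using the standard library, not part of aiokafka; the helper names `check_broker` and `check_brokers` are made up for illustration, and it assumes you already know the host/port pairs the brokers advertise (for example from the `advertised.listeners` broker setting). A successful TCP connect does not prove the Kafka protocol handshake works, but a DNS failure, refusal, or timeout here would point at the network rather than the client.

```python
import asyncio
from typing import Dict, Iterable, Tuple

Address = Tuple[str, int]


async def check_broker(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Resolve host and try to open a plain TCP connection to host:port.

    Returns (reachable, reason). This only tests DNS + TCP reachability;
    it does not speak the Kafka protocol.
    """
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout
        )
    except asyncio.TimeoutError:
        return False, "timed out"
    except OSError as exc:
        # Covers DNS resolution failures (socket.gaierror) and refused connections.
        return False, f"{type(exc).__name__}: {exc}"

    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        # The peer may reset the connection while we close; the connect itself worked.
        pass
    return True, "ok"


async def check_brokers(
    addresses: Iterable[Address], timeout: float = 5.0
) -> Dict[Address, Tuple[bool, str]]:
    """Check all addresses concurrently and map each one to its result."""
    addresses = list(addresses)
    results = await asyncio.gather(
        *(check_broker(host, port, timeout) for host, port in addresses)
    )
    return dict(zip(addresses, results))
```

For example, `asyncio.run(check_brokers([("broker-1.example.internal", 9092)]))` (a placeholder hostname) returns a dict keyed by address. Running it while the "Unable connect to node" errors are being logged would show whether the advertised addresses are reachable at that moment.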

Corfucinas commented 11 months ago

Is there any programmatic way for us to check whether it got disconnected and reconnect?