Open blindspotbounty opened 1 month ago
Additionally have the following log of refcounts:
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 0 +1: 0x7f11d0018df8: rd_kafka_broker_add:4978
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 1 +1: 0x7f11d0018df8: rd_kafka_broker_add:5046
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 2 +1: 0x7f11d0018df8: rd_kafka_broker_monitor_add:6091
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 3 +1: 0x7f11d0018df8: rd_kafka_broker_update:5593
REFCNT DEBUG: &(rkb)->rkb_refcnt 4 -1: 0x7f11d0018df8: rd_kafka_broker_update:5618
REFCNT DEBUG: &(rkb)->rkb_refcnt 3 +1: 0x7f11d0018df8: rd_kafka_broker_update:5580
REFCNT DEBUG: &(rkb)->rkb_refcnt 4 -1: 0x7f11d0018df8: rd_kafka_broker_update:5618
REFCNT DEBUG: &(rkb)->rkb_refcnt 3 +1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1315
REFCNT DEBUG: &(rkb)->rkb_refcnt 4 +1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1315
REFCNT DEBUG: &(rkb)->rkb_refcnt 5 +1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1315
REFCNT DEBUG: &(rkb)->rkb_refcnt 6 +1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1315
REFCNT DEBUG: (&(leader)->rkb_refcnt) 7 +1: 0x7f11d0018df8: rd_kafka_toppar_leader_update:744
REFCNT DEBUG: (&(new_rkb)->rkb_refcnt) 8 +1: 0x7f11d0018df8: rd_kafka_toppar_broker_migrate:1045
REFCNT DEBUG: &(leader)->rkb_refcnt 9 -1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1405
REFCNT DEBUG: (&(leader)->rkb_refcnt) 8 +1: 0x7f11d0018df8: rd_kafka_toppar_leader_update:744
REFCNT DEBUG: (&(new_rkb)->rkb_refcnt) 9 +1: 0x7f11d0018df8: rd_kafka_toppar_broker_migrate:1045
REFCNT DEBUG: &(leader)->rkb_refcnt 10 -1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1405
REFCNT DEBUG: (&(leader)->rkb_refcnt) 9 +1: 0x7f11d0018df8: rd_kafka_toppar_leader_update:744
REFCNT DEBUG: (&(new_rkb)->rkb_refcnt) 10 +1: 0x7f11d0018df8: rd_kafka_toppar_broker_migrate:1045
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 11 +1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3350
REFCNT DEBUG: &(rktp->rktp_next_broker)->rkb_refcnt 12 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3367
REFCNT DEBUG: &(leader)->rkb_refcnt 11 -1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1405
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 11 +1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3350
REFCNT DEBUG: &(rktp->rktp_next_broker)->rkb_refcnt 11 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3367
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 10 +1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3350
REFCNT DEBUG: &(rktp->rktp_next_broker)->rkb_refcnt 11 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3367
REFCNT DEBUG: (&(leader)->rkb_refcnt) 10 +1: 0x7f11d0018df8: rd_kafka_toppar_leader_update:744
REFCNT DEBUG: (&(new_rkb)->rkb_refcnt) 11 +1: 0x7f11d0018df8: rd_kafka_toppar_broker_migrate:1045
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 12 +1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3350
REFCNT DEBUG: &(rktp->rktp_next_broker)->rkb_refcnt 13 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3367
REFCNT DEBUG: &(leader)->rkb_refcnt 12 -1: 0x7f11d0018df8: rd_kafka_topic_metadata_update:1405
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 11 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 12 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 13 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 14 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 15 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 16 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 17 +1: 0x7f11d0018df8: rd_kafka_broker_trigger_monitors:6055
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 18 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rko->rko_u.broker_monitor.rkb)->rkb_refcnt 18 -1: 0x7f11d0018df8: rd_kafka_op_destroy:473
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 16 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 15 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 16 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 15 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 15 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 15 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 16 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 15 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 14 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 15 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 14 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 13 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 14 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 13 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 12 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 13 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 12 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 11 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 12 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 13 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 12 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 13 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 12 +1: 0x7f11d0018df8: rd_kafka_broker_weighted:1482
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 13 +1: 0x7f11d0018df8: rd_kafka_buf_new_request0:162
REFCNT DEBUG: &(rkb)->rkb_refcnt 14 -1: 0x7f11d0018df8: rd_kafka_metadata_refresh_topics:1423
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 13 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 14 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 13 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: (&(rkbuf->rkbuf_rkb)->rkb_refcnt) 12 +1: 0x7f11d0018df8: rd_kafka_req_response:1945
REFCNT DEBUG: &(rkb)->rkb_refcnt 13 +1: 0x7f11d0018df8: rd_kafka_broker_update:5580
REFCNT DEBUG: &(rkb)->rkb_refcnt 14 -1: 0x7f11d0018df8: rd_kafka_broker_update:5618
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 13 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rkbuf->rkbuf_rkb)->rkb_refcnt 12 -1: 0x7f11d0018df8: rd_kafka_buf_destroy_final:78
REFCNT DEBUG: &(rktp->rktp_broker)->rkb_refcnt 11 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3443
REFCNT DEBUG: &(rktp->rktp_broker)->rkb_refcnt 10 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3443
REFCNT DEBUG: &(rktp->rktp_broker)->rkb_refcnt 9 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3443
REFCNT DEBUG: &(rktp->rktp_broker)->rkb_refcnt 8 -1: 0x7f11d0018df8: rd_kafka_broker_op_serve:3443
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 7 +1: 0x7f11d0018df8: rd_kafka_broker_weighted:1482
REFCNT DEBUG: &(good)->rkb_refcnt 8 -1: 0x7f11d0018df8: rd_kafka_broker_weighted:1481
REFCNT DEBUG: &(rkb)->rkb_refcnt 7 +1: 0x7f11d0018df8: rd_kafka_broker_update:5580
REFCNT DEBUG: &(rkb)->rkb_refcnt 8 -1: 0x7f11d0018df8: rd_kafka_broker_update:5618
REFCNT DEBUG: &(rkb)->rkb_refcnt 7 -1: 0x7f11d0018df8: rd_kafka_destroy_internal:1225
REFCNT DEBUG: (&(rkb)->rkb_refcnt) 6 +1: 0x7f11d0018df8: rd_kafka_broker_trigger_monitors:6055
REFCNT DEBUG: &(rko->rko_u.broker_monitor.rkb)->rkb_refcnt 7 -1: 0x7f11d0018df8: rd_kafka_op_destroy:473
REFCNT DEBUG: &(rkb)->rkb_refcnt 6 -1: 0x7f11d0018df8: rd_kafka_broker_monitor_del:6122
But I am not very good aware of librdkafka internals to understand which refcounts were not decremented. Could someone help me check that, please?
Read the FAQ first: https://github.com/confluentinc/librdkafka/wiki/FAQ
Do NOT create issues for questions, use the discussion forum: https://github.com/confluentinc/librdkafka/discussions
Description
I was investigating hang on destroy that specifically reproduces on x86 platform (cannot reproduce with Mac OS/Linux arch64/M2).
How to reproduce
But on x86, it is good reproducible with the following scenario:
Better reproducible with rebalance enabled and assign called from different thread, i.e.:
The only thing that I've found that is pretty good reproducible on x86 platform is the following race (which I don't see on arm):
It is better reproducible with v2.3.0 and less with v2.5.0. With 2.3.0 I can reproduce it without external rebalance and with 2.5.0 with custom rebalance it is much better reproducible.
Reproducing this scenario pretty good with swift wrapper. There is a code in swift that can catch this race:
As the only difference between platforms is this race catched by TSan, I suspect that this might be a problem for future client destroy method.
Remark: I was trying to use kafka_destroy and kafka_destroy_flags with RD_KAFKA_DESTROY_F_NO_CONSUMER_CLOSE but works the same.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
debug=..
as necessary) from librdkafka Last logs before hang isWhile rebalance assign(NULL) is called further, it is ignored by librdkafka and seems it is the reason for hanging