From the backtrace it seems that one of your topics was suddenly missing from the metadata returned by the broker, or that the partition count changed for a topic. Does that sound familiar?
librdkafka still shouldn't fail because of this, and I'll try to reproduce it.
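If that rings a bell, a quick way to confirm it is to hook the log and error callbacks and enable metadata/topic debug, so the partition-count and leader changes show up in your application log before the assert fires. A minimal sketch (the broker address is a placeholder, not from this report):

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* Forward librdkafka log and error events to stderr so PARTCNT/LEADER
 * changes are visible before any crash. */
static void log_cb(const rd_kafka_t *rk, int level,
                   const char *fac, const char *buf) {
        fprintf(stderr, "LOG-%i %s %s: %s\n",
                level, rd_kafka_name(rk), fac, buf);
}

static void error_cb(rd_kafka_t *rk, int err, const char *reason, void *opaque) {
        (void)opaque;
        fprintf(stderr, "ERROR %s: %s: %s\n", rd_kafka_name(rk),
                rd_kafka_err2str((rd_kafka_resp_err_t)err), reason);
}

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        rd_kafka_conf_set(conf, "metadata.broker.list", "broker1:9092",
                          errstr, sizeof(errstr));
        /* Enable metadata and topic debug contexts. */
        rd_kafka_conf_set(conf, "debug", "metadata,topic",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set_log_cb(conf, log_cb);
        rd_kafka_conf_set_error_cb(conf, error_cb);

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf,
                                      errstr, sizeof(errstr));
        if (!rk) {
                fprintf(stderr, "rd_kafka_new failed: %s\n", errstr);
                return 1;
        }

        /* ... normal consume/produce loop goes here; the handle must be
         * polled for the callbacks to be served. */

        rd_kafka_destroy(rk);
        return 0;
}
```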
I have the exact same issue, and the coredump stack is the same. Is this issue still open? Any idea when it will be fixed?
Is this still happening on latest master?
I downloaded the source on 9 Apr and haven't tried the latest one. Should I try latest master?
Alan
Yes please, try latest master, and if the problem persists, outline the exact steps to reproduce. Thanks.
This seems to be caused by connecting to two clusters simultaneously where the topic state differs. This is a problem on the user side but we still shouldn't crash.
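The user-side fix is to keep one rd_kafka_t handle per cluster, each with only that cluster's brokers in its broker list, so the two clusters' metadata never ends up in the same topic object. A rough sketch (broker names are placeholders):

```c
#include <librdkafka/rdkafka.h>

/* One handle per cluster: each handle sees only one cluster's metadata. */
static rd_kafka_t *producer_for_cluster(const char *brokers) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_conf_set(conf, "metadata.broker.list", brokers,
                          errstr, sizeof(errstr));
        return rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
}

int main(void) {
        rd_kafka_t *rk_a = producer_for_cluster("clusterA-broker1:9092");
        rd_kafka_t *rk_b = producer_for_cluster("clusterB-broker1:9092");

        /* Each handle tracks its own cluster's partition count for the same
         * topic name, so a mismatch between the clusters cannot corrupt a
         * shared partition list. */
        rd_kafka_topic_t *rkt_a = rd_kafka_topic_new(rk_a, "myTopic", NULL);
        rd_kafka_topic_t *rkt_b = rd_kafka_topic_new(rk_b, "myTopic", NULL);

        rd_kafka_topic_destroy(rkt_a);
        rd_kafka_topic_destroy(rkt_b);
        rd_kafka_destroy(rk_a);
        rd_kafka_destroy(rk_b);
        return 0;
}
```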
This error is still happening with the latest version:
librdkafka version: latest version
kafka version: 0.10.0
My setup:
bin/kafka-topics.sh --zookeeper server1,server2,server3,server4,server5,server6 --describe --topic myTopic
Topic:myTopic PartitionCount:3 ReplicationFactor:2 Configs:
    Topic: myTopic Partition: 0 Leader: 1 Replicas: 1,2 Isr: 2,1
    Topic: myTopic Partition: 1 Leader: 3 Replicas: 3,4 Isr: 4,3
    Topic: myTopic Partition: 2 Leader: 5 Replicas: 5,6 Isr: 5,6
My application is producing on partition 1 and consuming from partitions 0 and 2.
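Simplified, the producing side does the equivalent of the sketch below (broker list and payload are placeholders, and the real application also registers the DeliveryReportCallback seen in the log):

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_conf_set(conf, "metadata.broker.list",
                          "server1:9092,server3:9092,server5:9092",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                      errstr, sizeof(errstr));
        rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "myTopic", NULL);

        const char *payload = "example message";
        /* Always partition 1, as described above (no RD_KAFKA_PARTITION_UA). */
        if (rd_kafka_produce(rkt, 1, RD_KAFKA_MSG_F_COPY,
                             (void *)payload, strlen(payload),
                             NULL, 0, NULL) == -1)
                fprintf(stderr, "produce failed: %s\n",
                        rd_kafka_err2str(rd_kafka_errno2err(errno)));

        /* Serve delivery reports until the queue is drained. */
        while (rd_kafka_outq_len(rk) > 0)
                rd_kafka_poll(rk, 100);

        rd_kafka_topic_destroy(rkt);
        rd_kafka_destroy(rk);
        return 0;
}
```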
Scenario:
Application log:
INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - DeliveryReportCallback: Ack of message delivery for (124 bytes): Success
INFO - KafkaEventCallback: Received a log event: server4:9092/4: Receive failed: Disconnected(3, FAIL)
ERROR - KafkaEventCallback: Received an error event: server4:9092/4: Receive failed: Disconnected(Local: Broker transport failure)
INFO - KafkaEventCallback: Received a log event: Topic myTopic partition count changed from 3 to 1(5, PARTCNT)
INFO - KafkaEventCallback: Received a log event: Topic myTopic: partition [1] is unknown (partition_cnt 1)(4, LEADER)
* rdkafka_partition.c:207:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 *
Aborted (core dumped)
GDB backtrace:
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 cyrus-sasl-plain-2.1.23-15.el6_6.2.x86_64 db4-4.7.25-18.el6_4.x86_64 glibc-2.12-1.149.el6_6.7.x86_64 gperftools-libs-2.0-11.el6.3.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-37.el6_6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libcurl-7.19.7-40.el6_6.4.x86_64 libgcc-4.4.7-11.el6.x86_64 libidn-1.18-2.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libssh2-1.4.2-1.el6_6.1.x86_64 libstdc++-4.4.7-11.el6.x86_64 libunwind-1.1-2.el6.x86_64 libuuid-2.17.2-12.18.el6.x86_64 nspr-4.10.6-1.el6_5.x86_64 nss-3.16.2.3-3.el6_6.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64 nss-util-3.16.2.3-2.el6_6.x86_64 openssl-1.0.1e-30.el6.8.x86_64 xerces-c-3.0.1-20.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
reason=reason@entry=0x7f4cc617f70a "assert: rktp->rktp_removed==0") at rdkafka.c:2395
Note that I am using the KafkaConsumer, not the legacy Consumer. I also do not have a custom rebalance callback.
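The consuming side is roughly equivalent to the sketch below: an explicit assignment of partitions 0 and 2 with no rebalance_cb set, so librdkafka's default assignment handling applies (group id and broker list are placeholders):

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* No rd_kafka_conf_set_rebalance_cb() call: default handling is used. */
        rd_kafka_conf_set(conf, "metadata.broker.list",
                          "server1:9092,server3:9092,server5:9092",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "group.id", "example-group",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf,
                                      errstr, sizeof(errstr));
        rd_kafka_poll_set_consumer(rk);

        /* Explicitly assign partitions 0 and 2 of myTopic. */
        rd_kafka_topic_partition_list_t *parts =
                rd_kafka_topic_partition_list_new(2);
        rd_kafka_topic_partition_list_add(parts, "myTopic", 0);
        rd_kafka_topic_partition_list_add(parts, "myTopic", 2);
        rd_kafka_assign(rk, parts);
        rd_kafka_topic_partition_list_destroy(parts);

        while (1) {
                rd_kafka_message_t *msg = rd_kafka_consumer_poll(rk, 1000);
                if (!msg)
                        continue;
                if (!msg->err)
                        printf("message on partition %d, %zu bytes\n",
                               msg->partition, msg->len);
                rd_kafka_message_destroy(msg);
        }

        rd_kafka_consumer_close(rk);
        rd_kafka_destroy(rk);
        return 0;
}
```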
Thanks.
It looks like this happens when different brokers report different partition counts for the same topic.
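To see what a given broker reports, you can request per-topic metadata and print the partition count together with the broker that answered. A rough sketch, assuming an existing handle and topic object:

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* Print the partition count for a topic as reported by whichever broker
 * answered the metadata request, to spot disagreeing brokers. */
static void dump_partition_cnt(rd_kafka_t *rk, rd_kafka_topic_t *rkt) {
        const struct rd_kafka_metadata *md;
        rd_kafka_resp_err_t err;

        /* Passing only_rkt limits the request to this one topic. */
        err = rd_kafka_metadata(rk, 0, rkt, &md, 5000);
        if (err) {
                fprintf(stderr, "metadata request failed: %s\n",
                        rd_kafka_err2str(err));
                return;
        }

        for (int i = 0; i < md->topic_cnt; i++)
                printf("broker %s (id %d) reports %s: %d partitions\n",
                       md->orig_broker_name, (int)md->orig_broker_id,
                       md->topics[i].topic, md->topics[i].partition_cnt);

        rd_kafka_metadata_destroy(md);
}
```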
Just saw that this bug is part of the 0.9.2 milestone. Do you have any date in mind for the release? Thanks.
I've reproduced it and will fix it soon.
This is fixed on the partition_changes branch
This issue is now fixed on master, please try to verify the fix in your environment. Thank you.
We are using librdkafka-0.9.0.99, and it crashes after it has been consuming messages for hours. The log messages:
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
Consume failed: Local: Operation in progress
* rdkafka_partition.c:165:rd_kafka_toppar_remove: assert: rktp->rktp_removed==0 *
Aborted (core dumped)
The gdb info is like this:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff40195000
Core was generated by `./etld etl.properties'.
Program terminated with signal 6, Aborted.
#0  0x00007ff1e34630d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ff1e34630d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ff1e346683b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ff1e28402aa in rd_kafka_crash (file=, line=, function=, rk=0x0,
#3  0x00007ff1e2870ad5 in rd_kafka_toppar_remove (rktp=0x7ff1bc003020) at rdkafka_partition.c:165
#4  0x00007ff1e2870af2 in rd_kafka_toppar_destroy_final (rktp=0x7ff1bc003020) at rdkafka_partition.c:183
#5  0x00007ff1e28514d8 in rd_kafka_topic_leader_update (rkb=0x7ff1d8002890, partition=1, rkt=0x7ff1bc0011d0,
#6  rd_kafka_topic_metadata_update (rkb=0x7ff1b8003f40, mdt=0x7ff1b8001547) at rdkafka_topic.c:735
#7  0x00007ff1e2860d31 in rd_kafka_parse_Metadata (rkb=0x7ff1b8003f40, rkt=0x7ff1bc0011d0, rkbuf=)
#8  0x00007ff1e28692b8 in rd_kafka_op_handle_Metadata (rkb=0x7ff1b8003f40, err=RD_KAFKA_RESP_ERR_NO_ERROR, rkbuf=0x2325640,
#9  0x00007ff1e2859959 in rd_kafka_buf_callback (rkb=0x7ff1b8003f40, err=RD_KAFKA_RESP_ERR_NO_ERROR, response=0x2325640,
#10 0x00007ff1e2838609 in rd_kafka_toppar_q_cb (rk=, rko=, cb_type=,
#11 0x00007ff1e285bd0f in rd_kafka_q_serve (rkq=0x7ff1d8001670, timeout_ms=, max_cnt=, cb_type=0,
#12 0x00007ff1e2841dbd in rd_kafka_toppars_q_serve (timeout_ms=, rkq=0x7ff1d8001670) at rdkafka.c:892
#13 rd_kafka_thread_main (arg=0x7ff1d80015e0) at rdkafka.c:929
#14 0x00007ff1e28781ff in _thrd_wrapper_function (aArg=) at tinycthread.c:599
#15 0x00007ff1e3d08e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007ff1e352038d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()
Do you know why it crashes like this? Thanks.