confluentinc / librdkafka

The Apache Kafka C/C++ library

Remove old brokers #423

Closed ylgeeker closed 8 years ago

ylgeeker commented 8 years ago

I use ZooKeeper to manage the Kafka cluster, and the client watches the cluster's nodes in ZooKeeper.

After a Kafka server was replaced by a new server, I found that:

  1. rdkafka found the new server and connected to it;
  2. rdkafka writes log messages like this: "KAFKA-3-ERROR: rdkafka#producer-18 xx.xx.xx.xx:9092/bootstrap: Failed to connect to broker at xx.xx.xx.xx:9092: Connection timed out";
  3. when it prints that log message, data is lost.

But the address "xx.xx.xx.xx:9092" is no longer valid: it is the replaced host, the old Kafka server.

Why? What should I do? Please help!

edenhill commented 8 years ago

librdkafka currently holds on to each broker it has ever seen, which means it will still try to connect to an old replaced broker (old address).

This should be fixed: any learnt broker that has not been reported in the official broker list, either for some time or perhaps by a quorum of brokers, should be removed.
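The time-based variant of that proposal could look roughly like the sketch below. It is only an illustration: `struct known_broker`, `struct known_broker_table`, `prune_stale_brokers` and the `max_age` parameter are hypothetical names invented here, not librdkafka internals, and the choice to keep bootstrap entries is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_KNOWN_BROKERS 16

/* Hypothetical broker record; NOT librdkafka's rd_kafka_broker_t. */
struct known_broker {
    int nodeid;            /* broker id, or -1 for a bootstrap broker */
    char nodename[64];     /* "host:port" */
    time_t last_reported;  /* last time a metadata response listed it */
};

struct known_broker_table {
    struct known_broker brokers[MAX_KNOWN_BROKERS];
    size_t count;
};

/* Remove learnt brokers that no metadata response has listed for more
 * than max_age seconds. Bootstrap entries (nodeid -1) are kept in this
 * sketch (an assumption). Returns the number of brokers removed. */
size_t prune_stale_brokers(struct known_broker_table *t,
                           time_t now, time_t max_age)
{
    size_t kept = 0, removed = 0;

    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        const struct known_broker *b = &t->brokers[i];

        if (b->nodeid != -1 && now - b->last_reported > max_age) {
            removed++;          /* stale: drop it */
            continue;
        }
        if (kept != i)
            t->brokers[kept] = *b;  /* compact the table in place */
        kept++;
    }
    t->count = kept;
    return removed;
}
```

A caller would refresh `last_reported` for every broker named in a metadata response and run the prune afterwards. The quorum-based variant mentioned above would need per-broker bookkeeping that this sketch does not attempt.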

laxpio commented 8 years ago

I hit the same problem. As a workaround, I update rkb->rkb_nodename and rkb->rkb_name in the rd_kafka_broker_update method when a broker is replaced. I am testing this now.

laxpio commented 8 years ago

At init, rd_kafka_brokers_add is called with nodeid=-1. I find that the nodeid does not get updated, so the number of rd_kafka_broker_thread_main threads is 2 × the number of brokers. I think one rd_kafka_broker_thread_main thread per broker would be better, with the nodeid=-1 entry updated to the broker id.

edenhill commented 8 years ago

It will only be able to migrate a bootstrap broker (-1) to a proper broker handle if the hostname and port matches exactly.
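As a rough illustration of that exact-match rule (a sketch only: `struct broker_handle` and `try_migrate_bootstrap` are hypothetical names, not the library's actual code), a bootstrap entry is promoted only when both host and port are identical to what the metadata reports:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handle; NOT librdkafka's rd_kafka_broker_t. */
struct broker_handle {
    int nodeid;      /* -1 while this is still a bootstrap broker */
    char host[64];
    int port;
};

/* Promote a bootstrap handle to a proper broker handle, but only if the
 * host and port reported in metadata match the handle exactly.
 * Returns 1 if the handle was migrated, 0 otherwise. */
int try_migrate_bootstrap(struct broker_handle *b,
                          const char *host, int port, int nodeid)
{
    assert(b != NULL && host != NULL);

    if (b->nodeid != -1)
        return 0;                         /* already a proper broker */
    if (strcmp(b->host, host) != 0 || b->port != port)
        return 0;                         /* different address: no match */

    b->nodeid = nodeid;
    return 1;
}
```

Under this rule, a bootstrap list that names a broker by IP address would not match a metadata response that advertises the same broker by hostname, which would leave two handles for one broker.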

laxpio commented 8 years ago

I added a "show broker info" debug line to the rd_kafka_broker_metadata_reply method and got the following debug output:

%7|1448371123.402|BROKER|sz_write#producer-0| 10.240.113.74:9092/bootstrap: [TEST-0]show broker info : 10.240.113.74:9092/bootstrap / -1
%7|1448371123.402|BROKER|sz_write#producer-0| 10.240.113.74:9092/bootstrap: [TEST-1]show broker info : 10.240.113.74:9092 / -1
%7|1448371123.576|BROKER|sz_write#producer-0| 10.240.113.74:9092/3: [TEST-0]show broker info : 10.240.113.74:9092/3 / 3
%7|1448371123.576|BROKER|sz_write#producer-0| 10.240.113.74:9092/3: [TEST-1]show broker info : 10.240.113.74:9092 / 3
%7|1448371133.411|BROKER|sz_write#producer-0| 10.240.113.74:9092/bootstrap: [TEST-0]show broker info : 10.240.113.74:9092/bootstrap / -1
%7|1448371133.411|BROKER|sz_write#producer-0| 10.240.113.74:9092/bootstrap: [TEST-1]show broker info : 10.240.113.74:9092 / -1
%7|1448371134.486|BROKER|sz_write#producer-0| 10.240.113.74:9092/3: [TEST-0]show broker info : 10.240.113.74:9092/3 / 3
%7|1448371134.486|BROKER|sz_write#producer-0| 10.240.113.74:9092/3: [TEST-1]show broker info : 10.240.113.74:9092 / 3

It does not migrate the bootstrap broker to a proper broker handle.

edenhill commented 8 years ago

What version is this on? latest master?

Can you try the same on the 0.9.0 branch?

laxpio commented 8 years ago

I use 0.8.6.

edenhill commented 8 years ago

The functionality of migrating a broker handle from bootstrap to proper is only available on the master branch.

laxpio commented 8 years ago

With the master branch, the bootstrap broker is migrated to a proper broker handle. But when a broker is replaced, the hostname is not updated. Will the invalid hostname be kept for some time?

edenhill commented 8 years ago

librdkafka currently won't ever forget about a broker, so if a broker is decommissioned it will still try to connect to it indefinitely.

laxpio commented 8 years ago

Will it connect to the new broker?

laxpio commented 8 years ago

In the log, the new broker is found, but no connection is made to it.

edenhill commented 8 years ago

librdkafka periodically polls broker metadata from connected brokers; that metadata includes a list of all brokers in the cluster. So if you add new brokers to an existing cluster and librdkafka is connected to at least one existing broker, it will eventually learn of the new brokers.

laxpio commented 8 years ago

The situation is that a new Kafka server replaces an old Kafka server using the same broker ID. After the replacement, librdkafka discovers the new Kafka server under its new hostname but does not connect to the new hostname. When kafka-preferred-replica-election.sh is run, more message deliveries fail.

edenhill commented 8 years ago

Ah, yes, so this is fixed in master. The 0.8.6 code looks up the broker id first; if the broker id is already known, it will not update the hostname: https://github.com/edenhill/librdkafka/blob/0.8.6/src/rdkafka_broker.c#L4427

In master it checks if the hostname changed and if so updates it: https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_broker.c#L4427
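The difference between the two behaviours can be sketched as follows. This is a simplified illustration, not the code behind either link: `struct broker_entry`, `enum update_result` and `update_broker_address` are hypothetical names. The 0.8.6 behaviour described above corresponds to returning as soon as the id is found; the sketch additionally compares the address and overwrites it when it has changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry; NOT librdkafka's rd_kafka_broker_t. */
struct broker_entry {
    int nodeid;
    char nodename[128];   /* "host:port" as reported by metadata */
};

enum update_result {
    BROKER_UNKNOWN,          /* id not in the table; caller would add it */
    BROKER_UNCHANGED,        /* id known, same address */
    BROKER_ADDRESS_UPDATED   /* id known, address replaced */
};

/* Look the broker up by id; if it is known but its address differs from
 * the one just reported, store the new address. */
enum update_result update_broker_address(struct broker_entry *entries,
                                         size_t count, int nodeid,
                                         const char *nodename)
{
    assert(nodename != NULL);
    assert(entries != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (entries[i].nodeid != nodeid)
            continue;
        if (strcmp(entries[i].nodename, nodename) == 0)
            return BROKER_UNCHANGED;
        snprintf(entries[i].nodename, sizeof(entries[i].nodename),
                 "%s", nodename);
        return BROKER_ADDRESS_UPDATED;
    }
    return BROKER_UNKNOWN;
}
```

In a real client the connection logic would also have to pick up the new address on its next connect attempt; the sketch covers only the table update.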

laxpio commented 8 years ago

The hostname is not updated when using the master branch. It still tries to connect to the old broker hostname.

Debug log:
%7|1448543340.290|METADATA|sz_write#producer-1| 10.240.113.74:9092/3: Broker #1/2: 10.231.137.162:9092 NodeId 2
%7|1448543340.290|BROKER|sz_write#producer-1| 10.231.137.162:9092 NodeID 2
%7|1448543340.290|BROKER|sz_write#producer-1| old broker 10.240.113.72:9092,new broker 10.231.137.162:9092
.......
%7|1448543341.167|CONNECT|sz_write#producer-1| 10.240.113.72:9092/2: broker in state DOWN connecting
%7|1448543341.169|CONNECT|sz_write#producer-1| 10.240.113.72:9092/2: couldn't connect to ipv4#10.240.113.72:9092: Connection refused
.......
%7|1448543566.162|CONNECT|sz_write#producer-1| 10.240.113.72:9092/2: couldn't connect to ipv4#10.240.113.72:9092: Connection refused

edenhill commented 8 years ago

This should be fixed now.