IBM / sarama

Sarama is a Go library for Apache Kafka.
MIT License

List of brokers not refreshed from zookeeper #1150

Closed mdubbyap closed 5 years ago

mdubbyap commented 6 years ago
Versions

Sarama Version: 1.17
Kafka Version: 0.9.0.1
Go Version: 1.10.3

Configuration

c.Net.MaxOpenRequests = 5
c.Net.DialTimeout = 30 * time.Second
c.Net.ReadTimeout = 30 * time.Second
c.Net.WriteTimeout = 30 * time.Second
c.Net.SASL.Handshake = true

c.Metadata.Retry.Max = 3
c.Metadata.Retry.Backoff = 250 * time.Millisecond
c.Metadata.RefreshFrequency = 10 * time.Minute
c.Metadata.Full = true

c.Producer.MaxMessageBytes = 1000000
c.Producer.RequiredAcks = WaitForLocal
c.Producer.Timeout = 10 * time.Second
c.Producer.Partitioner = NewHashPartitioner
c.Producer.Retry.Max = 100
c.Producer.Retry.Backoff = 100 * time.Millisecond
c.Producer.Return.Errors = true
c.Producer.Return.Successes = true
c.Producer.CompressionLevel = CompressionLevelDefault

c.Consumer.Fetch.Min = 1
c.Consumer.Fetch.Default = 1024 * 1024
c.Consumer.Retry.Backoff = 2 * time.Second
c.Consumer.MaxWaitTime = 250 * time.Millisecond
c.Consumer.MaxProcessingTime = 100 * time.Millisecond
c.Consumer.Return.Errors = false
c.Consumer.Offsets.CommitInterval = 1 * time.Second
c.Consumer.Offsets.Initial = OffsetNewest

c.ClientID = "the zk GUID"
c.ChannelBufferSize = 256
c.Version = MinVersion
c.MetricRegistry = metrics.NewRegistry()

Logs

I can provide several more logs. Unfortunately, I do not have the start of the problem, but here is one such file (80 MB): https://s3-us-west-2.amazonaws.com/public-build-artifacts--signalfuse-com/sarama.log

Snippet:

{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"kafka: error while consuming mts_creations/3: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers deregistered broker #-1 at kafka-nexus-rc5--ccaa.int.signalfuse.com:9092","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [throttle_state] from broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [mts_deletes] from broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"Failed to connect to broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092: dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [throttle_state] from broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers deregistered broker #-1 at kafka-nexus-rc4--ccab.int.signalfuse.com:9092","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [mts_deletes] from broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [throttle_state] from broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"Failed to connect to broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092: dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata no available broker to send metadata request to","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers resurrecting 2 dead seed brokers","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata retrying after 250ms... (2 attempts remaining)\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers deregistered broker #-1 at kafka-nexus-rc5--ccaa.int.signalfuse.com:9092","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [mts_deletes] from broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers deregistered broker #-1 at kafka-nexus-rc5--ccaa.int.signalfuse.com:9092","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [throttle_state] from broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"Failed to connect to broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092: dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [mts_deletes] from broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers deregistered broker #-1 at kafka-nexus-rc4--ccab.int.signalfuse.com:9092","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [throttle_state] from broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"Failed to connect to broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092: dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata no available broker to send metadata request to","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers resurrecting 2 dead seed brokers","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata retrying after 250ms... (2 attempts remaining)\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers deregistered broker #-1 at kafka-nexus-rc5--ccaa.int.signalfuse.com:9092","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [throttle_state] from broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"Failed to connect to broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092: dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [throttle_state] from broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/metadata fetching metadata for [mts_deletes] from broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"Failed to connect to broker kafka-nexus-rc5--ccaa.int.signalfuse.com:9092: dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata no available broker to send metadata request to","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers resurrecting 2 dead seed brokers","time":"2018-08-10T21:46:06Z"}
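In this snippet, every failed DNS lookup is for the old rc4/rc5 hostnames. No line mentions rc6 or rc8. A quick way to check whether that holds across the whole 80 MB file is to tally lookup failures per host. The sketch below does this under two assumptions: one JSON object per line with the `caller`/`instance`/`msg`/`time` fields shown above, and Go's `lookup <host> on <server>: no such host` error text. `countLookupFailures` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// logLine matches the JSON objects in the log snippet above.
type logLine struct {
	Caller   string `json:"caller"`
	Instance string `json:"instance"`
	Msg      string `json:"msg"`
	Time     string `json:"time"`
}

// lookupFailure captures the host name from Go's DNS error text, e.g.
// "lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host".
var lookupFailure = regexp.MustCompile(`lookup (\S+) on \S+: no such host`)

// countLookupFailures counts DNS lookup failures per host name.
// Lines that are not valid JSON are skipped.
func countLookupFailures(logs string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var l logLine
		if err := json.Unmarshal(sc.Bytes(), &l); err != nil {
			continue
		}
		if m := lookupFailure.FindStringSubmatch(l.Msg); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Three lines copied from the snippet above.
	logs := `{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"Failed to connect to broker kafka-nexus-rc4--ccab.int.signalfuse.com:9092: dial tcp: lookup kafka-nexus-rc4--ccab.int.signalfuse.com on 169.254.169.253:53: no such host\n","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:31","instance":"sarama-logger","msg":"client/metadata got error from broker while fetching metadata:dial tcp: lookup kafka-nexus-rc5--ccaa.int.signalfuse.com on 169.254.169.253:53: no such host","time":"2018-08-10T21:46:06Z"}
{"caller":"stdlogger.go:24","instance":"sarama-logger","msg":"client/brokers resurrecting 2 dead seed brokers","time":"2018-08-10T21:46:06Z"}`

	counts := countLookupFailures(logs)
	hosts := make([]string, 0, len(counts))
	for h := range counts {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts) // stable output order
	for _, h := range hosts {
		fmt.Printf("%s: %d\n", h, counts[h])
	}
}
```

For the full file, the string literal would be replaced by the file contents. If only the two old hostnames ever show up, the client never learned any other broker address.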

Problem Description

We replaced both nodes of a two-node cluster with two new nodes (rc4 and rc5 were replaced with rc6 and rc8). The replacement was not graceful: the disks on rc4 and rc5 filled up, so we swapped them out for rc6 and rc8. Even after the new cluster was up and running and ZooKeeper reflected the new state, this client never recovered, even after several days; only a restart fixed it.

I would have expected the client to refresh its broker list from ZooKeeper at some point. Thoughts?
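Sarama does not read broker lists from ZooKeeper. It starts from the addresses passed to `sarama.NewClient` and learns other brokers from Kafka Metadata responses. The log lines `deregistered broker #-1` and `resurrecting 2 dead seed brokers` suggest that only the rc4/rc5 seed addresses were left. Once those hostnames stopped resolving, the client had no reachable broker to ask about rc6/rc8.

The sketch below is a simplified model of that behaviour, not Sarama's actual implementation. `cluster`, `client`, `refresh`, `refreshOrRebootstrap`, and the `bootstrap` callback are all hypothetical, and the short `rcN:9092` addresses are placeholders. The re-bootstrap step stands in for a workaround such as closing the client and creating a new one with fresh addresses, or seeding it with a stable DNS alias.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoBrokers plays the role of "client has run out of available brokers";
// it is not Sarama's error value.
var errNoBrokers = errors.New("run out of available brokers")

// cluster is a hypothetical stand-in for the network: which addresses are
// reachable, and which brokers a metadata response would list.
type cluster struct {
	reachable map[string]bool
	brokers   []string
}

// client models seed-based discovery: it only talks to addresses it was
// seeded with or learned from a metadata response.
type client struct {
	seeds []string
	known []string
}

// refresh asks each known address for metadata. On success it adopts the
// broker list from the response; if every address fails, it falls back to
// the original seeds ("resurrecting dead seed brokers") and reports an error.
func (c *client) refresh(cl cluster) error {
	for _, addr := range c.known {
		if cl.reachable[addr] {
			c.known = append([]string(nil), cl.brokers...)
			return nil
		}
	}
	c.known = append([]string(nil), c.seeds...)
	return errNoBrokers
}

// refreshOrRebootstrap is a hypothetical workaround: when the known brokers
// are exhausted, fetch a fresh seed list from outside the client and retry.
func (c *client) refreshOrRebootstrap(cl cluster, bootstrap func() []string) error {
	if err := c.refresh(cl); !errors.Is(err, errNoBrokers) {
		return err // nil on success
	}
	c.seeds = bootstrap()
	c.known = append([]string(nil), c.seeds...)
	return c.refresh(cl)
}

func main() {
	// After the replacement only rc6 and rc8 exist; rc4/rc5 no longer resolve.
	cl := cluster{
		reachable: map[string]bool{"rc6:9092": true, "rc8:9092": true},
		brokers:   []string{"rc6:9092", "rc8:9092"},
	}
	c := &client{seeds: []string{"rc4:9092", "rc5:9092"}, known: []string{"rc4:9092", "rc5:9092"}}

	// However often the periodic refresh runs, it only retries the stale seeds.
	for i := 0; i < 3; i++ {
		fmt.Println("refresh:", c.refresh(cl))
	}

	err := c.refreshOrRebootstrap(cl, func() []string { return []string{"rc6:9092"} })
	fmt.Println("after rebootstrap:", err, c.known)
}
```

In this model, the periodic refresh can never succeed on its own, because the seed list is the only path to new broker addresses. Recovery requires an external source of addresses, which matches the observation that only a restart helped.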

varun06 commented 5 years ago

Is this still an issue? If not, please close it.

d1egoaz commented 5 years ago

Closing; feel free to reopen if this is still an issue.