doujiang24 / lua-resty-kafka

Lua kafka client driver for the Openresty based on the cosocket API
BSD 3-Clause "New" or "Revised" License

What does this error mean: err: not found broker ? #94

Open jeremyjpj0916 opened 4 years ago

jeremyjpj0916 commented 4 years ago

I have not seen this error in any of my other environments:

2020/05/06 15:31:56 [error] 25#0: *59981903 [lua] producer.lua:272: 
buffered messages send to kafka err: not found broker, retryable: true, 
topic: sample_topic_test, partition_id: 36, length: 1, 
context: ngx.timer, client: 10.xxx.xx.xxx, server: 0.0.0.0:8443

Found it here: https://github.com/doujiang24/lua-resty-kafka/blob/master/lib/resty/kafka/client.lua#L221

My config is like:

 "config": {
    "bootstrap_servers": [
      "server1.com:9093",
      "server2.com:9093",
      "server3.com:9093",
      "server4.com:9093",
      "server5.com:9093",
      "server6.com:9093",
      "server7.com:9093",
      "server8.com:9093"
    ],
    "topic": "sample_topic_test",
    "timeout": 10000,
    "keepalive": 60000,
    "producer_async_flush_timeout": 1000,
    "ssl_verify": false,
    "producer_request_acks": 1,
    "producer_request_limits_bytes_per_request": 1048576,
    "producer_request_timeout": 2000,
    "ssl": true,
    "producer_async_buffering_limits_messages_in_memory": 10000,
    "producer_request_retries_max_attempts": 5,
    "producer_request_limits_messages_per_request": 50,
    "producer_request_retries_backoff_timeout": 100,
    "producer_async": true
  }

@doujiang24 @wanghuizzz any idea what this means?

jeremyjpj0916 commented 4 years ago

Digging further, it looks like the brokers themselves are supposed to return a partition leader in the response to a TCP call: https://github.com/doujiang24/lua-resty-kafka/blob/8e3686ece91472438e2f3b29371d616ccd4f84c1/lib/resty/kafka/client.lua#L90

And maybe this cluster is failing to return such information? Is that the likely root cause?

I see other issues in this repo mentioning this error, but they say it only happens on a Kafka cluster restart. In my experience it happens right from the beginning and never stops erroring until I disable this Kafka logging library against one specific cluster in our network.

This library works fine against 3 of the 4 Kafka clusters we have internally, and I'm told each was built and is maintained with Ansible and should have the same properties. However, I think this error is telling me something is wrong with that one cluster, or maybe with 1 of its 8 nodes. I think logging the metadata response from the brokers when nginx is in debug mode would be helpful. I was thinking of submitting a PR for that if you want it. I just want to understand the error better, and why some clusters work and one does not.
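
For readers following along, the lookup described above can be sketched in plain Lua. This is a simplified illustration, not the library's code: the table shapes (`brokers` keyed by node id, `partitions` keyed by partition id with a `leader` field) and the name `find_leader_broker` are assumptions made for the example. It shows how a leader id that is absent from the advertised broker list (Kafka reports leader -1 while a partition has no elected leader, for instance) would end in a "not found broker" result.

```lua
-- Hypothetical, simplified model of a metadata lookup.
-- brokers:    { [node_id] = { host = ..., port = ... } }
-- partitions: { [partition_id] = { leader = node_id } }
local function find_leader_broker(brokers, partitions, partition_id)
    local partition = partitions[partition_id]
    if not partition then
        return nil, "not found partition"
    end

    local broker = brokers[partition.leader]
    if not broker then
        -- The leader id is not among the brokers the cluster advertised.
        return nil, "not found broker"
    end

    return broker
end

-- Made-up example data: partition 36 currently has no elected leader.
local brokers = {
    [1] = { host = "server1.com", port = 9093 },
    [2] = { host = "server2.com", port = 9093 },
}
local partitions = {
    [0]  = { leader = 1 },
    [36] = { leader = -1 },
}

local broker, err = find_leader_broker(brokers, partitions, 36)
print(broker, err)
```

If this model is close to what happens, the error would point at the cluster's metadata (a leaderless partition or a broker missing from the advertised list) rather than at the producer config.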

jeremyjpj0916 commented 4 years ago

Update: I tried a rolling redeploy of the app too, and then it magically started working.

Also in my logic I have:

  local ok, err = producer:send(conf.topic, nil, cjson_encode(message))
  if not ok then
    kong.log.err("[kong-kafka-log] failed to send a message on topic ", conf.topic, ": ", err)
    return
  end
end -- closes the enclosing function (not shown)

But in async mode, where the library flushes in the background and prints its own message, my error message here never prints, because execution never enters the "if not ok" block that would log the error from outside the library :/ Maybe I should try "if err then" rather than "if not ok then"? It seems this library has some bugs to work out.
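
One possible way to surface these failures: as far as the project README describes it, an async send only reports whether the message was buffered, and later flush failures are passed to an optional `error_handle` callback in the producer config. The sketch below assumes that callback's argument order from the README (worth checking against the installed version); `make_error_handle` is a hypothetical helper name, and the demo uses made-up values so it runs in plain Lua.

```lua
-- Build a callback with the shape the README gives for `error_handle`:
--   function(topic, partition_id, message_queue, index, err, retryable)
-- `log` is any function that accepts a single string (hypothetical wiring).
local function make_error_handle(log)
    return function(topic, partition_id, message_queue, index, err, retryable)
        -- message_queue and index are ignored here; only the failure is reported.
        log(string.format(
            "kafka flush failed: topic=%s partition=%s err=%s retryable=%s",
            tostring(topic), tostring(partition_id),
            tostring(err), tostring(retryable)))
    end
end

-- Plain-Lua demonstration: collect lines in a table instead of logging.
local lines = {}
local handler = make_error_handle(function(line)
    lines[#lines + 1] = line
end)
handler("sample_topic_test", 36, {}, 0, "not found broker", true)
print(lines[1])

-- Inside OpenResty/Kong the handler would be passed in the producer config,
-- for example (assumed usage, based on the README):
--   producer:new(broker_list, {
--       producer_type = "async",
--       error_handle = make_error_handle(kong.log.err),
--   })
```

With a handler like this, the "if not ok" check would still catch failures to buffer the message, while errors from the background flush would arrive through the callback.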

doujiang24 commented 4 years ago

@jeremyjpj0916 since you can reproduce it in your environment, you can log the metadata response at the line linked here: https://github.com/doujiang24/lua-resty-kafka/blob/8e3686ece91472438e2f3b29371d616ccd4f84c1/lib/resty/kafka/client.lua#L143

We can also add a debug-level log to this library. A PR is welcome, thanks :)
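
A debug-level dump along the lines suggested above might look like the following. It is only a sketch: the `brokers` and `partitions` table shapes and the name `summarize_metadata` are assumptions for illustration, not the library's internal structures, and in OpenResty the resulting string would be handed to `ngx.log(ngx.DEBUG, ...)` rather than printed.

```lua
-- Produce one line showing, per partition, the leader id and whether that
-- leader resolves to a broker in the advertised list.
-- brokers:    { [node_id] = { host = ..., port = ... } }   (assumed shape)
-- partitions: { [partition_id] = { leader = node_id } }    (assumed shape)
local function summarize_metadata(topic, brokers, partitions)
    -- Sort partition ids so the output is stable between runs.
    local ids = {}
    for id in pairs(partitions) do
        ids[#ids + 1] = id
    end
    table.sort(ids)

    local out = {}
    for _, id in ipairs(ids) do
        local leader = partitions[id].leader
        local broker = brokers[leader]
        local where = broker and (broker.host .. ":" .. broker.port) or "MISSING"
        out[#out + 1] = string.format("%s[%d] leader=%d -> %s",
                                      topic, id, leader, where)
    end
    return table.concat(out, "; ")
end

-- Made-up example data.
local summary = summarize_metadata(
    "sample_topic_test",
    { [1] = { host = "server1.com", port = 9093 } },
    { [0] = { leader = 1 }, [36] = { leader = -1 } })
print(summary)
```

A line like this in the debug log would make it easy to see whether the failing partition has no leader or whether the leader is simply missing from the broker list.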