confluentinc / confluent-kafka-dotnet

Confluent's Apache Kafka .NET client
https://github.com/confluentinc/confluent-kafka-dotnet/wiki
Apache License 2.0
88 stars 868 forks source link

Confluent Cloud example dotnetcore client fails to connect #1361

Closed sevak-mnatsakanyan closed 4 years ago

sevak-mnatsakanyan commented 4 years ago

Description

Hi,

I am trying to use Example (https://github.com/confluentinc/confluent-kafka-dotnet/tree/master/examples/ConfluentCloud) with my confluent cloud cluster, but it fails to connect with "Unknown error" message.

BootstrapServers = "pkc-lq8gm.westeurope.azure.confluent.cloud:9092",
SaslMechanism = SaslMechanism.Plain,
SecurityProtocol = SecurityProtocol.SaslSsl,
SslCaLocation = "cacert.pem"

Console error log:

%3|1595856799.036|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2098ms in state CONNECT)
%3|1595856799.036|ERROR|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: 1/1 brokers are down
%3|1595856799.036|ERROR|rdkafka#producer-1| [thrd:app]: rdkafka#producer-1: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2098ms in state CONNECT)
%3|1595856801.987|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2050ms in state CONNECT, 1 identical error(s) suppressed)
%3|1595856801.988|ERROR|rdkafka#producer-1| [thrd:app]: rdkafka#producer-1: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2050ms in state CONNECT, 1 identical error(s) suppressed)
%3|1595856809.198|FAIL|rdkafka#consumer-2| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2161ms in state CONNECT)
%3|1595856809.198|ERROR|rdkafka#consumer-2| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: 1/1 brokers are down
%3|1595856809.198|ERROR|rdkafka#consumer-2| [thrd:app]: rdkafka#consumer-2: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2161ms in state CONNECT)
%3|1595856812.084|FAIL|rdkafka#consumer-2| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2049ms in state CONNECT, 1 identical error(s) suppressed)
%3|1595856812.084|ERROR|rdkafka#consumer-2| [thrd:app]: rdkafka#consumer-2: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2049ms in state CONNECT, 1 identical error(s) suppressed)

Full debug all enabled:

%7|1595856893.604|BROKER|rdkafka#producer-1| [thrd:app]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Added new broker with NodeId -1
%7|1595856893.605|CONNECT|rdkafka#producer-1| [thrd:app]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Selected for cluster connection: bootstrap servers added (broker has 0 connection attempt(s))
%7|1595856893.605|BRKMAIN|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Enter main broker thread
%7|1595856893.605|INIT|rdkafka#producer-1| [thrd:app]: librdkafka v1.5.0 (0x10500ff) rdkafka#producer-1 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_gssapi,sasl_plain,sasl_scram,plugins,zstd,sasl_oauthbearer, SSL ZLIB SNAPPY SASL_SCRAM PLUGINS HDRHISTOGRAM, debug 0xfffff)
%7|1595856893.605|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Received CONNECT op
%7|1595856893.605|STATE|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Broker changed state INIT -> TRY_CONNECT
%7|1595856893.605|BROADCAST|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: Broadcasting state change
%7|1595856893.605|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: broker in state TRY_CONNECT connecting
%7|1595856893.605|STATE|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Broker changed state TRY_CONNECT -> CONNECT
%7|1595856893.605|BROADCAST|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: Broadcasting state change
%7|1595856893.632|TOPIC|rdkafka#producer-1| [thrd:app]: New local topic: test_topic
%7|1595856893.632|TOPPARNEW|rdkafka#producer-1| [thrd:app]: NEW test_topic [-1] 00000217A6B668B0 (at rd_kafka_topic_new0:441)
%7|1595856893.677|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connecting to ipv4#51.105.121.2:9092 (sasl_ssl) with socket 1156
%7|1595856894.606|NOINFO|rdkafka#producer-1| [thrd:main]: Topic test_topic metadata information unknown
%7|1595856894.606|NOINFO|rdkafka#producer-1| [thrd:main]: Topic test_topic partition count is zero: should refresh metadata
%7|1595856894.607|CONNECT|rdkafka#producer-1| [thrd:main]: Cluster connection already in progress: refresh unavailable topics
%7|1595856894.608|METADATA|rdkafka#producer-1| [thrd:main]: Skipping metadata refresh of 1 topic(s): no usable brokers
%7|1595856894.609|CONNECT|rdkafka#producer-1| [thrd:main]: Not selecting any broker for cluster connection: still suppressed for 48ms: no cluster connection
%7|1595856895.610|NOINFO|rdkafka#producer-1| [thrd:main]: Topic test_topic metadata information unknown
%7|1595856895.611|NOINFO|rdkafka#producer-1| [thrd:main]: Topic test_topic partition count is zero: should refresh metadata
%7|1595856895.612|CONNECT|rdkafka#producer-1| [thrd:main]: Cluster connection already in progress: refresh unavailable topics
%7|1595856895.612|METADATA|rdkafka#producer-1| [thrd:main]: Skipping metadata refresh of 1 topic(s): no usable brokers
%7|1595856895.613|CONNECT|rdkafka#producer-1| [thrd:main]: Not selecting any broker for cluster connection: still suppressed for 49ms: no cluster connection
%7|1595856895.803|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2196ms in state CONNECT) (_TRANSPORT)
%3|1595856895.803|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2196ms in state CONNECT)
%7|1595856895.804|STATE|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Broker changed state CONNECT -> DOWN
%3|1595856895.804|ERROR|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: 1/1 brokers are down
%7|1595856895.804|BROADCAST|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: Broadcasting state change
%7|1595856895.805|BUFQ|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Purging bufq with 0 buffers
%7|1595856895.805|BUFQ|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Purging bufq with 0 buffers
%7|1595856895.805|BUFQ|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Updating 0 buffers on connection reset
%7|1595856895.805|STATE|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Broker changed state DOWN -> INIT
%7|1595856895.806|BROADCAST|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: Broadcasting state change
%3|1595856895.804|ERROR|rdkafka#producer-1| [thrd:app]: rdkafka#producer-1: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2196ms in state CONNECT)

How to reproduce

  1. Create CCloud cluster, topic and add API key
  2. Clone the example
  3. Fill in topic name, username and password
  4. Run and the errors will appear

Checklist

Please provide the following information:

mhowlett commented 4 years ago

The error isn't very helpful...

%7|1595856895.803|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot]: sasl_ssl://pkc-lq8gm.westeurope.azure.confluent.cloud:9092/bootstrap: Connect to ipv4#51.105.121.2:9092 failed: Unknown error (after 2196ms in state CONNECT) (_TRANSPORT)

have you tried without specifying cacert.pem? (librdkafka will auto-discover on the system in common places in v1.5). have you tried from different systems win/linux? have you tried different versions 1.4.4 vs 1.5.0? i'm not sure why these would help, just trying to narrow down the problem. this hasn't come up before.

the 'boot' suffix on pkc-lq8gm.westeurope.azure.confluent.cloud:9092/boot seems unfamiliar, that would usually be 'bootstrap', would need to look into that further.

sevak-mnatsakanyan commented 4 years ago

Hi again,

Nugets: 1.4.4 and 1.5.0 were behaving the same way OS: On linux application was logging "Connection was refused" instead of on Windows message "Unknown issue". Logs: "boot" part was just a log shortening from Bootstrap

So managed to locate the issue, something was wrong with current internet connection. There was an issue with Socket on that particular internet link. Other connections worked fine.

Thank you for support