apache / pulsar-client-go

Apache Pulsar Go Client Library
https://pulsar.apache.org/
Apache License 2.0

Deadlock in connection pool #1272

Closed Gilthoniel closed 2 weeks ago

Gilthoniel commented 1 month ago

Expected behavior

The client should properly handle a cluster that is not ready yet and retry until it gets healthy again.

Actual behavior

When the client attempts to reconnect, or to create new producers, consumers, or readers, while the service is not ready, it can block the connection pool.

Steps to reproduce

ConnectionClosed callbacks can block the connection pool, because the pool's GetConnection may close a connection whose state has changed, which happens when the cluster is not ready. In our case, we observed many connections being closed: right after the client obtained a connection to the broker, the connection was closed with ServiceNotReady because too many bookies were down.

System configuration

Pulsar version: 3.0.5
Pulsar client: 13.1

nodece commented 3 weeks ago

Could we reproduce this issue?

Gilthoniel commented 3 weeks ago

Hardly in a simple integration test, as it requires a healthy client running against a failing cluster.

I can give more details, however, as it happened again recently. It deadlocked again because we received plenty of

Broker notification of Closed producer: 7

messages, each of which calls ConnectionClosed and fills up the channel.