apache / pulsar-client-go

Apache Pulsar Go Client Library
https://pulsar.apache.org/
Apache License 2.0
652 stars 335 forks source link

[Issue 1272][connection] Attempt to avoid deadlock during reconnection #1273

Closed Gilthoniel closed 2 weeks ago

Gilthoniel commented 1 month ago

Fixes #1272

Motivation

Producers and consumers register themselves to the connection to be notified when it gets closed. Even though the callback push the events in a channel, it can get stuck and the connection pool is locked which prevents any other caller to get a connection.

Modifications

This PR makes sure that the callback never blocks.

Verifying this change

(Please pick either of the following options)

This change is already covered by existing tests.

Does this pull request potentially affect one of the following parts:

none

nodece commented 3 weeks ago

We can use a goroutine to call the ConnectionClosed method in the connection pool.

What do you think of that?

Gilthoniel commented 3 weeks ago

It does not really change the issue that multiple reconnects will be attempted relentlessly until all callbacks have been processed. As mentioned in the issue, we can receive several dozens or hundreds of callback. It does not seem right to reconnect hundred of times then.

What is your concern with this fix ?

nodece commented 3 weeks ago

If the broker is not ready, the client will retry to connect to the broker. Once the broker is ready, the client can have an available connection to create the producer/consumer, when the producer/consumer is created, and then the connection pool listens to the producer/consumer.

I don't know why you can receive several dozens or hundreds of callback.

If your network environment is unstable? or if the broker is changing its status?

What is your concern with this fix ?

You are dropping the reconnect request.

Gilthoniel commented 3 weeks ago

It happens during a restart of everything so it's very unstable yes, brokers are probably changing their status.

I understand your worry but here it ensures that all reconnects will be handled and it only drops duplicate. Also in normal situation, as you said, you get only one reconnect.

lhotari commented 2 weeks ago

Great contributions @Gilthoniel! Thank you