EventStore / EventStoreDB-Client-Java

Official Asynchronous Java 8+ Client Library for EventStoreDB 20.6+
https://eventstore.com
Apache License 2.0

Handling connection shutdown #140

Closed andersflemmen closed 2 years ago

andersflemmen commented 2 years ago

Hi,

When shutting down one of our cluster nodes for maintenance, the client failed to discover a new node to connect to, which caused it to go into the shutdown state. As far as I can tell, this just happens behind the scenes, and you will not notice until you try to perform another operation with the client, which will then cause a ConnectionShutdownException to be thrown.

Any ideas on how to handle situations like this? It would be nice to be able to reconnect without having to kill the application or create a new client. Another option would be a callback that allows the situation to be resolved immediately, without having to wait for someone to perform another operation using the client.

YoEight commented 2 years ago

Hi there,

The client is actually doing reconnection already. From the little information you shared here, what I suspect is that discovery ran a few times already, failed every time, and the client eventually gave up. A solution could be to increase your maxDiscoverAttempts and discoveryInterval to fit your business requirements.
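
For example, both settings can be raised through the connection string. A minimal sketch, assuming the client's standard connection-string API; the node addresses and the exact values are placeholders to adjust to your cluster:

import com.eventstore.dbclient.EventStoreDBClient;
import com.eventstore.dbclient.EventStoreDBClientSettings;
import com.eventstore.dbclient.EventStoreDBConnectionString;

public class ClientFactory {
    public static EventStoreDBClient create() {
        // 20 discovery attempts, 1000 ms apart (illustrative values)
        EventStoreDBClientSettings settings = EventStoreDBConnectionString.parseOrThrow(
                "esdb://node1:2113,node2:2113,node3:2113"
                        + "?maxDiscoverAttempts=20&discoveryInterval=1000");
        return EventStoreDBClient.create(settings);
    }
}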

andersflemmen commented 2 years ago

Thanks for the quick response!

So when you say reconnection, are you thinking of these?

Unable to find a node. Retrying... (2/3)
Unable to find a node. Retrying... (3/3)
Maximum discovery attempt count reached: 3

YoEight commented 2 years ago

Yes

andersflemmen commented 2 years ago

Increasing the number of attempts and the interval will solve this in most cases, but there is still a possibility that the client ends up "dead", and you won't notice until the next read or write. It feels like this could be handled more cleanly by the client, but I guess this solution will do for now.

YoEight commented 2 years ago

To be honest, I fail to see how it could be resolved differently. I'm open to suggestions, though. The nature of gRPC makes it difficult to know if the connection is down when the channel is not used, and an ESDB client can be shared among several threads. If there were a callback of some sort that notified you that the connection is closed (not down, because internally reconnections do happen), what would you do with that information?

andersflemmen commented 2 years ago

As far as I can tell, once the GrpcClient sets the shutdown flag, it can never recover, meaning that the client instance can never be used again? With a callback, the application using the client could handle this as it sees fit, whether that is triggering a restart, creating a new client, or something else.

YoEight commented 2 years ago

Personally, I think using a Let It Crash approach is better in this case. If you use an ESDB client across different threads, creating a new client without restarting your application from scratch would be hard. A ConnectionShutdownException is a fatal error / final state, which is why there is no way to recover from it.

andersflemmen commented 2 years ago

Yep, I agree! I keep repeating myself, but I still think the client should let you know it is dead before you try to use it the next time, which may, for example, cause an external HTTP request to fail. I guess we'll solve it with a periodic health check.
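
A minimal sketch of what such a periodic health check could look like, assuming the readStream/ReadStreamOptions API of recent client versions; the class name, probe stream, and interval are illustrative, and the exact read overload may differ between client versions:

import com.eventstore.dbclient.ConnectionShutdownException;
import com.eventstore.dbclient.EventStoreDBClient;
import com.eventstore.dbclient.ReadStreamOptions;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class EsdbHealthCheck {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(EventStoreDBClient client, Runnable onClientDead) {
        scheduler.scheduleAtFixedRate(() -> {
            // Any cheap read works as a probe. Success, or a "normal" failure such as
            // stream-not-found, proves the client is still usable; only a
            // ConnectionShutdownException means it has reached its terminal state.
            client.readStream("health-check-probe", ReadStreamOptions.get().maxCount(1))
                  .whenComplete((result, error) -> {
                      if (error instanceof ConnectionShutdownException
                              || (error != null && error.getCause() instanceof ConnectionShutdownException)) {
                          onClientDead.run(); // e.g. recreate the client or restart the service
                      }
                  });
        }, 0, 30, TimeUnit.SECONDS);
    }
}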