Closed andersflemmen closed 2 years ago
Hi there,
The client already reconnects on its own. From the little information you shared here, I suspect that discovery was attempted a few times and failed every time, leading the client to give up. A solution could be to increase your maxDiscoverAttempts and discoveryInterval to fit your business requirements.
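As a rough illustration of that suggestion, the sketch below builds a connection string with both settings raised. It assumes the settings are passed as query parameters on an esdb:// connection string and that discoveryInterval is expressed in milliseconds. DiscoverySettings, the host names, and the chosen values are hypothetical.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical helper, not part of the client library: it only assembles a
// connection string that carries the two discovery settings discussed above.
final class DiscoverySettings {

    static String connectionString(List<String> hosts,
                                   int maxDiscoverAttempts,
                                   Duration discoveryInterval) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        if (maxDiscoverAttempts < 1) {
            throw new IllegalArgumentException("maxDiscoverAttempts must be >= 1");
        }
        // Assumption: the interval is given in milliseconds.
        return "esdb://" + String.join(",", hosts)
                + "?maxDiscoverAttempts=" + maxDiscoverAttempts
                + "&discoveryInterval=" + discoveryInterval.toMillis();
    }

    public static void main(String[] args) {
        // Placeholder hosts; 20 attempts, 1 second apart, gives a node roughly
        // 20 seconds to come back before the client gives up.
        System.out.println(connectionString(
                List.of("node1.example:2113", "node2.example:2113", "node3.example:2113"),
                20,
                Duration.ofSeconds(1)));
        // prints: esdb://node1.example:2113,node2.example:2113,node3.example:2113?maxDiscoverAttempts=20&discoveryInterval=1000
    }
}
```

The product of the two values is roughly how long a maintenance window the client can ride out, so it is worth sizing them against how long a node restart usually takes.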
Thanks for the quick response!
So when you say reconnection, are you thinking of these?
Unable to find a node. Retrying... (2/3)
Unable to find a node. Retrying... (3/3)
Maximum discovery attempt count reached: 3
Yes
Increasing the number of attempts and the interval will solve this in most cases, but there is still a possibility that the client might end up "dead", and you won't notice until the next read or write. It feels like this could be handled more cleanly by the client, but I guess this solution will do for now.
To be honest, I fail to see how it can be resolved differently. I'm open to suggestions though. The nature of gRPC makes it difficult to know if the connection is down when the channel is not used. An ESDB client can be shared among several threads. If there were a callback of some sort that would notify you that the connection is closed (not down because internally, reconnections do happen), what would you do with such information?
As far as I can tell, when the GrpcClient sets the shutdown flag, it will never be able to recover, meaning that the client instance can never be used again? With a callback, the application using the client could handle this as they please, whether that is to trigger a restart, create a new client, or something else.
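One way the "create a new client" option could look on the application side is a small holder that swaps in a fresh instance when an operation fails fatally. This is only a sketch under assumptions: RecreatingClient is a hypothetical wrapper, the factory and the isFatal predicate are supplied by the application, and the operations are treated as synchronous calls. With a real client, the failure may surface differently and the old instance may need to be shut down explicitly.

```java
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical wrapper: shares one client between threads and replaces it
// when an operation fails with an error the application considers fatal.
final class RecreatingClient<C> {
    private final Supplier<C> factory;          // creates a new client instance
    private final Predicate<Throwable> isFatal; // e.g. "is this a shutdown error?"
    private C current;

    RecreatingClient(Supplier<C> factory, Predicate<Throwable> isFatal) {
        this.factory = factory;
        this.isFatal = isFatal;
        this.current = factory.get();
    }

    <T> T call(Function<C, T> operation) {
        C client;
        synchronized (this) {
            client = current;
        }
        try {
            return operation.apply(client);
        } catch (RuntimeException e) {
            if (!isFatal.test(e)) {
                throw e;
            }
            C fresh;
            synchronized (this) {
                // Only the first thread that saw the dead instance replaces it;
                // other threads pick up the replacement.
                if (current == client) {
                    current = factory.get();
                }
                fresh = current;
            }
            // Retry once on the new instance; a second failure propagates.
            return operation.apply(fresh);
        }
    }
}
```

Routing every call through the holder keeps threads from caching a dead instance, but long-lived work tied to the old client, such as open subscriptions, would still need to be re-established by the application.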
Personally, I think using a Let It Crash approach is better in this case. If you use an ESDB client in different threads, creating a new client without restarting your application from scratch would be hard. A ConnectionShutdownException is a fatal error / final state, which is why there is no means to recover from it.
Yep, I agree! I keep repeating myself, but I still think the client should let you know it is dead before you try to use it the next time, which for example may cause an external HTTP request to fail. Guess we'll solve it using a periodic health check.
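A periodic health check along those lines might look like the sketch below. It uses only the JDK scheduler. ClientHealthCheck is a hypothetical name, and the probe is assumed to be some cheap operation against the client (for example a small read) that throws once the client has shut down. What the failure callback does, whether alerting, restarting, or failing a readiness endpoint, is up to the application.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: runs a probe on a fixed delay and reports failures,
// so a dead client is noticed before the next real read or write.
final class ClientHealthCheck implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "esdb-health-check");
                thread.setDaemon(true); // do not keep the JVM alive for the check
                return thread;
            });

    ClientHealthCheck(Callable<?> probe, Consumer<Exception> onFailure,
                      long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                probe.call(); // assumed to throw when the client is unusable
            } catch (Exception e) {
                // Note: if onFailure itself throws, the scheduler cancels
                // all further runs of this task.
                onFailure.accept(e);
            }
        }, 0, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With a Let It Crash setup, the failure callback could simply log the error and exit the process so that a supervisor restarts the application with a fresh client.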
Hi,
When shutting down one of our cluster nodes for maintenance, the client failed to discover a new node to connect to, which caused it to go into the shutdown state. As far as I can tell, this just happens behind the scenes, and you will not notice until you try to perform another operation with the client, which will then cause a ConnectionShutdownException to be thrown.
Any ideas on how to handle situations like this? Would be nice to be able to reconnect without having to kill the application or create a new client. Another option would be to have a callback that could allow the situation to be resolved immediately, without having to wait for someone to try to perform another operation using the client.