Open timurnes opened 4 years ago
Sounds like a real edge case :) To be honest, I have never seen the status value being updated. Maybe one of the founding fathers can explain what it is. It is also usually a singleton. It's possible to override it in cluster settings, yes, but do we expect to host, say, two clusters? I am not sure how to address this.
Hello. I was playing with Proto Cluster on the dev branch and found an issue that occurs when Consul returns an error to a service registration request during the re-registration step. In my case the error was "no space left on device", and the requests failed.
We have this code in ConsulClusterMonitor:
If one of these methods (in the switch) throws an unhandled exception, the ConsulClusterMonitor actor can fail and will be recovered later. But we will then get a NullReferenceException on line 79 when a ReregisterMember message is received:
{"StatusValue", _statusValueSerializer.Serialize(statusValue)},
_statusValueSerializer can be null here because it is assigned in the Register method, and the Register method is called only once, when the cluster member joins a cluster. So this actor fails again and again. Usually it recovers normally, but sometimes I caught this exception.
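To make the failure mode concrete, here is a minimal sketch of it that does not use any Proto.Cluster types. `MonitorState`, `IStatusValueSerializer`, `ToStringSerializer` and `TryBuildMeta` are hypothetical names made up for illustration, and the sketch assumes that recovering the actor leaves it with fresh fields, which is modelled here by creating a new instance. The guard is just one possible way to avoid the exception, not necessarily the right fix for ConsulClusterMonitor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the serializer; not the real Proto.Cluster interface.
public interface IStatusValueSerializer
{
    string Serialize(object statusValue);
}

public sealed class ToStringSerializer : IStatusValueSerializer
{
    public string Serialize(object statusValue) => statusValue?.ToString() ?? "";
}

// Simplified model of the monitor's state. A restart is modelled as a new instance.
public sealed class MonitorState
{
    // Assigned only in Register, so it is null on a fresh instance.
    private IStatusValueSerializer _statusValueSerializer;

    public void Register(IStatusValueSerializer serializer) =>
        _statusValueSerializer = serializer;

    // Unguarded: mirrors the failing line and throws if Register was never called.
    public Dictionary<string, string> BuildMetaUnguarded(object statusValue) =>
        new Dictionary<string, string>
        {
            {"StatusValue", _statusValueSerializer.Serialize(statusValue)},
        };

    // Guarded: reports failure instead of throwing when the serializer is missing.
    public bool TryBuildMeta(object statusValue, out Dictionary<string, string> meta)
    {
        if (_statusValueSerializer == null)
        {
            meta = null;
            return false;
        }

        meta = new Dictionary<string, string>
        {
            {"StatusValue", _statusValueSerializer.Serialize(statusValue)},
        };
        return true;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Fresh state after a simulated restart: Register has not been called again.
        var restarted = new MonitorState();

        try
        {
            restarted.BuildMetaUnguarded("alive");
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException after restart");
        }

        // The guarded variant fails softly until the serializer is supplied again.
        Console.WriteLine(restarted.TryBuildMeta("alive", out _));

        restarted.Register(new ToStringSerializer());
        Console.WriteLine(restarted.TryBuildMeta("alive", out var meta) && meta["StatusValue"] == "alive");
    }
}
```

In the real actor the alternative would be to keep whatever Register received somewhere that survives the restart, so that ReregisterMember never sees a null serializer in the first place.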
Expected Behavior
The ConsulClusterMonitor actor recovers correctly from any failure of a request to Consul that happens after successful registration in the cluster
Actual Behavior
The ConsulClusterMonitor actor is recovered but sometimes throws an exception on line 79 after a ReregisterMember message
Steps to Reproduce the Problem
I really don't know how to reproduce the issues with Consul; also, I caught this issue only a few times with these steps