Closed: adamlc closed this issue 7 years ago
The Consul backend does unregister; are you saying that it should do more than that?
Also, a node can fail without ever having a chance to unregister, and in this case it's impossible to know if that node is coming back. Resolving things like this is really an operator's responsibility and arguably somewhat out of scope for this plugin.
The point above is in part why removing a node from a cluster requires a CLI command in RabbitMQ's traditional clustering, and some other systems (e.g. Cassandra) do the same. In RabbitMQ master much of this plugin (but not the Consul part) is in the core, so unregistration can be performed when a node is removed. This plugin in its current form doesn't have any chance of hooking into CLI commands.
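For reference, that manual removal step in traditional clustering is an explicit rabbitmqctl command run by the operator against a remaining node, along the lines of:

```
# remove a stopped or failed node from the cluster (node name is a placeholder)
rabbitmqctl forget_cluster_node rabbit@failed-node
```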
Thanks for getting back to me so quickly @michaelklishin :)

Fair enough, I suspected that might be the case! What are your thoughts on being able to specify deregister_critical_service_after in the Consul healthcheck? That would suffice in most cases I think :)
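Something along these lines in the service's check definition is what I have in mind; the service name, port, check type and timings below are just illustrative, it's the deregister_critical_service_after field (Consul 0.7+) that matters:

```
{
  "service": {
    "name": "rabbitmq",
    "port": 5672,
    "check": {
      "tcp": "localhost:5672",
      "interval": "10s",
      "deregister_critical_service_after": "90m"
    }
  }
}
```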
@adamlc I'm not a Consul expert but according to the docs that setting is exactly what we want here.
@Gsantomaggio would you agree?
Hi @adamlc, yes, I tested it: when it scales down, the node is removed from the RabbitMQ cluster but not from Consul. It seems that unregister is not called directly by the plugin.
@michaelklishin yes, I agree, and I was thinking about these 3 options:

- deregister_critical_service_after
- rabbitmqctl eval 'autocluster:unregister().' (this as a temporary workaround)

Options 1 and 2 seem to be no-brainers to me. I am on the fence about option 3.

We can do both 1 and 2, maybe adding two params so we don't break the current behavior.
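The temporary workaround would be wired into whatever stops the node, e.g. a stop script, so the node deregisters itself from Consul before shutting down. A sketch, assuming autocluster:unregister/0 behaves as quoted above:

```
# deregister this node from Consul, then stop RabbitMQ
rabbitmqctl eval 'autocluster:unregister().'
rabbitmqctl stop
```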
Hey guys!

I've just been evaluating this plugin for our Nomad cluster. I've got everything up and running and all seems good!

I tried reducing the number of tasks in Nomad, which worked fine with the CLEANUP_WARN_ONLY feature. While the nodes get removed from RabbitMQ, the failing healthchecks stay in Consul, which triggers our alerting systems. Should these healthchecks be cleaned up automatically when the container stops? I couldn't see anything in the docs. If not, Consul 0.7+ has a feature called deregister_critical_service_after that will automatically remove the failed check after a period of time.

Would appreciate your thoughts, or let me know if I'm doing something wrong! With the dynamic nature of Nomad moving containers about, this may crop up more!
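For anyone reproducing this, the Consul backend setup involved is roughly the following. CLEANUP_WARN_ONLY is the setting mentioned above; the other variable names are illustrative and may differ between plugin versions, so check the README for yours:

```
# Sketch of the plugin's Consul backend configuration (illustrative values)
AUTOCLUSTER_TYPE=consul
CONSUL_HOST=consul.service.consul
CONSUL_PORT=8500
AUTOCLUSTER_CLEANUP=true
# false so stopped nodes are actually removed from the RabbitMQ cluster
CLEANUP_WARN_ONLY=false
CLEANUP_INTERVAL=30
```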