RabbitMQ cluster health check?

Timovzl commented 3 years ago

I could not find this, so I'm not entirely sure if this is a question or an enhancement request. :)

Is it possible to add a health check for a RabbitMQ cluster? Our application connects to RabbitMQ through a set of endpoints, but as far as I can see, the health check only accepts a connection string for a single endpoint.

We'd still like to check whether the cluster as a whole is available.

Namoshek commented 3 years ago

I don't think such a method exists yet, even though it would make a lot of sense, especially with StackExchange.Redis. But why not just add multiple checks, one for each host?

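var healthChecksBuilder = services.AddHealthChecks();

// GetValue<T> only converts simple values, so bind the array of host objects via GetSection(...).Get<T>().
var redisHosts = Configuration.GetSection("Redis:Hosts")
    .Get<StackExchange.Redis.Extensions.Core.Configuration.RedisHost[]>()
    ?? Array.Empty<StackExchange.Redis.Extensions.Core.Configuration.RedisHost>();

// Register one Redis health check per host, each under a unique name.
for (int i = 0; i < redisHosts.Length; i++)
{
    healthChecksBuilder.AddRedis(redisConnectionString: $"{redisHosts[i].Host}:{redisHosts[i].Port}", name: $"redis_{i}");
}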

Edit: please don't ask why I wrote about Redis even though the question is about RabbitMQ... I don't know myself. But obviously you can adapt the example.

Timovzl commented 3 years ago

> But why not just add multiple checks, one for each host?

Unfortunately, demanding that every endpoint is healthy is a much stricter requirement than demanding that the cluster as a whole is healthy. The whole point of a cluster is that it remains operational if a node goes down (or actually as long as X or fewer nodes are down). Failing the health check when a single node is down would defeat the purpose.

Since the application would not have knowledge of what constitutes a quorum to any particular cluster, I imagine that an implementation would need to rely on something that the cluster itself offers.

Namoshek commented 3 years ago

> Unfortunately, demanding that every endpoint is healthy is a much stricter requirement than demanding that the cluster as a whole is healthy.

You are right that it is a significantly stricter requirement, and probably not wanted in most scenarios.

> The whole point of a cluster is that it remains operational if a node goes down (or actually as long as X or fewer nodes are down). Failing the health check when a single node is down would defeat the purpose.

The point of a cluster could also be scalability, when a single instance cannot handle the load from all frontend services. But a truly scalable system should probably have a load balancer in front of the RabbitMQ / Redis instances. In the case of Redis, that would leave us with two endpoints, one for reads and one for writes, and both would need to be operational for the system to be healthy.

> Since the application would not have knowledge of what constitutes a quorum to any particular cluster, I imagine that an implementation would need to rely on something that the cluster itself offers.

I don't think that is the case. In my opinion, when a cluster is unable to operate due to a missing quorum, it shouldn't answer requests which you would make as part of a health check. It should probably stop listening on the ports entirely, to be precise.

Timovzl commented 3 years ago

> Since the application would not have knowledge of what constitutes a quorum to any particular cluster, I imagine that an implementation would need to rely on something that the cluster itself offers.

> I don't think that is the case. In my opinion, when a cluster is unable to operate due to a missing quorum, it shouldn't answer requests which you would make as part of a health check. It should probably stop listening on the ports entirely, to be precise.

We are in agreement. This is the type of thing I was referring to by "something that the cluster itself offers". In other words, the cluster takes care of its quorum, and the application can, in some way, simply observe the conclusion.
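A minimal sketch of that idea, assuming the application knows the list of node endpoints and that, as argued above, a node that cannot serve stops accepting connections: a custom `IHealthCheck` that tries a plain TCP connection to every node and treats the cluster as available if at least one node answers. `AnyEndpointReachableHealthCheck`, `Classify` and the host names are hypothetical and not part of AspNetCore.Diagnostics.HealthChecks. Also note that a successful TCP connect does not prove that the AMQP handshake, or any particular queue, actually works.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check: the cluster counts as available if at least one node accepts a TCP connection.
public sealed class AnyEndpointReachableHealthCheck : IHealthCheck
{
    private readonly IReadOnlyList<(string Host, int Port)> _endpoints;
    private readonly TimeSpan _timeout;

    public AnyEndpointReachableHealthCheck(IReadOnlyList<(string Host, int Port)> endpoints, TimeSpan timeout)
    {
        _endpoints = endpoints;
        _timeout = timeout;
    }

    // Policy: no node reachable -> Unhealthy, some nodes down -> Degraded, all reachable -> Healthy.
    public static HealthStatus Classify(int reachable, int total) =>
        reachable == 0 ? HealthStatus.Unhealthy
        : reachable < total ? HealthStatus.Degraded
        : HealthStatus.Healthy;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Probe all nodes in parallel.
        bool[] results = await Task.WhenAll(
            _endpoints.Select(e => CanConnectAsync(e.Host, e.Port, cancellationToken)));
        int reachable = results.Count(ok => ok);

        return new HealthCheckResult(
            Classify(reachable, _endpoints.Count),
            $"{reachable}/{_endpoints.Count} RabbitMQ endpoints accepted a TCP connection");
    }

    private async Task<bool> CanConnectAsync(string host, int port, CancellationToken cancellationToken)
    {
        // Per-node timeout, linked to the caller's cancellation.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(_timeout);
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (SocketException)
        {
            return false; // refused, unreachable or DNS failure
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            return false; // our own timeout fired, not the caller's cancellation
        }
    }
}

// Example registration (hypothetical host names, default AMQP port 5672):
// services.AddHealthChecks().AddCheck("rabbitmq_cluster",
//     new AnyEndpointReachableHealthCheck(
//         new[] { ("rabbit-1", 5672), ("rabbit-2", 5672), ("rabbit-3", 5672) },
//         TimeSpan.FromSeconds(2)));
```

Mapping "some nodes down" to `Degraded` rather than `Unhealthy` is what keeps a single failed node from failing the check: with the default ASP.NET Core health check middleware options, `Degraded` is still reported as HTTP 200 and only `Unhealthy` becomes 503.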

Xabaril/AspNetCore.Diagnostics.HealthChecks, issue #717: RabbitMQ cluster health check?