Closed Tratcher closed 1 month ago
I just bumped into this myself and agree with @Tratcher’s recommendation.
Recommendation: ResourceUtilizationHealthCheck should be split into two IHealthCheck implementations, one for CPU and one for memory. That way both states can be reported independently.
Anything I can do to help?
Thank you for bumping this issue, @willibrandon. To me it totally makes sense.
@geeknoid @rsreepathi @mwierzchowski @mobratil what do you think of this?
To start the ball rolling, we'll need an API proposal which we'll then discuss and take to the ASP.NET API Review Board for further approval.
/cc: @evgenyfedorov2
Hi @willibrandon , thanks for stepping in.
It does not look like we have to make any changes to public API surface, we probably can just create those two proposed IHealthCheck
implementations and register both via the existing extension methods.
What do you think?
@evgenyfedorov2 - I apologize I am usually more prompt than this.
Yes I agree that we can just create those two proposed IHealthCheck
implementations, but I also think that the health of each component should be included in the HealthCheckResult
using the optional IReadOnlyDictionary<string, object> data
parameter.
Or perhaps it would be more appropriate to return a HealthReport?
Represents the result of executing a group of IHealthCheck instances.
Or perhaps it would be more appropriate to return a HealthReport?
Represents the result of executing a group of IHealthCheck instances.
From my observations, dotnet/aspnetcore has two implementations:
Task<HealthCheckResult> CheckHealthAsync(...)
Task<HealthReport> CheckHealthAsync(...)
The first one is used whenever there is a single component to be checked for health (this is the part of the IHealthCheck
contract). The last one is used whenever there are multiple components involved (this is part of HealthCheckService
contract).
In our case, we use the first signature, however, we check multiple components (i.e., CPU and memory), and this doesn't feel correct. I believe, we should have defined multiple health checks - one for the CPU utilisation, one for memory consumption, etc. Then those health checks should get registered via the standard mechanisms and aggregated via HealthCheckService
implementations.
IMO the component should keep doing what it is meant for (resource utilization health checks) and the simplest fix would be to just send a compound message per health level:
Creating two separate healthchecks is a different issue, though could make sense since healthchecks reports is a thing that would solve the aggregation issue (a breaking change if this one is removed instead). Also someone can already track CPU only or memory only with this component (by setting custom limits for each).
IMO the component should keep doing what it is meant for (resource utilization health checks) and the simplest fix would be to just send a compound message per health level:
- if CPU and memory are unhealthy send a message like "CPU and Memory usages are above their limits"
- if only CPU or memory is unhealthy keep the current message (independently of the other being in degraded status)
- if CPU and memory are degraded send a message like "CPU and Memory usages are close to their limits"
- if only CPU or memory is degraded keep the current message
Creating two separate healthchecks is a different issue, though could make sense since healthchecks reports is a thing that would solve the aggregation issue (a breaking change if this one is removed instead). Also someone can already track CPU only or memory only with this component (by setting custom limits for each).
@sebastienros, to clarify I understand your suggestion correctly, do you mean the following?
message = "";
if (cpu != healthy) { message += "CPU unhealthy" }
if (memory != healthy) { message += "memory unhealthy" }
return HealthCheckResult.Unhealthy(message);
@RussKie something like this, yes.
An alternative could be to have the degraded message also added, but still return the highest of the levels. Example with CPU unhealthy and memory degraded:
CPU usage is above the limit. Memory is close to the limit
, and return Unhealthy
because one them at least is unhealthy. If both were degraded then Degraded
would be returned.
That would also make the code easier. Just keep the current checks, but don't return, just append to the message. And in the end return the highest detected level.
As a short-term solution/workaround this is probably ok, but to me this still feels as conflating two (or more) checks into one, which feels like conflicting with the purpose of the IHealthCheck contract.
@RussKie definitely.
I assume that the goal here was performance. Two healthchecks would mean two calls to _dataTracker.GetUtilization(_options.SamplingWindow);
which might be expensive.
A compound message in the description
works for me and even better if additional key-value pairs describing the health of each component could be included in the data
.
Description
When checking health based on resource utilization, the conditions are checked in series and only the first failure is reported. https://github.com/dotnet/extensions/blob/68810ca7399be38b8490a114ef52fdae4ce00058/src/Libraries/Microsoft.Extensions.Diagnostics.HealthChecks.ResourceUtilization/ResourceUtilizationHealthCheck.cs#L39-L60
Reproduction Steps
Test on a system with both high CPU and memory.
Expected behavior
Both CPU and memory issues should be reported.
Actual behavior
Only the first issue found is reported.
Regression?
No response
Known Workarounds
No response
Configuration
8.0
Other information
Recommendation: ResourceUtilizationHealthCheck should be split into two IHealthCheck implementations, one for CPU and one for memory. That way both states can be reported independently.