timuthy closed this 3 years ago.
/rebase
/assign
What do you guys think about enhancing the health check controller as follows:
- Check if `Service`s of type `LoadBalancer` are healthy, i.e., whether their `.status` contains an IP or hostname.
- If not, read the `Event`s and propagate them?
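A minimal Go sketch of the first bullet, assuming trimmed-down local stand-ins for the `core/v1` `Service` type (only the fields the check needs, mirroring `.status.loadBalancer.ingress[].ip` and `.hostname`) so it runs on its own. `checkServiceHealth` is a hypothetical name, not the health check controller's actual code.

```go
package main

import "fmt"

// LoadBalancerIngress mirrors one entry of .status.loadBalancer.ingress
// (assumption: local stand-in for the core/v1 type).
type LoadBalancerIngress struct {
	IP       string
	Hostname string
}

// Service keeps only the fields the check needs (hypothetical stand-in).
type Service struct {
	Name    string
	Type    string                // e.g. "ClusterIP", "LoadBalancer"
	Ingress []LoadBalancerIngress // .status.loadBalancer.ingress
}

// checkServiceHealth treats non-LoadBalancer services as healthy and
// LoadBalancer services as healthy once at least one ingress entry
// carries an IP or a hostname.
func checkServiceHealth(svc Service) error {
	if svc.Type != "LoadBalancer" {
		return nil
	}
	for _, ing := range svc.Ingress {
		if ing.IP != "" || ing.Hostname != "" {
			return nil
		}
	}
	return fmt.Errorf("service %q of type LoadBalancer has no ingress IP or hostname yet", svc.Name)
}

func main() {
	pending := Service{Name: "ingress", Type: "LoadBalancer"}
	ready := Service{Name: "ingress", Type: "LoadBalancer",
		Ingress: []LoadBalancerIngress{{IP: "203.0.113.10"}}}
	fmt.Println(checkServiceHealth(pending))
	fmt.Println(checkServiceHealth(ready)) // <nil>
}
```

Only when this check fails would the second bullet kick in and the service's events be read.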
Sounds like a good idea to me.
Looks good in general. /invite @timebertt for /second-opinion
Yes, sounds reasonable. Should we then also only read events during deletion when the service is of type LB? It probably makes sense to handle these cases the same way, doesn't it?
Hm, yeah, maybe it's fair to start with this approach @timuthy. In the future, we might want to switch to always reading events if a certain resource is "stuck in deletion" for "too long". However, let's see how it goes and gain more experience on whether this would even help us.
If we only wanted to read events for services of type `LoadBalancer` during deletion, we'd need to read the whole object again because we don't have the service spec at hand. I'm not sure that's worth it; by default we could instead read events for all kinds of services, which doesn't hurt, imho. Please let me know if that is critical from your point of view. Otherwise, the health checks have been enhanced with the requested feature.
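Reading events does not require the service spec: an event's `involvedObject` already carries kind, namespace and name. The sketch below filters events this way using local stand-in types (assumption: trimmed versions of the `core/v1` `Event` and `ObjectReference`); `warningsFor` is a hypothetical helper. Against a real API server, the same filtering can be pushed server-side with field selectors such as `involvedObject.name`.

```go
package main

import "fmt"

// ObjectReference keeps the identifying fields of an event's involvedObject.
type ObjectReference struct {
	Kind      string
	Namespace string
	Name      string
}

// Event is a trimmed stand-in for a core/v1 Event.
type Event struct {
	InvolvedObject ObjectReference
	Type           string // "Normal" or "Warning"
	Reason         string
	Message        string
}

// warningsFor returns the Warning events that refer to ref. Only kind,
// namespace and name are compared, so the service type is irrelevant.
func warningsFor(events []Event, ref ObjectReference) []Event {
	var out []Event
	for _, e := range events {
		if e.Type == "Warning" && e.InvolvedObject == ref {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	ref := ObjectReference{Kind: "Service", Namespace: "default", Name: "example"}
	events := []Event{
		{InvolvedObject: ref, Type: "Normal", Reason: "DeletingLoadBalancer", Message: "deleting"},
		{InvolvedObject: ref, Type: "Warning", Reason: "SyncLoadBalancerFailed", Message: "error syncing"},
	}
	for _, e := range warningsFor(events, ref) {
		fmt.Printf("%s: %s\n", e.Reason, e.Message)
	}
}
```

For a `ClusterIP` service the filter simply finds no relevant warnings, which is why reading events for all services is cheap to justify.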
@timuthy do you want to keep the commits or squash? Feel free to merge however you like.
How to categorize this PR?
/area ops-productivity /kind enhancement /priority normal
What this PR does / why we need it: This PR lets the GRM read `Service` events in case a service is stuck in deletion. The output of the last two warnings is added to the `ManagedResource` condition.
The motivation for this change is to reveal more information about the root cause of the problem.
Special notes for your reviewer: I was thinking about reading warnings in general, not only for `Services`, but OTOH this would mean more read activity, and the relevant use case we know of today is about `Services`. Please let me know if we should change this or extend the list with further kinds.
The PR will remain in draft state until https://github.com/gardener/gardener-resource-manager/pull/105 is merged.
Release note: