Open adamwg opened 4 years ago
Hey @adamwg , I want to do this. Can you guide me a bit? Thanks.
@dlemel8 Absolutely!
You can start by taking a look at the admission-controller-webhook-timeout
check that we currently have in the doks
group. I think it already implements the basic logic we would want for a timeout check in the basic
group, so a first step would be to move it to the basic/
directory, and add basic
to the list of groups it's registered in. (It should also stay in the doks
group, since it is a DOKS pre-upgrade check.) The wording in the check diagnostics will need an update to be more generic.
I think there are at least two other checks that could be implemented in the basic
group. These are both currently covered by the admission-controller-webhook-replacement
check in the doks
group, but should be broken out into their own checks:
kube-system
namespace.If those conditions are broken out into their own checks, they could be added to the doks
group as well, and we could make the larger existing check smaller.
@adamwg great, thanks! I will start working on it soon :)
@adamwg just to make sure I understand correctly (regarding to admission-controller-webhook-replacement
refactoring):
@dlemel8 Sorry for the delay getting back to you on this.
I think for the new check(s) in the basic group. we want to omit some of the existing conditions. The checks I'm proposing are based on the upstream guidance around webhook configurations, so they should be quite strict and would generate warnings for some configurations that are currently considered OK.
@adamwg OK, that's answer my question about the new checks on basic group.
What about the question about the current admission-controller-webhook-replacement
?
if we remove some of its conditions we will warn about a service that now is OK
We currently have a variety of webhook checks in the
doks
group, since various webhook configurations can be problematic for DOKS upgrades. However, there are also some generic best practices around admission control webhooks, documented in the upstream docs. For example, it's a generic best practice to set timeouts to small values (definitely less than 30 seconds, since that's the default apiserver request timeout).We should build some generic webhook best practice checks that can be included in the
basic
group as well as thedoks
group.