
Separate proxy and controller Deployments #921

Open rainest opened 11 months ago

rainest commented 11 months ago

https://github.com/Kong/kubernetes-ingress-controller/issues/4671 talks about chart stuff but isn't in this repo (presumably for milestone reasons). This is a stub to provide charts tracking of the same.

rainest commented 11 months ago

tl;dr

Using an umbrella chart results in a number of limitations on template development (in general, we cannot share information between the two releases, and thus cannot, for example, automatically configure the controller admin API target based on the proxy's admin API Service configuration) and challenges to release processes. If we no longer wish to support deploying the controller as a sidecar, it makes sense to use a single chart that deploys the controller and Kong each in their own Deployment.
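To illustrate the kind of cross-wiring a single chart allows, the controller's discovery target could be rendered from the same values that build the admin Service, along these lines (a sketch only; the env var, fullname helper, and -admin suffix are assumptions here, not existing chart code):

{{/* Hypothetical sketch: point the controller at the chart's own admin API
     Service rather than asking the user to configure the target by hand. */}}
{{- if .Values.admin.enabled }}
- name: CONTROLLER_KONG_ADMIN_SVC
  value: "{{ .Release.Namespace }}/{{ include "kong.fullname" . }}-admin"
{{- end }}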

This requires some redesign of the values.yaml to support configuring the Deployments independently. There are several options for this:

https://github.com/Kong/charts/pull/923 is the companion to this analysis. It annotates keys of interest in place within values.yaml.

I do not think we should attempt to support both models, as designing good configuration and templates for that would be quite difficult. If we want to keep support for the sidecar in any fashion, I'd recommend doing that in a continued 2.x line while we use 3.x for split Deployments, and work with users to figure out what they'd need to migrate if it's difficult.

Background

The current design of the kong chart uses a single Deployment, which can contain a Kong container, controller container, or both. values.yaml only has a single set of Deployment-level (and Pod-level) settings, as these are shared equally among all containers.

Migrations Jobs often also share many of these settings even though they're a separate workload. Since migrations use a Kong container, settings applicable to the Kong Deployment are often applicable to the migrations, and sharing them allows us to economize on settings.

To split Kong and the controller into separate Deployments, we thus need to convert these settings into two settings in most cases. We may opt to fall back to the original shared setting for both--it's not ideal and limits some functionality (e.g. you probably don't want the same replica count and scaling rules for a controller Deployment as a Kong Deployment), but could allow for a simpler transition for existing users. Defaults for the new settings in the base values.yaml will apply, however, so this option is probably less appealing than it'd seem initially.

values.yaml organization

The chart has grown organically over time, with historically less-informed decisions about how to organize its configuration effectively. We have, for example, root-level .podAnnotations, .deploymentAnnotations, and .deployment configuration keys. Although it'd make sense to nest these along the lines of the actual resource structure, like

deployment:
    annotations: {}
    pod: 
        annotations: {}

we added one of the root-level annotations keys early, before we needed the others, and didn't move it when it became disorganized.

Mirrored and consolidated Kubernetes resource configuration

Since adding a second Deployment will require separate configuration keys for many existing settings, it'd make sense to organize them following our most recent structure. For example:

deployment:
  controller:
    annotations: {} # Deployment-level annotations
    autoscaling: {} # A duplicate of the current root key, for the controller Deployment
    pod:
      annotations: {} # Pod-level annotations. currently there is a single root-level .podAnnotations used for the single unified Pod
      serviceAccount: {}
      container:
        env: {...} # controller environment
        image: {}
      sidecarContainers: [] # attach a tcpdump or what have you
  kong: {} # same as the controller section

admissionWebhook: {} # the current contents for .ingressController.admissionWebhook

This section should hold all configuration that lives under the Kubernetes Deployment resource.

This section may also hold configuration related to Deployment-associated resources, like HorizontalPodAutoscaler (.deployment.*.autoscaling). We could alternatively split those sections similarly (autoscaling.controller.* and autoscaling.kong.*).
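For example, a split along those lines might look like this (a sketch; the keys and defaults are illustrative):

autoscaling:
  controller:
    enabled: false       # a controller rarely needs the proxy's scaling rules
  kong:
    enabled: true
    minReplicas: 2
    maxReplicas: 10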

Application-level configuration

We have several sections dedicated to application-specific configuration unrelated to Kubernetes resources, such as the ingressController.konnect settings and enterprise settings.

While these generally manipulate values in a container's environment, they don't clearly fit into the Kubernetes resource structure from a user POV. As such I think we should keep them mostly at their current locations, with the end result that we have deployment.controller (or deployment.ingressController if we want name consistency) for standard Kubernetes resource settings and ingressController.konnect, ingressController.gatewayDiscovery, and so on for our application-specific configuration.
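The resulting top-level layout would then look roughly like this (section contents elided; names are illustrative):

deployment:
  controller: {}      # Kubernetes resource settings for the controller Deployment
  kong: {}            # Kubernetes resource settings for the Kong Deployment

ingressController:
  konnect: {}             # application-level controller settings stay here
  gatewayDiscovery: {}

enterprise: {}            # application-level Kong settings stay here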

Migration strategies

Moving keys is a breaking change for existing settings. If we move configuration currently elsewhere (e.g. .podAnnotations to .deployment.kong.pod.annotations), users will have to update configuration.

We can provide a helper script, however, as looking up a YAML key and moving it elsewhere isn't particularly hard.
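A minimal sketch of what one step of that helper could look like, assuming yq v4 and using .podAnnotations as an example of a moved key:

# Hypothetical migration helper step (yq v4 assumed): copy the old root-level key
# to its new location, then delete the old key.
yq -i '.deployment.kong.pod.annotations = .podAnnotations | del(.podAnnotations)' values.yaml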

We could alternatively continue to support existing keys, but this comes with template complexity and non-obvious precedence rules. If you, for example, see documentation saying "set .deployment.kong.pod.annotations" and do so without realizing you already have a .podAnnotations and we prefer the latter, your settings will not take effect and you will be confused. We would probably need to prefer the older setting if present due to the way values.yaml defaults work.
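If we did keep both, the templates would need something like the following, preferring the legacy root-level key when set (a sketch; key names follow the proposal above):

{{/* Hypothetical precedence sketch: the legacy root-level key wins if set,
     otherwise the new nested key applies. */}}
{{- $podAnnotations := .Values.podAnnotations | default .Values.deployment.kong.pod.annotations -}}
{{- with $podAnnotations }}
annotations:
  {{- toYaml . | nindent 2 }}
{{- end }}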

We can opt to use the newest organization pattern for new settings only (for example, creating .deployment.controller.pod.annotations while retaining .podAnnotations for the proxy), but this prolongs the problem.

Existing release migration

After changes to values.yaml, users will need to upgrade releases.

kong chart users

Going from the kong chart with a single Deployment to one with multiple should not require much beyond updating values.yaml, and maybe choosing different settings for the new separate Deployment (optional, but probably advisable).

AFAIK the main task here is to handle the change from an internal localhost listen for the admin API to discovery, so flipping the admin Service on with the appropriate external listen and enabling discovery. There's some manual effort required, but not much beyond a typical upgrade.
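Concretely, the upgrade amounts to values changes roughly like these (a sketch; client TLS settings omitted):

admin:
  enabled: true        # expose the admin API on a Service instead of localhost only
  type: ClusterIP
  tls:
    enabled: true

ingressController:
  gatewayDiscovery:
    enabled: true      # the controller discovers admin API endpoints via the Service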

Multiple release kong chart users

Although some classes of separate release (namely separate controller releases) should no longer be necessary, users can continue using them to avoid breaking release continuity.

These users still need to migrate values.yaml and set up discovery if they were using a single-Deployment Kong+controller release, but they do not need to consolidate releases if they do not want to. The procedure to create, for example, a hybrid DP release is still to create a release with the controller disabled and DP settings configured.
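For reference, a hybrid DP release under today's key locations looks roughly like this (the control plane address and Secret name are placeholders):

# Hybrid data plane release (sketch): controller disabled, DP settings configured.
ingressController:
  enabled: false

env:
  role: data_plane
  database: "off"
  cluster_control_plane: example-cp.example.com:8005      # placeholder CP address
  cluster_cert: /etc/secrets/kong-cluster-cert/tls.crt
  cluster_cert_key: /etc/secrets/kong-cluster-cert/tls.key

secretVolumes:
  - kong-cluster-cert    # placeholder Secret containing the cluster certificate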

Although the multi-Deployment configuration would make single-release hybrid mode more viable, I'm not considering it at present.

ingress chart users

This state has no perfect migration path. Helm, broadly, cannot transfer resources from one release to another (there's technically maybe a path by uninstalling and keeping resources and then manipulating labels to trick it into adopting them into a new release, but that's fiddly as heck and I don't really want to entertain it as a generally acceptable plan of action).

Migrating existing ingress chart releases to the kong chart would thus require deleting an ingress release and creating a kong release. This requires an outage window in the simplest case, though it may be possible to carefully orchestrate a handover with the old release still installed.

Since ingress is DB-less only, this approach should be fine: the new instances will come up, see the same Ingress resources, and apply the same configuration. This shouldn't be significantly different from running kubectl rollout restart beyond that there are temporarily no replicas of the proxy or controller.

Template refactoring

Using mirrored structure for the Kubernetes resource configuration should technically allow us to use a shared template for them that we invoke for each section. However, application-level configuration, though not directly part of the resource configuration, ends up there, mostly in the form of environment variables.

It may be simple enough to just inject the environment variable set generated from another section into the Deployment template. I'd have to try it to be sure.
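A rough sketch of how that could look, with one named template invoked per section and the application-level environment injected as a pre-rendered string (the template and helper names here are hypothetical):

{{/* Hypothetical shared Deployment template. .section is one of
     .Values.deployment.controller or .Values.deployment.kong; .extraEnv carries
     environment rendered from the application-level configuration. */}}
{{- define "kong.genericDeployment" -}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ printf "%s-%s" (include "kong.fullname" .root) .name }}
  annotations: {{- .section.annotations | toYaml | nindent 4 }}
spec:
  selector:
    matchLabels:
      app: {{ .name }}
  template:
    metadata:
      labels:
        app: {{ .name }}
      annotations: {{- .section.pod.annotations | toYaml | nindent 8 }}
    spec:
      containers:
        - name: {{ .name }}
          image: "{{ .section.pod.container.image.repository }}:{{ .section.pod.container.image.tag }}"
          env:
            {{- .extraEnv | nindent 12 }}
{{- end -}}

{{/* Invoked once per Deployment; "kong.controllerEnv" is a hypothetical helper. */}}
{{ include "kong.genericDeployment" (dict "root" . "name" "controller" "section" .Values.deployment.controller "extraEnv" (include "kong.controllerEnv" .)) }}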

pmalek commented 11 months ago

I do not think we should attempt to support both models, as designing good configuration and templates for that would be quite difficult. If we want to keep support for the sidecar in any fashion, I'd recommend doing that in a continued 2.x line while we use 3.x for split Deployments

👍 I like the idea. Set a clear end-of-support date for 2.x so users can align their expectations, and add prominent links to 3.x and the migration guide.

We can provide a helper script, however, as looking up a YAML key and moving it elsewhere isn't particularly hard.

That's the way I've done it in the past. Spoiler alert: it's always more work than anticipated.

This state has no perfect migration path. Helm, broadly, cannot transfer resources from one release to another (there's technically maybe a path by uninstalling and keeping resources and then manipulating labels to trick it into adopting them into a new release, but that's fiddly as heck and I don't really want to entertain it as a generally acceptable plan of action).

Tempting... but no. This will end badly. Fiddly is an understatement.


What I'm slightly concerned about is the testing situation. We had one approach to golden tests in https://github.com/Kong/charts/pull/748 but that never took off. We could revive that idea and use it as we progress with the refactor and reorganisation of keys in values to ensure we're getting the template output we expect.

rainest commented 11 months ago

Spoiler alert: it's always more work than anticipated.

Any particular pitfalls you hit, or just that some items that needed migrations were missed initially?

pmalek commented 11 months ago

Spoiler alert: it's always more work than anticipated.

Any particular pitfalls you hit, or just that some items that needed migrations were missed initially?

If it's a 1:1 move without transformation then it should be OK-ish; with any sort of transformation or pivoting it's annoying to get right the first time.

rainest commented 11 months ago

Since we are (along with simply rearranging settings) moving users from sidecar to gateway discovery, the admin API must be exposed, which is not something existing kong chart users have had to consider for their security model.

We did unfortunately omit restrictions on its usage in ingress, but I don't think we'd want to release a final (i.e. not alpha) chart 3.x without mTLS automation in place, since we'd otherwise bake a reduction in security into an upgrade.

pmalek commented 11 months ago

If we take lessons learnt from https://docs.google.com/document/d/14jwnYoSj5RGvOCaWGpPuqa1VvyjKnFmscnYpiIxfXQ0/edit#heading=h.csub82xg1smh and (I haven't figured it out yet) generate certs via Helm's genSignedCert (or similar) just like we're doing now, but with proper alternative names, this should work, right?
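Something along these lines, using Sprig's genCA and genSignedCert with SANs that match the admin Service name, is what I have in mind (a sketch; the fullname helper and -admin suffix are guesses, and note such certs regenerate on every upgrade unless combined with lookup):

{{/* Hypothetical sketch: self-signed admin API certificate whose SANs cover the
     admin Service, so the controller can verify it during Gateway Discovery. */}}
{{- $adminSvc := printf "%s-admin" (include "kong.fullname" .) -}}
{{- $altNames := list $adminSvc (printf "%s.%s.svc" $adminSvc .Release.Namespace) (printf "%s.%s.svc.cluster.local" $adminSvc .Release.Namespace) -}}
{{- $ca := genCA "kong-admin-api-ca" 3650 -}}
{{- $cert := genSignedCert $adminSvc nil $altNames 3650 $ca -}}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $adminSvc }}-tls
type: kubernetes.io/tls
data:
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}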

Currently we're using https://github.com/Kong/charts/blob/0242f363bbafc576a1f370b9b4220abe0c6a269c/charts/kong/templates/service-kong-admin.yaml#L35

which wouldn't work with Gateway Discovery because

We could then flip the tls skip verify to false.

rainest commented 11 months ago

Practical update testing with a (slightly outdated) https://github.com/Kong/charts/pull/947 indicates that the ingress->kong migration actually lacks one of the major expected hurdles: Helm just doesn't care if you change the chart used for a release. Running something like

helm upgrade --install ana kong/ingress
helm upgrade --install ana rainestcharts/kong

is entirely acceptable and functions no differently than upgrading to a new version of the same chart. Migrating an ingress install with defaults to kong 3.x with defaults as the same release works fine.

Changes to resource names do, however, result in issues. The default names for Deployments and Services in kong lack the -gateway bit from the ingress subchart. Services will effectively be created anew (and may be assigned new IPs depending on their configuration), and the old Deployments instantly vanish and promptly terminate their replicas, often before the new Deployment's replicas come online. Setting nameOverride allows mimicking the existing ingress names and replacing replicas/keeping the same Services as one would normally expect. The Service type override won't be necessary in the final release, since the admin default has changed to ClusterIP.
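For example, for the ana release above, something like the following should keep the subchart-style names, assuming the chart renders resource names as the release name plus nameOverride:

# Hypothetical sketch: reuse the ingress subchart's "-gateway" suffix so the new
# kong release adopts the existing "ana-gateway" Deployment/Service names.
nameOverride: gateway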

kong somewhat amusingly upgrades less smoothly in some respects. Although there are no naming issues, the new controller attempts to talk to the existing Pods, which do not have the admin API exposed, and never comes online. I'm not entirely sure why discovery doesn't find the new replica, but it doesn't. Exposing the admin API first appears necessary.

This doesn't yet cover the introduction of mutual TLS between the controller and admin API, which I'd like to have in place by default, and which will probably introduce additional complications.

Values migration from ingress is a separate, more difficult bag of worms. Keys no longer have their ingress and gateway prefixes, and while we can simply flatten them, some keys (e.g. replicaCount or podAnnotations) may be present in either section. We can probably migrate these by stuffing the ingress ones into the equivalent ingressController settings for the new Deployment in kong, but it requires more exhaustive review than simply migrating the keys that have changed.

rainest commented 11 months ago

values.yaml migration strategy as of https://github.com/Kong/chart-migrate/tree/71146f9534e74a8195d205efa5e13d152db99b60

kong chart values.yaml files are easy: you move the keys that moved and leave the rest in place. The root command does this in a single step.

ingress chart values.yaml files are more complicated. There are several cases:

  1. controller.ingressController values need to move to their new home under ingressController.
  2. controller keys that apply to the Deployment/Pod/etc. that should now configure the second Deployment in kong need to move to their equivalent keys under ingressController.deployment.
  3. gateway keys generally need to move to the root.
  4. controller keys that do not apply to the Deployment and such also need to move to the root.
  5. Keys that are present in both gateway and controller that were not moved in either 1 or 2 are kind of a mystery. IDK how you should interpret both subcharts configuring stuff under certificates, for example--it's something you technically can do, but probably shouldn't. These prefer whatever value is in gateway and print a warning, since they probably require some user review. Hopefully this does not apply to most configurations.

We can run the root migrate command with a separate key list to handle 1 and 2. From there, a second merge command flattens the values.yaml to handle steps 3-5. This step deletes several keys where the merge strategy (preferring gateway keys) may cause issues:

// should not exist. could maybe create the webhook or RBAC resources, but they'd be configured incorrectly
// we _do not_ want this key to use the value from gateway
gateway.ingressController
// unsure how we should handle this along with deployment.kong.enabled. ignoring it since we can reasonably assume
// that ingress installs are not doing split deployments for the most part. on-prem hybrid is probably the only
// major exception.
gateway.enabled
// ditto
controller.enabled
// contained some overrides that were a hack for subchart stuff, should not be needed
controller.proxy
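Taken together, a before/after sketch of the flattening for a trivial ingress values.yaml (key names are illustrative):

# Before (ingress chart): subchart-prefixed sections.
controller:
  ingressController:
    konnect:
      enabled: true
  podAnnotations:
    example.com/scrape: "true"
gateway:
  replicaCount: 3
  admin:
    enabled: true
---
# After (kong chart 3.x): controller.ingressController keys move under
# ingressController, controller workload keys move under ingressController.deployment,
# and gateway keys move to the root.
ingressController:
  konnect:
    enabled: true
  deployment:
    pod:
      annotations:
        example.com/scrape: "true"
replicaCount: 3
admin:
  enabled: true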
rainest commented 10 months ago

Following up on mTLS now that the split is staged. Some discussion earlier at https://github.com/Kong/charts/issues/921#issuecomment-1810668048

The existing chart-managed mTLS implementation from https://github.com/Kong/charts/pull/780 is a bit odd:

Configuration-wise, the controller and proxy configuration is not integrated. You can set the controller to use a generated certificate and CA, but the proxy will not require it unless you manually specify its name in the admin API configuration.

Implementation-wise, the admin Service template handles controller certificate generation and part of the proxy mount, though again, these systems do not integrate with one another. The proxy mount (from the admin API configuration) uses an unusual way to handle the configuration options. admin.tls.client.caBundle lets you dump a CA into values.yaml. There's no existing resource for this, so the chart creates a ConfigMap to hold it. This ConfigMap is also used if you provide a Secret name with secretName: the Helm client uses lookup to retrieve the Secret and extracts its contents, which it then copies to the ConfigMap, rather than mounting the Secret directly.
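Roughly, and with names assumed rather than copied from the chart, the current behavior looks like this:

{{/* Simplified sketch of the existing behavior (not the literal template): both
     the inline caBundle and the looked-up Secret end up in a ConfigMap. The
     Secret data key and ConfigMap name here are assumptions. */}}
{{- $caCert := "" -}}
{{- if .Values.admin.tls.client.caBundle -}}
{{- $caCert = .Values.admin.tls.client.caBundle -}}
{{- else if .Values.admin.tls.client.secretName -}}
{{- with lookup "v1" "Secret" .Release.Namespace .Values.admin.tls.client.secretName -}}
{{- $caCert = index .data "tls.crt" | b64dec -}}
{{- end -}}
{{- end -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "kong.fullname" . }}-admin-client-ca
data:
  tls.crt: {{ $caCert | quote }}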

This implementation approach hits a chicken-egg problem with the generated certificate, since the lookup attempt precedes creating the Secret. To work around this without larger refactoring, I was able to use a separate if clause that mounts the Secret if we want to use the generated certificate.

I kinda want to simplify the UX and implementation here. Ideally we'd only use a single section, since there shouldn't be any reason to configure different CAs for the admin API and controller. I don't know that I'd want to make a breaking change for this, however, and it's not something where we could simply migrate key locations. On the implementation side, dropping support for providing the CA string in values.yaml would allow us to use a normal Secret mount--I figure extraResources is available if you really want to manage the CA from values.yaml.

czeslavo commented 10 months ago

I'll try to bring some context on why some things were done the way they are.

Configuration-wise, the controller and proxy configuration is not integrated. You can set the controller to use a generated certificate and CA, but the proxy will not require it unless you manually specify its name in the admin API configuration.

Originally https://github.com/Kong/charts/pull/780 was implemented with split releases in mind as that was the only way to run Gateway Discovery at that time. Because of that, we couldn't make any validation between proxy and controller configuration.

This implementation approach hits a chicken-egg problem with the generated certificate, since the lookup attempt precedes creating the Secret. To work around this without larger refactoring, I was able to use a separate if clause that mounts the Secret if we want to use the generated certificate.

The idea was that one of the releases must be responsible for generating the key-pair Secret and the controller was designated to be that one. Because of that, you had to first deploy the controller release and then use the generated secret name in the gateway release (or the name you hardcoded in ingressController.adminApi.tls.client.caSecretName) - it's described in the README (bottom of the Gateway discovery section):

On the controller release side, that can be achieved by setting ingressController.adminApi.tls.client.enabled to true. By default, Helm will generate a certificate Secret named -admin-api-keypair and a CA Secret named -admin-api-ca-keypair for you. ... On the Gateway release side, set either admin.tls.client.secretName to the name of your CA Secret or set admin.tls.client.caBundle to the CA certificate string.
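In values.yaml terms, that split-release wiring looks roughly like this (the Secret name follows the default pattern quoted above, with the controller release name prepended):

# Controller release values (sketch): have Helm generate the client cert and CA.
ingressController:
  adminApi:
    tls:
      client:
        enabled: true
---
# Gateway release values (sketch): trust the CA Secret generated above.
# Replace <controller-release> with the controller release's name.
admin:
  tls:
    client:
      secretName: <controller-release>-admin-api-ca-keypair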