defenseunicorns / uds-k3d

Repository for the UDS k3d dev/test environment
Apache License 2.0

Persist DNS resolution across machine and Docker restarts #99

Closed justinthelaw closed 1 month ago

justinthelaw commented 2 months ago

Is your feature request related to a problem? Please describe.

The uds.override import in the CoreDNS Corefile contains a rewrite that points *.uds.dev domains to host.k3d.internal. This rewrite goes stale when the Docker containers hosting K3d are stopped by any means other than k3d cluster stop uds (e.g., machine sleep, a Docker daemon restart, or a machine restart).

We encountered this issue when ai.uds.dev tried to resolve supabase-kong.uds.dev after we shut down the K3d cluster without using K3d. This is a known limitation of K3d and Docker.

I understand UDS K3d is only for short-lived development clusters; however, Growth engineers usually like to have an instance already stood up on their laptop for roadshows and events. Developers may also appreciate this when they want to restart, shut down, or hibernate their machines while keeping the cluster state intact, without having to run k3d cluster stop uds every time.

Describe the solution you'd like

See the two possible alternatives below.

Describe alternatives you've considered

  1. A possible fix is to edit the uds.override import in the CoreDNS Corefile, which currently uses host.k3d.internal as the sole rewrite target for all *.uds.dev requests. Below is the old rewrite pattern followed by the new one (an example that could be automated to fit the end user's use case).

    # Old: every *.uds.dev name is rewritten to host.k3d.internal
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom
      namespace: kube-system
    data:
      uds.override: |
        rewrite stop {
          name regex (.*\.uds\.dev) host.k3d.internal answer auto
        }
    ---
    # New: names are rewritten to the in-cluster Istio gateways. The admin rule
    # is listed first because admin hostnames also match the general pattern.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom
      namespace: kube-system
    data:
      uds.override: |
        rewrite {
          name regex (.*admin\.uds\.dev) admin-ingressgateway.istio-admin-gateway.svc.cluster.local answer auto
        }
        rewrite {
          name regex (.*\.uds\.dev) tenant-ingressgateway.istio-tenant-gateway.svc.cluster.local answer auto
        }
  2. Add to the documentation that the cluster should be stopped using k3d cluster stop uds before a machine is restarted or the computer goes to sleep/hibernates.
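
Every admin hostname also matches the general *.uds.dev pattern from option 1, so the relative order of the two new rules matters whenever the first matching rule wins. The sketch below is a simplified Python model of that behavior, not CoreDNS itself. It assumes first-match-wins evaluation and ignores details such as the trailing dot on DNS names; `keycloak.admin.uds.dev` is a hypothetical admin hostname used only for illustration.

```python
import re

TENANT = "tenant-ingressgateway.istio-tenant-gateway.svc.cluster.local"
ADMIN = "admin-ingressgateway.istio-admin-gateway.svc.cluster.local"

# (pattern, target) pairs using the regexes from option 1, in both orders.
GENERAL_FIRST = [(r"(.*\.uds\.dev)", TENANT), (r"(.*admin\.uds\.dev)", ADMIN)]
ADMIN_FIRST = list(reversed(GENERAL_FIRST))


def rewrite(name, rules):
    """Return the target of the first rule that matches, else the name itself.

    Assumption of this sketch: evaluation stops at the first matching rule.
    """
    for pattern, target in rules:
        if re.search(pattern, name):
            return target
    return name


if __name__ == "__main__":
    # "keycloak.admin.uds.dev" is a hypothetical admin hostname.
    for host in ("supabase-kong.uds.dev", "keycloak.admin.uds.dev"):
        print(host, "general first ->", rewrite(host, GENERAL_FIRST))
        print(host, "admin first   ->", rewrite(host, ADMIN_FIRST))
```

In this model, listing the general rule first sends admin hostnames to the tenant gateway, while listing the admin rule first routes both kinds of hostname as intended.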

Additional context

N/A

justinthelaw commented 2 months ago

PR #100 goes with alternative 2 to provide a minimally invasive, K3d-native fix. However, alternative 1, the automated rewrite, could be a more technical and adaptable solution for both this issue and issue #98.

justinthelaw commented 1 month ago

After further consideration, implementing Option 1 simplifies the developer experience and allows edge-case experimentation (e.g., loopback DNS resolution or alternative service meshes). It would also set things up for alternative domains beyond *.uds.dev if the team wants them in the future, although there would be many more moving parts, as seen in this example.
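
To illustrate what supporting alternative domains could involve, here is a minimal Python sketch that renders the Option 1 rewrite rules for an arbitrary domain. The function name `build_override` and its parameters are hypothetical, not part of uds-k3d, and the output only mirrors the rule shape shown in the issue; it has not been validated against CoreDNS.

```python
def build_override(domain, tenant_target, admin_target):
    """Render Option 1 style rewrite rules for `domain` (hypothetical helper).

    The admin rule is emitted first because admin hostnames also match the
    general pattern for the domain.
    """
    # Escape dots so they match literally in the regex.
    escaped = domain.replace(".", r"\.")
    rules = [
        (rf"(.*admin\.{escaped})", admin_target),
        (rf"(.*\.{escaped})", tenant_target),
    ]
    blocks = [
        f"rewrite {{\n  name regex {pattern} {target} answer auto\n}}"
        for pattern, target in rules
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(
        build_override(
            "uds.dev",
            "tenant-ingressgateway.istio-tenant-gateway.svc.cluster.local",
            "admin-ingressgateway.istio-admin-gateway.svc.cluster.local",
        )
    )
```

The rendered text would still need to be placed under the uds.override key of the coredns-custom ConfigMap.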

I will create a PR referencing this re-opened issue once ready.