Kuadrant / dns-operator

Kuadrant DNS Operator
Apache License 2.0

Improve error handling of getting the hosted zone nameservers #60

Closed trepel closed 6 months ago

trepel commented 7 months ago

If a private hosted zone is used (private zones are not supported at the moment, but that does not matter here), the operator panics when trying to access the zone's nameservers, because private hosted zones do not have any. Improve the error handling so that the operator does not crash and a proper message is shown in the logs.

This issue was observed with the v0.1.0 tag. The affected code: https://github.com/Kuadrant/dns-operator/blob/515b614d70e569a408c2a3c8e262f18337a7abac/internal/provider/aws/aws.go#L167

Note that a private hosted zone can look like this:

aws route53 get-hosted-zone --id=Z01234567T8N9RVU0NABC
{
    "HostedZone": {
        "Id": "/hostedzone/Z01234567T8N9RVU0NABC",
        "Name": "abc.def.com.",
        "CallerReference": "terraform-20240319062238790500000003",
        "Config": {
            "Comment": "Dev Managed Zone",
            "PrivateZone": true
        },
        "ResourceRecordSetCount": 5
    },
    "VPCs": [
        {
            "VPCRegion": "us-east-1",
            "VPCId": "vpc-0f11d467dd89ac123"
        }
    ]
}

In particular, there is neither a DelegationSet nor a NameServers list there, as there would be for a public zone, e.g.:

...
    "DelegationSet": {
        "NameServers": [
            "ns-645.awsdns-16.net",
            "ns-1157.awsdns-16.org",
            "ns-1743.awsdns-25.co.uk",
            "ns-496.awsdns-62.com"
        ]
    }
}

The error in the policy-controller pod logs looks like this:

2024-03-19T09:35:47Z    ERROR   Reconciler error    {"controller": "dnsrecord", "controllerGroup": "kuadrant.io", "controllerKind": "DNSRecord", "DNSRecord": {"name":"api-gateway-api","namespace":"kuadrant-system"}, "namespace": "kuadrant-system", "name": "api-gateway-api", "reconcileID": "2902bc51-4e90-4c78-a066-6183305db74c", "error": "the managed zone is not in a ready state : trepel01-dev-mz"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227
2024-03-19T09:35:47Z    DEBUG   Route53 provider created    {"managed zone:": "trepel01-dev-mz"}
2024-03-19T09:35:47Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference    {"controller": "managedzone", "controllerGroup": "kuadrant.io", "controllerKind": "ManagedZone", "ManagedZone": {"name":"trepel01-dev-mz","namespace":"kuadrant-system"}, "namespace": "kuadrant-system", "name": "trepel01-dev-mz", "reconcileID": "9fbc8c78-931b-476b-a153-3c9c5dd4d0a8"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x17fe3d3]

goroutine 169 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1b71f00?, 0x33173a0?})
    /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/Kuadrant/multicluster-gateway-controller/pkg/dns/aws.(*Route53DNSProvider).EnsureManagedZone(0xc0019b83c0, 0xc0001b9ba0)
    /workspace/pkg/dns/aws/dns.go:131 +0x353
github.com/Kuadrant/multicluster-gateway-controller/pkg/controllers/managedzone.(*ManagedZoneReconciler).publishManagedZone(0x2381670?, {0x237f1a8?, 0xc0008a03c0?}, 0xc0001b9ba0)
    /workspace/pkg/controllers/managedzone/managedzone_controller.go:170 +0x5a
github.com/Kuadrant/multicluster-gateway-controller/pkg/controllers/managedzone.(*ManagedZoneReconciler).Reconcile(0xc000809b00, {0x237f1a8, 0xc0008a03c0}, {{{0xc0004dd350?, 0x5?}, {0xc0004dd370?, 0xc000897d48?}}})
    /workspace/pkg/controllers/managedzone/managedzone_controller.go:107 +0x695
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x2382088?, {0x237f1a8?, 0xc0008a03c0?}, {{{0xc0004dd350?, 0xb?}, {0xc0004dd370?, 0x0?}}})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00080cdc0, {0x237f1e0, 0xc0005800a0}, {0x1c27a80?, 0xc000588460?})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00080cdc0, {0x237f1e0, 0xc0005800a0})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 120
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:223 +0x565

This issue was created upon request in this Slack thread: https://kubernetes.slack.com/archives/C05J0D0V525/p1710844895665569

maleck13 commented 6 months ago

Possible fix: https://github.com/Kuadrant/dns-operator/pull/65