GoogleCloudPlatform / metacontroller

Lightweight Kubernetes controllers as a service
https://metacontroller.app/
Apache License 2.0
792 stars 105 forks source link

Tolerate error responses from the Hooks #126

Closed skurfuerst closed 5 years ago

skurfuerst commented 5 years ago

Hey,

first off, thanks for this great project!

I just ran into an error which crashes Metacontroller, when responsing from a hook with arbitrary JSON:

E1228 07:42:57.804066       1 runtime.go:66] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
/go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/hashmap_fast.go:696
/go/src/metacontroller.app/third_party/kubernetes/unstructured.go:130
/go/src/metacontroller.app/controller/composite/controller.go:591
/go/src/metacontroller.app/dynamic/clientset/clientset.go:194
/go/src/metacontroller.app/vendor/k8s.io/client-go/util/retry/util.go:64
/go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203
/go/src/metacontroller.app/vendor/k8s.io/client-go/util/retry/util.go:63
/go/src/metacontroller.app/dynamic/clientset/clientset.go:185
/go/src/metacontroller.app/controller/composite/controller.go:583
/go/src/metacontroller.app/controller/composite/controller.go:483
/go/src/metacontroller.app/controller/composite/controller.go:399
/go/src/metacontroller.app/controller/composite/controller.go:221
/go/src/metacontroller.app/controller/composite/controller.go:210
/go/src/metacontroller.app/controller/composite/controller.go:187
/go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/metacontroller.app/controller/composite/controller.go:187
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: assignment to entry in nil map [recovered]
    panic: assignment to entry in nil map

goroutine 223 [running]:
metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0xfd5300, 0x11f7c80)
    /usr/local/go/src/runtime/panic.go:502 +0x229
metacontroller.app/third_party/kubernetes.SetNestedField(0xc42051a420, 0xf81b40, 0xc4204a95c0, 0xc4205a7300, 0x2, 0x2)
    /go/src/metacontroller.app/third_party/kubernetes/unstructured.go:130 +0x1d9
metacontroller.app/controller/composite.(*parentController).updateParentStatus.func1(0xc420142950, 0xc420042ab0)
    /go/src/metacontroller.app/controller/composite/controller.go:591 +0x1fd
metacontroller.app/dynamic/clientset.(*ResourceClient).AtomicStatusUpdate.func1(0xc420146000, 0x411a89)
    /go/src/metacontroller.app/dynamic/clientset/clientset.go:194 +0x1e9
metacontroller.app/vendor/k8s.io/client-go/util/retry.RetryOnConflict.func1(0xc4205a76f8, 0x4122d8, 0x20)
    /go/src/metacontroller.app/vendor/k8s.io/client-go/util/retry/util.go:64 +0x36
metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0xc420517d00, 0x4122d8, 0x40)
    /go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:203 +0x9c
metacontroller.app/vendor/k8s.io/client-go/util/retry.RetryOnConflict(0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0xc420181400, 0x4122d8, 0x20)
    /go/src/metacontroller.app/vendor/k8s.io/client-go/util/retry/util.go:63 +0xba
metacontroller.app/dynamic/clientset.(*ResourceClient).AtomicStatusUpdate(0xc420508e70, 0xc420142128, 0xc420517ce0, 0xc420508e70, 0x3000106, 0x0)
    /go/src/metacontroller.app/dynamic/clientset/clientset.go:185 +0x11f
metacontroller.app/controller/composite.(*parentController).updateParentStatus(0xc4203b3050, 0xc420142128, 0x0, 0xc420142128, 0xc420510270, 0xc420508de0)
    /go/src/metacontroller.app/controller/composite/controller.go:583 +0xa6
metacontroller.app/controller/composite.(*parentController).syncParentObject(0xc4203b3050, 0xc420142128, 0x0, 0xc420042c9f)
    /go/src/metacontroller.app/controller/composite/controller.go:483 +0xa4f
metacontroller.app/controller/composite.(*parentController).sync(0xc4203b3050, 0xc420042c90, 0x22, 0xf84d00, 0xc420472e00)
    /go/src/metacontroller.app/controller/composite/controller.go:399 +0x2a3
metacontroller.app/controller/composite.(*parentController).processNextWorkItem(0xc4203b3050, 0xc4200a6500)
    /go/src/metacontroller.app/controller/composite/controller.go:221 +0xec
metacontroller.app/controller/composite.(*parentController).worker(0xc4203b3050)
    /go/src/metacontroller.app/controller/composite/controller.go:210 +0x2b
metacontroller.app/controller/composite.(*parentController).(metacontroller.app/controller/composite.worker)-fm()
    /go/src/metacontroller.app/controller/composite/controller.go:187 +0x2a
metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc42056a7b0)
    /go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc4205a7fb0, 0x3b9aca00, 0x0, 0x1, 0xc420598000)
    /go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc42056a7b0, 0x3b9aca00, 0xc420598000)
    /go/src/metacontroller.app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
metacontroller.app/controller/composite.(*parentController).Start.func1.1(0xc4204a8030, 0xc4203b3050)
    /go/src/metacontroller.app/controller/composite/controller.go:187 +0x7d
created by metacontroller.app/controller/composite.(*parentController).Start.func1
    /go/src/metacontroller.app/controller/composite/controller.go:185 +0x60c

By looking at the code and the requests/responses, it was easy to see that the hook did not return a "status" JSON structure -- which led to the error above.

Suggestion: Do not break with a fatal error; but instead log an ERROR message and set the Status to "Error while running webhook: Response malformed" (or so).

Related to #91 I guess.

I could also work on this if this is wanted :)

All the best, Sebastian

enisoc commented 5 years ago

In this case, I think omitting "status" from the JSON response should just behave the same as returning an empty status, instead of treating it as an error or invalid response. I've sent PR #132 which should fix that specific issue.

More generally, we intend to better surface hook errors through events (see #7). Currently you can only see them in Metacontroller logs. I prefer putting that in an event because it doesn't reflect the status of the actual object in question, but rather a failure to determine what that status even is.