GoogleCloudPlatform / metacontroller

Lightweight Kubernetes controllers as a service
https://metacontroller.app/
Apache License 2.0
791 stars 104 forks source link

[Feature Request] - User configurable webhook timeout. #54

Closed xocoatzin closed 6 years ago

xocoatzin commented 6 years ago

I recently discovered this project and it looks really great!

Context

I was considering an application for which metacontroler would fit perfectly. A concern I have is that as part of my controller, I would need to make a couple REST calls to an external API to resolve certain conditions at run time. I learned from the source code that the hook calls have a hard-coded deadline of 10 seconds, which is quite limiting.

Is there a reason for this particular value (10s)? I assume it shouldn't be a big issue to rise this limit or make it user-defined if the hook calls are being called asynchronously under the hood (and assuming the hook handler is replicated).

Additional question: How would metacontroler handle error conditions on the hook (e.g. return status !200 or timeouts)? Will the call be skipped, retried, etc?

Request:

Add an option for the webhook configuration similar to:

spec:
  hooks:
    sync:
      webhook:
        url: http://hello-controller.hello/sync
        timeoutSeconds: 60               # Timeout in seconds for the webhook 
enisoc commented 6 years ago

That's a good idea. Thanks for bringing it up!

The hard-coded 10s timeout was put in place back when there was no concurrency in the pre-alpha Metacontroller -- all hooks were in one queue, processed one at a time, so one slow hook would hold up the whole line.

Now, each controller has its own queue with 5 concurrent slots (structured to match kube-controller-manager). So, if one hook call is slow, it only holds up one of the dedicated slots for that controller. If all 5 slots are waiting on slow hooks, then all other syncs for that controller will have to wait until a slot opens up (either by finishing, or by timing out). However, other controllers will not be affected.

Given that this isolation now exists, I like your idea to give controller authors the ability to decide their own timeouts. I've assigned this to myself for now, but if anyone wants to take a stab at it, it should be a fairly straightforward PR. Just comment here so we don't both do it.

Additional question: How would metacontroler handle error conditions on the hook (e.g. return status !200 or timeouts)? Will the call be skipped, retried, etc?

Metacontroller structures the work queue that it hosts for your controller in the same way as built-in controllers like StatefulSet: If something goes wrong while processing an entry in the queue, it will be requeued after a delay; if it fails repeatedly, the delay will increase each time (back-off) until eventually it gives up on that entry.

xocoatzin commented 6 years ago

Well, since it's a pretty small feature I decided to give it a go (pun intended). See PR #55.

As a disclaimer, I'm not experienced in go at all, so hope there's not too many beginner mistakes.

I wasn't able to test it locally, I'm still trying to figure out how to setup the dev environment properly. If you could shed some light about it I'd be happy to do it. It sounds pretty trivial but unfortunately I ran out of time.