GoogleCloudPlatform / metacontroller

Lightweight Kubernetes controllers as a service
https://metacontroller.app/
Apache License 2.0

CompositeController goes into hot loop #171

Open jandenouden2 opened 5 years ago

jandenouden2 commented 5 years ago

Hi, I am new to metacontroller and GitHub in general, so please let me know if I forgot something. Thanks for writing and contributing to metacontroller; it appears to be the ideal solution for my first operator.

I am running the HelloWorld example (https://metacontroller.app/guide/create/) on a microk8s cluster. When I create a HelloWorld resource, the controller picks it up and the child pod is created, but metacontroller then keeps calling the sync webhook even though nothing has changed, quickly running up the resource's generation. From metacontroller.log:

I0527 14:01:39.489674       1 controller.go:406] sync HelloWorld hello/your-name
I0527 14:01:39.506012       1 controller.go:406] sync HelloWorld hello/your-name
I0527 14:01:39.513006       1 controller.go:406] sync HelloWorld hello/your-name
I0527 14:01:39.623230       1 request.go:485] Throttling request took 106.161551ms, request: PUT:https://10.152.183.1:443/apis/example.com/v1/namespaces/hello/helloworlds/your-name
I0527 14:01:39.630174       1 controller.go:406] sync HelloWorld hello/your-name
I0527 14:01:39.823080       1 request.go:485] Throttling request took 183.409633ms, request: GET:https://10.152.183.1:443/apis/example.com/v1/namespaces/hello/helloworlds/your-name
I0527 14:01:40.023058       1 request.go:485] Throttling request took 196.123082ms, request: PUT:https://10.152.183.1:443/apis/example.com/v1/namespaces/hello/helloworlds/your-name
I0527 14:01:40.026549       1 controller.go:406] sync HelloWorld hello/your-name
I0527 14:01:40.223061       1 request.go:485] Throttling request took 192.155825ms, request: GET:https://10.152.183.1:443/apis/example.com/v1/namespaces/hello/helloworlds/your-name
I0527 14:01:40.423835       1 request.go:485] Throttling request took 198.906384ms, request: PUT:https://10.152.183.1:443/apis/example.com/v1/namespaces/hello/helloworlds/your-name

jandenouden2 commented 5 years ago

I've enabled auditing with

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
    - group: example.com
      resources: ["helloworlds"]

which produces audit.log

jandenouden2 commented 5 years ago

I notice that metacontroller sets status.observedGeneration in addition to what the webhook returns. Naively, it looks like an update triggers a sync, which updates status.observedGeneration, which triggers another sync, and so on?
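To make the suspected loop concrete, here is a toy Python model. It is not Metacontroller's actual code, only a sketch of the hypothesis. It assumes that, without a status subresource, any change to the object (including a status write) bumps metadata.generation, and that the controller copies metadata.generation into status.observedGeneration on every sync. The names ToyObject, write_status and count_syncs are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    """Hypothetical stand-in for a custom resource: one generation counter plus status."""
    generation: int = 1
    status: dict = field(default_factory=dict)


def write_status(obj, new_status, status_subresource_enabled):
    """Toy API server write. Returns True if the object changed (a watch event would fire)."""
    if new_status == obj.status:
        return False  # no-op update: nothing changes, so no new event
    obj.status = dict(new_status)
    if not status_subresource_enabled:
        # Assumption: without the subresource, status counts as ordinary
        # content, so writing it bumps the generation.
        obj.generation += 1
    return True


def count_syncs(status_subresource_enabled, max_syncs=20):
    """Run the toy control loop until it goes quiet or hits max_syncs."""
    obj = ToyObject()
    syncs = 0
    pending = True  # the initial "object created" event
    while pending and syncs < max_syncs:
        syncs += 1
        # Assumed controller behaviour: mirror generation into status.
        desired = {"observedGeneration": obj.generation}
        pending = write_status(obj, desired, status_subresource_enabled)
    return syncs


print(count_syncs(status_subresource_enabled=True))   # 2
print(count_syncs(status_subresource_enabled=False))  # 20 (stopped only by max_syncs)
```

In the first case the second sync finds nothing to change and the loop settles. In the second, each status write moves the generation, so observedGeneration always lags by one and the loop never settles.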

jordan-da commented 5 years ago

I am having the same issue. All my children seem fine and don't regenerate, but the parent loops forever, with its generation climbing continuously. I suspect the same thing you do: a feedback loop in which some field, such as observedGeneration, bumps the generation when it shouldn't. Still poking around.

jordan-da commented 5 years ago

5,9c5
<     "annotations": {
<       "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"foo.com/v1\",\"kind\":\"MyCrd\",\"metadata\":{\"annotations\":{},\"name\":\"thing\",\"namespace\":\"project-bar\"},\"spec\":{\"param\":\"stuff\"}}\n"
<     },
<     "creationTimestamp": "2019-06-11T13:18:04Z",
<     "generation": 2617,
---
>     "annotations": {},
11,14c7
<     "namespace": "project-bar",
<     "resourceVersion": "3084449",
<     "selfLink": "/apis/foo.com/v1/namespaces/project-bar/mycrd/thing",
<     "uid": "5835c903-8c4b-11e9-ac2e-42010a8e00f7"
---
>     "namespace": "project-bar"
18,21d10
<   },
<   "status": {
<     "foo": "bar",
<     "observedGeneration": 2616

Running a sorted JSON diff of last-applied-configuration against the current config yields nothing that should be running up the generations.

Turning the log verbosity (-v) on metacontroller up to 5 also doesn't show any diff.
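For anyone who wants to repeat that comparison, here is a minimal Python sketch. It assumes the live object has already been loaded as a dict (for example from `kubectl get ... -o json`) and that it carries the kubectl.kubernetes.io/last-applied-configuration annotation. The function name and the list of server-managed fields to ignore are my own choices, taken from the fields that show up in the diff above.

```python
import copy
import difflib
import json

LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"
# Fields the API server fills in; they never appear in the applied manifest.
SERVER_FIELDS = ("creationTimestamp", "generation", "resourceVersion", "selfLink", "uid")


def diff_against_last_applied(live):
    """Return unified-diff lines between the applied manifest and the trimmed live object."""
    applied = json.loads(live["metadata"]["annotations"][LAST_APPLIED])
    trimmed = copy.deepcopy(live)
    trimmed["metadata"]["annotations"].pop(LAST_APPLIED)
    for key in SERVER_FIELDS:
        trimmed["metadata"].pop(key, None)
    trimmed.pop("status", None)  # status is written by controllers, not by the user
    a = json.dumps(applied, sort_keys=True, indent=2).splitlines()
    b = json.dumps(trimmed, sort_keys=True, indent=2).splitlines()
    return list(difflib.unified_diff(a, b, "last-applied", "live", lineterm=""))


# Sample object rebuilt from the values visible in the diff above.
applied_manifest = {
    "apiVersion": "foo.com/v1",
    "kind": "MyCrd",
    "metadata": {"annotations": {}, "name": "thing", "namespace": "project-bar"},
    "spec": {"param": "stuff"},
}
live = copy.deepcopy(applied_manifest)
live["metadata"]["annotations"][LAST_APPLIED] = json.dumps(applied_manifest)
live["metadata"].update({"generation": 2617, "resourceVersion": "3084449"})
live["status"] = {"foo": "bar", "observedGeneration": 2616}

print(diff_against_last_applied(live))  # []
```

An empty result means nothing the user applied has drifted, so whatever keeps bumping the generation has to be in the fields the sketch strips out, such as status.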

jordan-da commented 5 years ago

This seems like a regression: downgrading from metacontroller/metacontroller:v0.4.0 to metacontroller/metacontroller:v0.3.1 seems to make it stop.

enisoc commented 5 years ago

The last time this came up, I think the problem was that newer versions of Kubernetes started setting metadata.generation on CRD objects, even if the CRD's status subresource is disabled. That broke a brittle assumption in Metacontroller that metadata.generation being set meant that it ought to pass that through to status.observedGeneration.

The workaround we found is to make sure the status subresource is enabled for your CRD. This is also recommended in general for any CRD that you use with Metacontroller.

jordan-da commented 5 years ago

Just to confirm, adding

  subresources:
    status: {}

to my CRD fixed the issue.

More information here:

https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#status-subresource
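If you want to check existing CRD manifests for this before deploying, a small Python sketch along these lines might help. It assumes the manifest has already been parsed into a dict and looks in both places a status subresource can be declared: the top-level spec.subresources (apiextensions.k8s.io/v1beta1) and the per-version spec.versions[].subresources. The function name is hypothetical.

```python
def versions_missing_status(crd):
    """Return the names of CRD versions that do not enable the status subresource."""
    spec = crd.get("spec", {})
    # v1beta1 allows a single top-level declaration that covers every version.
    top_level = "status" in (spec.get("subresources") or {})
    # Fall back to the legacy single-version field if there is no versions list.
    versions = spec.get("versions") or [{"name": spec.get("version", "")}]
    missing = []
    for version in versions:
        per_version = "status" in (version.get("subresources") or {})
        if not (top_level or per_version):
            missing.append(version.get("name", ""))
    return missing


# A CRD shaped like the fix above: top-level subresources with status enabled.
fixed_crd = {
    "spec": {
        "version": "v1",
        "subresources": {"status": {}},
    }
}
# The same CRD without the subresources block.
broken_crd = {"spec": {"version": "v1"}}

print(versions_missing_status(fixed_crd))   # []
print(versions_missing_status(broken_crd))  # ['v1']
```

Note that `status: {}` is an empty mapping, so the check tests for the presence of the key rather than for a truthy value.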

alexellis commented 5 years ago

Adding the above from @jordan-da fixed this for me too. I was very confused by the hot loop when using the Go example.