GoogleCloudPlatform / metacontroller

Lightweight Kubernetes controllers as a service
https://metacontroller.app/
Apache License 2.0

Metacontroller omitting updated status in subsequent sync #201

Open kelly-sm opened 4 years ago

kelly-sm commented 4 years ago

Had to crank up log verbosity to get to the bottom of this, but what I'm seeing is the following sequence of events:

  1. Custom resource created

  2. Metacontroller adds a finalizer to the resource

  3. Metacontroller calls the sync hook for the first time, and a status field is returned.

  4. Metacontroller PUTs the status to K8s, returns 200 OK

    1 webhook.go:64] DEBUG: webhook url: http://stable-provisioning-controller.stable-provisioning-controller/kafka-topic/sync response body: {"status":{"state":"InProgress","message":"This topic was successfully submitted to the Kafka cluster, will check back to see if it was created.","failureType":null,"specId":"test","isCreated":false,"currentSpec":{"eventSource":null,"eventType":null,"privacy":"Non PI","schemas":[],"name":"example-topic-minimal-v4","description":"test4","partitions":1,"replicationFactor":2,"topicConfig":{"cleanup.policy":"delete","max.message.bytes":"1048588","retention.ms":"86400000","retention.bytes":"5000000000"},"producerConfig":{"compression.type":"gzip","retries":"2147483647","enable.idempotence":"true","acks":"all","max.in.flight.requests.per.connection":"2"},"specId":"test","terminationProtection":true,"importExternal":false},"topicDescription":{"activeOnCluster":false,"partitions":[],"activeConfig":{}}},"children":[],"finalized":false,"resyncAfterSeconds":5.0}
    I0526 19:59:59.777998       1 round_trippers.go:405] GET https://172.20.0.1:443/apis/streamz.zillowgroup.com/v1/namespaces/zg-dev-streamz-flink-bootstrap-example/kafkatopics/example-topic-minimal-v4 200 OK in 5 milliseconds
    I0526 19:59:59.785136       1 round_trippers.go:405] PUT https://172.20.0.1:443/apis/streamz.zillowgroup.com/v1/namespaces/zg-dev-streamz-flink-bootstrap-example/kafkatopics/example-topic-minimal-v4/status 200 OK in 6 milliseconds
  5. Metacontroller immediately (20 ms later) calls the sync hook again, but the updated status is missing from the parent in the request.

This causes a problem because the second sync call has no way of knowing that an external creation request was already submitted, so it submits the request again and fails as a duplicate.
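To illustrate the failure mode, here is a minimal sketch of a sync handler that would tolerate a stale parent: it asks the external system whether a request was already submitted instead of relying on `parent.status`. `FakeTopicAdmin`, its methods, and the `"Created"` state are hypothetical stand-ins for the real Kafka-side client, and using `metadata.name` as the topic name is an assumption. Only the response fields (`status`, `children`, `resyncAfterSeconds`) mirror the log above.

```python
from typing import Any, Dict, Set


class FakeTopicAdmin:
    """Hypothetical in-memory stand-in for the external topic admin client."""

    def __init__(self) -> None:
        self.submitted: Set[str] = set()  # creation requests already sent
        self.created: Set[str] = set()    # topics that are live on the cluster
        self.create_calls = 0

    def is_submitted(self, name: str) -> bool:
        return name in self.submitted

    def is_created(self, name: str) -> bool:
        return name in self.created

    def submit(self, name: str) -> None:
        """Send a creation request; a second request for the same name fails."""
        self.create_calls += 1
        if name in self.submitted:
            raise ValueError(f"duplicate creation request for {name!r}")
        self.submitted.add(name)


def sync(request: Dict[str, Any], admin: FakeTopicAdmin) -> Dict[str, Any]:
    """Handle one sync call without trusting parent.status to be current."""
    # Assumption: the topic name is taken from the parent's metadata.name.
    name = request["parent"]["metadata"]["name"]

    # Ask the external system, not the (possibly stale) parent status.
    if not admin.is_submitted(name):
        admin.submit(name)

    created = admin.is_created(name)
    return {
        "status": {
            "state": "Created" if created else "InProgress",
            "isCreated": created,
        },
        "children": [],
        "resyncAfterSeconds": 5.0,
    }
```

With a handler shaped like this, two back-to-back syncs carrying the same parent would submit only one creation request. Whether the real external system can answer the "already submitted?" question is something I have not verified.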

Note that the resourceVersion field isn't changing between the two syncs, nor is the generation. It looks as if the exact same sync payload is being submitted again, without the update that the first sync made to the resource.
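The comparison I did by hand can be sketched as a small helper that could be dropped into a hook for debugging. The function names are hypothetical; the only fields read are `metadata.resourceVersion`, `metadata.generation`, and `status` on the parent object in the sync request.

```python
from typing import Any, Dict


def parent_fingerprint(request: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields that should differ after a status update lands."""
    parent = request.get("parent", {})
    meta = parent.get("metadata", {})
    return {
        "resourceVersion": meta.get("resourceVersion"),
        "generation": meta.get("generation"),
        "has_status": bool(parent.get("status")),
    }


def is_stale_repeat(first: Dict[str, Any], second: Dict[str, Any]) -> bool:
    """True when the second sync carries the same parent snapshot as the first."""
    return parent_fingerprint(first) == parent_fingerprint(second)
```

Logging `parent_fingerprint(request)` on every sync call makes a repeated snapshot easy to spot, assuming a status write that changes the object also bumps its resourceVersion.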

kelly-sm commented 4 years ago

I would add a follow-up question to this: is it ever OK for a sync hook to inspect the status of the parent resource to determine what to do within the sync?