GoogleCloudPlatform / metacontroller

Lightweight Kubernetes controllers as a service
https://metacontroller.app/
Apache License 2.0
792 stars 111 forks source link

Meta controller doesn't include current status in webhook payload #159

Open luisdavim opened 5 years ago

luisdavim commented 5 years ago

Hi,

I was hoping to be able to keep some state within the status part of the resource and retrieving it within the hook calback with something like:

cnameRef := gjson.Get(string(body), "parent.status.cnameRef").String()

but there's no status under parent is this by design?

luisdavim commented 5 years ago

After some more investigation it's even stranger, the first two requests don't have a state under parent but the following ones do, not having state on the first one is expected but on the next on I was relying on this flag to be set in the state and it's not there so I try to generate a new ref, because the object being referenced already exists it fails and I end up with an empty ref so my status has the correct ref for a few seconds and then it's wiped out because I got a request from metacontroller without state after the state was already initialised.

moredure commented 5 years ago

Probably, because of your metacontroller built without concurrency in mind, two or more request can be received at the same time to webhook handler about updates, so, maybe in one case you return one status and in another alternative status, then you need to make sure, that status is set based on what exists and not expected. Think of it as finite state machine. Correct me if I am wrong. Earlier that solves me a lot of problems. Also recommend you to not use any kind of random generated values if possible, otherwise make sure you know what you are doing.

luisdavim commented 5 years ago

But why would I get two sync requests for the same resource at the same time? The problem here is that I only get tha ref when I create a resource in an external system that's why I need to keep it.

moredure commented 5 years ago

Metacontroller as I know, worked in multithreaded way (5 worker goroutines, it's the last number I remembered), so this situation is possible if two+ changes made quickly, handler can processed it almost simultaneously thats why, possibly, race condition took place, status updated some kind of incorrect, but that is on your shoulders, to solve such kind of race conditions. I am not a maintainer, but that is what I came up working with metacontroller.

luisdavim commented 5 years ago

Yeah but I still think this is a bug, in this case I can ensure there where no concurrent changes, it's a Dev environment and the only action was creating one single instance of the CRD, also I'm seeing duplicate requests from metacontroller from the periodic rsync. Without any action on the custom resource. Every 60s, the configured refresh interval I get 2 requests from metacontroller and there's only one resource

luisdavim commented 5 years ago

BTW I did manage to overcome the issue on the hook side but this could still become a scalability issue if metacontroller is doing more requests than it should.

enisoc commented 5 years ago

so this situation is possible if two+ changes made quickly, handler can processed it almost simultaneously

Although there are multiple worker threads, the queue is deduplicated based only on the name of the parent object the needs to be synced (not based on what changed), so Metacontroller will not send concurrent hook requests about the same parent.

For the original issue: In general, when using k8s watches/informers, as Metacontroller does, you cannot be sure that on the next sync you will observe the changes you made on the last one. You might sync again from the local cache (because some child changed, for example) before your local cache receives the update (it is not write-through). Core controllers are designed to always look at the state they see and monotonically make progress towards the desired state, which means the lag doesn't matter.

It can be difficult to be disciplined about that, though, and sometimes it requires creativity to design around that constraint. It's probably worth thinking about whether Metacontroller can help alleviate that.

For this particular case, the pattern core controllers use would look like:

  1. Generate the reference in a way that's deterministic from the contents of the sync hook (parent/child specs).
  2. Try to create the reference in a way that's idempotent (it's a no-op if it already exists).
  3. If the no-op creation returns an error that is specifically "already exists", consider that as success: the thing we want to exist, exists. Report success in status.
kelly-sm commented 4 years ago

I think I may be hitting this same issue:

201

I would add an additional perspective is that this doesn't work well for managing external resources for a couple of reasons: