Open phs opened 5 years ago
I think this works as designed and/or as originally intended: the decorator and composite controllers expect all the children to be in the same namespace as the parent if the parent is namespaced.
You can see that happening explicitly around these lines:
This design makes sense to me, but I also have a use case for changing it: the GKE config connector expects you to put resources in a namespace that corresponds to a GCP project, and quite rightly so given that projects are how GCP resources get namespaced.
It looks like the owner reference points at the actual custom resource, which implies to me that the distinction between listing all namespaces vs just one isn't strictly necessary for correctness because the controller can't get confused about who the parent is and who its children are. I'm going to try just always listing all namespaces.
Found this comment, which implies to me that the underlying machinery of informers and so on aren't guaranteed to work as intended and trigger finalizers and so on: https://github.com/GoogleCloudPlatform/metacontroller/blob/c58e22e03c7503c63892324fa2c3f6bf675ae7cd/controller/decorator/controller.go#L303-L310
So, while always listing all namespaces solves part of this issue (just deployed, no errors, children in request as desired), it leaves things in an awkward state with respect to meeting every expectation on lifecycle. I don't personally have any need for finalizers for what I'm doing, but what I'm doing is definitely a hack.
It seems like the way to make this work without the hack would be to just make your resource cluster-wide and make the namespace part of its spec, or put it in a label or annotation, or pack it into the name.
To be clear, the CRD in question isn't namespaced; from the repro script:
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: reproresources.repro.example.com
spec:
group: repro.example.com
names:
kind: ReproResource
plural: reproresources
singular: reproresource
scope: Cluster
version: v1alpha1
Or you do mean something else when you refer to the parent?
Apologies, I did not actually read your script and log, I jumped to a conclusion about your issue after troubleshooting my issue.
Now that I've actually read the script and used it, I can give you potentially useful information. I was not able to reproduce the issue with your script using v0.4.0, with or without the change I mentioned above. Surprisingly, I wasn't even able to reproduce the error with v0.3.1.
Here's your script with slight modification: repro.sh.gz
Here's my run with v0.3.1: repro_v.0.3.1_stdout.gz repro_v0.3.1_metacontroller.log.gz
I think I've found another cluster-scoped vs. namespace bug.
I have a decorator controller watching a cluster-scoped CRD. It decorates instances by creating a namespace, and then placing a resource quota in that new namespace. The web hook only attaches the resource quota if it sees the enclosing namespace in its observed attachments: it tries to roll things out in two steps.
Based on the observed attachments arriving at the web hook, this appears to work. First there are no attachments, then the namespace only, and for exactly one iteration the resource quota is also observed (and so it must have been created.)
On subsequent iterations the resource quota is not observed, and when metacontroller tries to recreate it, it fails with
The resource quota does indeed exist, it seems only metacontroller has forgotten about it:
Obviously it's possible that I am configuring my decorator controller or implementing my web hook incorrectly, so I went ahead and made a repro script, which I've attached along with a log of me running it. For some reason github refuses to let me attach a script, so it is gzipped.
repro.log repro.sh.gz