eno-reconciler processes that use selectors to restrict which compositions they reconcile will happily delete every resource slice belonging to a composition the selector doesn't match. Not a great failure mode.
The fix for --composition-namespace is easy: only watch resource slices in the composition's namespace. But --composition-label-selector is tricky, since it matches on composition labels, and resource slices don't necessarily carry the same labels.
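To make the label-selector problem concrete, here is a minimal sketch in plain Go. The `Composition` and `ResourceSlice` structs, `filterByLabel`, `orphans`, and the `shard` label are all hypothetical stand-ins for illustration, not Eno's real types or API. It assumes the cleanup logic treats a slice as orphaned when its owning composition is missing from the cache.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for Eno's types (not the real API).
type Composition struct {
	Name   string
	Labels map[string]string
}

type ResourceSlice struct {
	Name  string
	Owner string // name of the owning composition
}

// filterByLabel mimics an informer cache restricted by a label selector:
// only compositions carrying the given label value are visible.
func filterByLabel(all []Composition, key, value string) map[string]Composition {
	visible := map[string]Composition{}
	for _, c := range all {
		if c.Labels[key] == value {
			visible[c.Name] = c
		}
	}
	return visible
}

// orphans returns the names of slices whose owner is missing from the
// visible set of compositions.
func orphans(slices []ResourceSlice, visible map[string]Composition) []string {
	var out []string
	for _, s := range slices {
		if _, ok := visible[s.Owner]; !ok {
			out = append(out, s.Name)
		}
	}
	return out
}

func main() {
	comps := []Composition{
		{Name: "a", Labels: map[string]string{"shard": "1"}},
		{Name: "b", Labels: map[string]string{"shard": "2"}},
	}
	slices := []ResourceSlice{
		{Name: "a-slice", Owner: "a"},
		{Name: "b-slice", Owner: "b"},
	}

	// A process restricted to shard=1 cannot see composition "b", so
	// "b-slice" looks orphaned even though its owner still exists.
	fmt.Println(orphans(slices, filterByLabel(comps, "shard", "1")))
}
```

The slices themselves are not filtered by the selector here, which is the crux: the process sees every slice but only some of the owners.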
The simplest fix is to move the slice cleanup controller from the reconciler to the eno-controller process, so it always has a coherent view of the entire cluster. Historically, the controller/reconciler divide was designed so that the controller didn't need to watch/cache resource slices, so the downside of this change is the added overhead of the new informer. Practically speaking, that isn't an issue (we're talking about tens of MBs of memory, worst case).
This change also adds an extra check to the resource slice cleanup controller to avoid any potential issues caused by stale informers: it re-runs the cleanup logic against a non-cached version of the composition. Since resource slices are so important to Eno, the added safety justifies the overhead of a few GET requests to the API server.
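The double check can be sketched as follows. This is a simplified model under stated assumptions, not the controller's actual code: `Composition.SliceNames`, the `lookup` function type, `canDelete`, and `shouldDelete` are hypothetical names, and the real cleanup logic may consider more than whether the composition still references the slice.

```go
package main

import "fmt"

// Hypothetical, simplified composition: it lists the slices it still uses.
type Composition struct {
	Name       string
	SliceNames []string
}

// lookup returns nil when the composition does not exist. One instance
// models the informer cache, another a direct read from the API server.
type lookup func(name string) *Composition

// canDelete is the (assumed) cleanup rule: a slice may be deleted when its
// composition is gone or no longer references it.
func canDelete(comp *Composition, slice string) bool {
	if comp == nil {
		return true
	}
	for _, n := range comp.SliceNames {
		if n == slice {
			return false
		}
	}
	return true
}

// shouldDelete runs the rule against the cache first and, only if the cache
// says the slice is deletable, re-runs it against a live read.
func shouldDelete(owner, slice string, cached, live lookup) bool {
	if !canDelete(cached(owner), slice) {
		return false // cheap path: no API call needed
	}
	return canDelete(live(owner), slice)
}

func main() {
	missing := func(string) *Composition { return nil }
	present := func(name string) *Composition {
		return &Composition{Name: name, SliceNames: []string{"slice-1"}}
	}

	// Stale cache: composition missing locally but present on the server.
	fmt.Println(shouldDelete("comp", "slice-1", missing, present)) // false
	// Cache and server agree that the composition is gone.
	fmt.Println(shouldDelete("comp", "slice-1", missing, missing)) // true
}
```

Checking the cache first keeps the live read off the common path: the extra request is only paid when a deletion is actually about to happen.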