This change allows the resource-group-controller-manager to use cached resource group records when the records are stale due to unavailable APIService backends, like metrics-server, which often has only one replica.
All other discovery errors will now also cause the TypeResolver controller to retry discovery, with logging and retry backoff handled by the controller manager.
If discovery fails when the TypeResolver is first created, the process logs the error and exits, and the container runtime should then restart the container.
This should prevent ResourceGroup status from marking all resources as NotFound when discovery fails.
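
The behavior described above can be sketched roughly as follows. This is a simplified, stdlib-only illustration and not the code in this PR: `ErrStaleGroups`, `errUnavailable`, `DiscoverFunc`, and this `TypeResolver` are hypothetical stand-ins for the real discovery client and controller. It also assumes that a stale-groups error still comes with usable (server-cached) records, and that a stale result at startup is tolerated.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStaleGroups is a hypothetical sentinel meaning "some groups were served
// from the server-side cache because their APIService backend is unavailable".
var ErrStaleGroups = errors.New("stale group discovery")

// errUnavailable is a hypothetical example of any other discovery failure.
var errUnavailable = errors.New("api server unavailable")

// DiscoverFunc is a hypothetical stand-in for a discovery call. It returns a
// map of API group to kinds. On a stale result it returns the cached records
// together with an error wrapping ErrStaleGroups.
type DiscoverFunc func() (map[string][]string, error)

// TypeResolver is a simplified resolver that remembers the last usable result.
type TypeResolver struct {
	discover DiscoverFunc
	cache    map[string][]string
}

// NewTypeResolver runs discovery once. A non-stale failure here is returned
// so that the caller can exit and let the container be restarted.
func NewTypeResolver(d DiscoverFunc) (*TypeResolver, error) {
	r := &TypeResolver{discover: d, cache: map[string][]string{}}
	if err := r.Refresh(); err != nil {
		return nil, fmt.Errorf("initial discovery failed: %w", err)
	}
	return r, nil
}

// Refresh re-runs discovery. Stale results are accepted; any other error is
// returned so the caller can retry with backoff, and the old cache is kept.
func (r *TypeResolver) Refresh() error {
	groups, err := r.discover()
	if err != nil && !errors.Is(err, ErrStaleGroups) {
		return err
	}
	r.cache = groups
	return nil
}

// Kinds reports the kinds known for a group in the last usable result.
func (r *TypeResolver) Kinds(group string) ([]string, bool) {
	kinds, ok := r.cache[group]
	return kinds, ok
}

func main() {
	all := map[string][]string{"apps": {"Deployment"}, "metrics.k8s.io": {"PodMetrics"}}
	calls := 0
	discover := func() (map[string][]string, error) {
		calls++
		switch calls {
		case 1:
			return all, nil
		case 2:
			return all, fmt.Errorf("metrics.k8s.io: %w", ErrStaleGroups)
		default:
			return nil, errUnavailable
		}
	}
	r, err := NewTypeResolver(discover)
	if err != nil {
		fmt.Println("exit:", err)
		return
	}
	fmt.Println("refresh with stale group:", r.Refresh())
	fmt.Println("refresh with other failure:", r.Refresh())
	_, ok := r.Kinds("metrics.k8s.io")
	fmt.Println("metrics.k8s.io still resolvable:", ok)
}
```

The point of the sketch is that a failed refresh never replaces the last usable result, so a lookup keeps resolving known kinds instead of reporting them as missing.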
Fixes: b/379943950
Note: This only fixes the problem in the resource-group-controller-manager, and only when Aggregated Discovery is enabled, which is what handles server-side caching of the resource groups. We have other similar but different workarounds in place in the reconciler, which uses slightly different methods on the DiscoveryClient and RESTMapper.
Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/GoogleContainerTools/kpt-config-sync/blob/main/OWNERS)~~ [tiffanny29631]
Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment