Azure / azure-service-operator

Azure Service Operator allows you to create Azure resources using kubectl
https://azure.github.io/azure-service-operator/
MIT License

Bug: `none` crd-management mode doesn't work with only a subset of CRDs like it should #4146

Closed matthchr closed 2 months ago

matthchr commented 3 months ago

Version of Azure Service Operator: v2.8.0 (and previous versions)

Describe the bug: When `none` is set for crd-management mode, the operator fails to initialize its watchers when only a subset of CRDs is installed:

```
I0702 12:53:04.049802       1 main.go:29] "msg"="Launching with flags" "flags"="MetricsAddr: :8080, HealthAddr: :8081, WebhookPort: 9443, WebhookCertDir: /tmp/k8s-webhook-server/serving-certs, EnableLeaderElection: true, CRDManagementMode: none, CRDPatterns: , PreUpgradeCheck: false" "logger"="setup"
I0702 12:53:04.111356       1 setup.go:273] "msg"="No global credential configured, continuing without default global credential." "logger"="controllers"
I0702 12:53:04.410101       1 manager.go:75] "msg"="Found an existing CRD" "CRD"="managedclusters.containerservice.azure.com" "logger"="controllers"
I0702 12:53:04.410127       1 manager.go:75] "msg"="Found an existing CRD" "CRD"="managedclustersagentpools.containerservice.azure.com" "logger"="controllers"
I0702 12:53:04.410133       1 manager.go:75] "msg"="Found an existing CRD" "CRD"="natgateways.network.azure.com" "logger"="controllers"
I0702 12:53:04.410139       1 manager.go:75] "msg"="Found an existing CRD" "CRD"="resourcegroups.resources.azure.com" "logger"="controllers"
I0702 12:53:04.410151       1 setup.go:198] "msg"="CRD management mode was set to 'none', the operator will not manage CRDs and assumes they are already installed and matching the operator version" "logger"="setup"
I0702 12:53:04.436640       1 setup.go:357] "msg"="Configuration details" "config"="SubscriptionID:/TenantID:/ClientID:/podnamespace:giantswarm/OperatorMode:watchers-and-webhooks/TargetNamespaces:/SyncPeriod:1h0m0s/ResourceManagerEndpoint:https://management.azure.com/ResourceManagerAudience:https://management.core.windows.net//AzureAuthorityHost:https://login.microsoftonline.com//UseWorkloadIdentityAuth:false" "logger"="controllers"
I0702 12:53:04.438155       1 register.go:73] "msg"="Registering indexer for type" "key"=".spec.proxy.password" "logger"="controllers" "type"="*storage.Backend"
E0702 12:53:04.440405       1 setup.go:212] "msg"="failed to initialize watchers" "error"="failed to register gvks: failed to register indexer for *storage.Backend, Key: \".spec.proxy.password\": failed to get API group resources: unable to retrieve the complete list of server APIs: apimanagement.azure.com/v1api20220801storage: the server could not find the requested resource" "logger"="setup"
```

This is caused by the following code: https://github.com/Azure/azure-service-operator/blob/51edbbe2e40f88f5dee65131c2937c447c94f9a9/v2/cmd/controller/app/setup.go#L160-L208

goalCRDs is the set of CRDs already installed in the cluster, so when we compute crdmanagement.GetNonReadyCRDs(cfg, crdManager, goalCRDs, existingCRDs), only installed CRDs can ever be reported as notReady.

This means when we filter out all the notReady CRDs from the set of all CRDs in the scheme, we fail to exclude uninstalled CRDs like we should: https://github.com/Azure/azure-service-operator/blob/51edbbe2e40f88f5dee65131c2937c447c94f9a9/v2/cmd/controller/app/setup.go#L391-L406
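A toy model of the failure mode, as a rough sketch rather than the operator's actual code: `nonReady` and `toRegister` are hypothetical stand-ins for the logic linked above, CRDs are reduced to plain names, and `backends.apimanagement.azure.com` is an illustrative name for the uninstalled CRD behind `*storage.Backend` in the log.

```go
package main

import "fmt"

// nonReady is a simplified, hypothetical stand-in for
// crdmanagement.GetNonReadyCRDs: it reports every goal CRD that is
// missing from the cluster. The real function is more involved.
func nonReady(goal, existing []string) map[string]bool {
	installed := make(map[string]bool, len(existing))
	for _, name := range existing {
		installed[name] = true
	}
	out := make(map[string]bool)
	for _, name := range goal {
		if !installed[name] {
			out[name] = true
		}
	}
	return out
}

// toRegister keeps every CRD in the scheme that was not reported as
// non-ready. This mirrors the filtering step described above.
func toRegister(scheme []string, notReady map[string]bool) []string {
	var out []string
	for _, name := range scheme {
		if !notReady[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Everything the operator binary knows about.
	scheme := []string{
		"resourcegroups.resources.azure.com",
		"natgateways.network.azure.com",
		"backends.apimanagement.azure.com", // assumed name, not installed
	}
	// What is actually installed in the cluster.
	existing := []string{
		"resourcegroups.resources.azure.com",
		"natgateways.network.azure.com",
	}

	// With the goal set equal to the existing set, nothing can be non-ready.
	notReady := nonReady(existing, existing)

	// The uninstalled CRD survives the filter and would be registered.
	fmt.Println(toRegister(scheme, notReady))
}
```

In this model the printed list still contains all three names, including the one that was never installed, which corresponds to the indexer registration that fails in the log.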

Expected behavior: `none` mode shouldn't assume every CRD is installed; it should work with a subset of CRDs, just like `auto` mode.
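One possible shape for that behavior, sketched under assumptions and not necessarily the fix adopted upstream: register a CRD only if it is both installed and not flagged as non-ready. `registrable` is a hypothetical helper, and CRDs are modeled as plain name strings.

```go
package main

import (
	"fmt"
	"sort"
)

// registrable returns the scheme CRDs that are installed in the cluster
// and not flagged as non-ready. Hypothetical helper for illustration only.
func registrable(scheme, existing []string, notReady map[string]bool) []string {
	installed := make(map[string]bool, len(existing))
	for _, name := range existing {
		installed[name] = true
	}
	var out []string
	for _, name := range scheme {
		// Skip anything that is missing from the cluster, regardless of
		// whether the non-ready calculation knew about it.
		if installed[name] && !notReady[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out) // stable order for logging and comparison
	return out
}

func main() {
	scheme := []string{"resourcegroups.resources.azure.com", "natgateways.network.azure.com", "backends.apimanagement.azure.com"}
	existing := []string{"resourcegroups.resources.azure.com", "natgateways.network.azure.com"}

	// No CRD is flagged as non-ready; the uninstalled one is still skipped.
	fmt.Println(registrable(scheme, existing, map[string]bool{}))
}
```

Checking membership in the installed set directly means the result no longer depends on which goal set was passed to the non-ready calculation.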