Open maxdrib opened 2 years ago
After discussing with @danbudris: it might make sense to introduce some validation on the webhook server service endpoints, such as https://eksa-webhook-service.eksa-system.svc:443/validate-anywhere-eks-amazonaws-com-v1alpha1-cloudstackdatacenterconfig?timeout=10s
There appears to already be retry logic in place (https://github.com/aws/eks-anywhere/blob/6104510c396ae57863b43a497f5c19c2293b8173/pkg/gitops/flux/client.go#L60-L65), so it's unclear why this operation might be failing. Next steps would include the following:
What happened: I’ve seen the CloudStackLegacyFlux e2e test fail a number of times now (randomly) with the error
which indicates to me that we are installing gitops before some eksa-webhooks are available. We should wait to install gitops until after the eksa pod is ready so that the webhooks are available.
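To illustrate the "wait until the webhooks are available" idea, here is a minimal sketch of a readiness gate that polls an endpoint until it accepts TCP connections. It is written in Python for brevity and is not the eks-anywhere implementation. The names `tcp_probe` and `wait_until_ready`, and the default timeout and interval, are hypothetical. An open TCP port only shows that something is listening, so a real check might also need to confirm that the eksa pod reports Ready.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, connect_timeout: float = 2.0) -> Callable[[], bool]:
    """Return a probe that reports whether host:port accepts TCP connections."""

    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Connection refused, timeouts and DNS failures are all OSError subclasses.
            return False

    return probe


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 120.0,   # assumed overall budget, not a project default
    interval: float = 2.0,    # assumed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe repeatedly until it succeeds or the timeout budget is spent."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        # Give up if another full interval would overshoot the deadline.
        if time.monotonic() + interval > deadline:
            return False
        sleep(interval)
```

From inside the cluster, a caller could gate the gitops install on `wait_until_ready(tcp_probe("eksa-webhook-service.eksa-system.svc", 443))`, using the service host and port from the URL quoted in this issue, and proceed only when it returns `True`.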
What you expected to happen: I would expect this operation to succeed without a connection refused error.
How to reproduce it (as minimally and precisely as possible): Run the TestCloudStackUpgradeMulticlusterWorkloadClusterWithFluxLegacy test a number of times.
Anything else we need to know?:
Environment: