Open mayurbhagia opened 9 months ago
I think it's a timing issue. if you try to run terraform apply
or rerun install.sh
again then it should fix the issue.
Please feel free to update troubleshooting guide https://github.com/awslabs/data-on-eks/blob/main/website/docs/blueprints/troubleshooting/troubleshooting.md if the issue resolved by the above approach.
This consistently requires two executions of install.sh currently.
I'm betting we need to bump up that 10s timeout, but currently we'd be blocked on... https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2711
This is due to a mutating webhook introduced for LBC v2.5+. Per the docs...
The AWS LBC provides a mutating webhook for service resources to set the spec.loadBalancerClass field for service of type LoadBalancer on create. This makes the AWS LBC the default controller for service of type LoadBalancer. You can disable this feature and revert to set Cloud Controller Manager (in-tree controller) as the default by setting the helm chart value enableServiceMutatorWebhook to false with --set enableServiceMutatorWebhook=false . You will no longer be able to provision new Classic Load Balancer (CLB) from your kubernetes service unless you disable this feature. Existing CLB will continue to work fine.
If you do not need to have the webhook enabled then you can disable it as shown here.
# Turn off mutation webhook for services to avoid ordering issue
enable_aws_load_balancer_controller = true
aws_load_balancer_controller = {
set = [{
name = "enableServiceMutatorWebhook"
value = "false"
}]
}
We should add this to our blueprints, will mark it as a bug for tracking.
Installing Spark Operator with YuniKorn on Cloud9 in my AWS account and install.sh is ending with below two errors:
Error: 2 errors occurred: │ Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-v1-service?timeout=10s": dial tcp 100.64.184.123:9443: connect: connection refused │ Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-v1-service?timeout=10s": dial tcp 100.64.184.123:9443: connect: connection refused