Closed blagerweij closed 2 years ago
Thank you @blagerweij for raising this! We are investigating this issue right now.
As a temporary workaround, the webhooks can be disabled if that is feasible on your side.
Hi @blagerweij, could you try changing the tyk-ce workload from a DaemonSet to a Deployment? Then ensure that the gateway Deployment is scaled to 1. The reason is that our open source gateway offering currently handles a single Gateway.
For scaling gateways & HA, we would recommend a paid license, as the Tyk Dashboard control plane is the component which is used to orchestrate APIs across one or more gateway clusters.
Let me know if this solves your problem, or if we need to keep digging into EKS.
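For reference, the DaemonSet-to-Deployment change suggested above could look roughly like the sketch below. The resource name, labels, container name, and image placeholder are all assumptions for illustration; keep whatever your existing tyk-ce manifest already uses and change only the `kind` and `replicas`.

```yaml
apiVersion: apps/v1
kind: Deployment                 # previously: DaemonSet
metadata:
  name: tyk-ce                   # assumed name; keep the one from your install
spec:
  replicas: 1                    # run a single gateway instance
  selector:
    matchLabels:
      app: tyk-ce                # hypothetical label; must match the pod template below
  template:
    metadata:
      labels:
        app: tyk-ce
    spec:
      containers:
        - name: gateway          # hypothetical container name
          image: "REPLACE_WITH_IMAGE_FROM_EXISTING_DAEMONSET"
```

A DaemonSet has no `replicas` field and schedules one pod per node, so on a 3-node cluster it would run three gateways; a Deployment with `replicas: 1` runs exactly one.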
We were able to track down the root cause of this issue: in addition to the tyk-operator, we also have a few other controllers running in that same namespace. Since the service for the webhooks uses a generic label (which is also used by two other controllers), the kube-dns resolution for the validating and mutating webhooks finds not only the tyk operator, but also the other two controllers. These other controllers don't expose the https target port, so the webhooks fail. The pod labels look like this:
labels:
  control-plane: controller-manager
  pod-template-hash: 586948b668
And for the service:
selector:
  control-plane: controller-manager
We're going to try to run tyk in a separate isolated namespace, to see if that will resolve the issue.
Hi @blagerweij, thank you for the update. I'm closing this ticket since it is no longer an issue. Please let us know if that changes.
Cheers.
Hi @caroltyk, are there any plans to improve the tyk-operator with regard to the selector? Currently the webhook service's selector matches any pod with the label 'control-plane: controller-manager'. Any project built with kubebuilder will have that label, so it would be nice to add a tyk-specific label, so that the webhooks work even when the tyk-operator is deployed in the same namespace as another operator. IMHO that would be relatively easy to add, no?
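Something along these lines is what I have in mind. This is only a rough sketch: the service name, the `app.kubernetes.io/name: tyk-operator` label, and the port numbers are assumptions for illustration, not what the operator ships today.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tyk-operator-webhook-service       # hypothetical name
spec:
  ports:
    - port: 443
      targetPort: 9443                     # assumed; use the operator's actual https target port
  selector:
    control-plane: controller-manager      # existing generic kubebuilder label
    app.kubernetes.io/name: tyk-operator   # hypothetical tyk-specific label
```

Multiple entries in a Service selector are ANDed, so only pods carrying both labels would become endpoints. The operator's pod template would need the same extra label, and other kubebuilder-based controllers in the namespace would no longer be matched.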
Hi @blagerweij, that makes sense. Thanks for the suggestion. I'll take it back to the team.
We have deployed tyk-gateway and the tyk-operator on Amazon EKS. However, when adding a new API definition, we get intermittent errors. Sometimes creating the new apidefinition CRD works, but a lot of times we get errors reported by the webhooks:
The error is intermittent: creation succeeds about 30% of the time and fails about 70% of the time. We have 3 nodes, so I suspect there might be a correlation.
Expected Behavior
Creating an API Definition CRD should succeed
Current Behavior
The webhooks are intermittently failing
Steps to Reproduce
On AWS EKS, installed using the following script:
Your Environment
AWS EKS version v1.21.5-eks-bc4871b
cert-manager-v1.7.0