Closed AceHack closed 2 years ago
Hi, I tried to reproduce your error, and I am not sure if I succeeded.
I used your command to set the environment variable.
I killed the node where the operator is running so it got rescheduled to a different one.
I checked the logs and I do see the error Unexpected EOF,
but only once, and it's coming from leaderelection.go:
leaderelection.go:331] error retrieving resource lock kafka/controller-leader-election-helper: Get https://10.10.0.1:443/api/v1/namespaces/kafka/configmaps/controller-leader-election-helper: unexpected EOF
As far as I can tell this error does not affect the operator.
It will try to acquire this lease once again, and for me it succeeded.
successfully acquired lease kafka/controller-leader-election-helper
After the restart I also applied multiple KafkaTopic CRs, which eventually caused webhook invocations, but everything succeeded for me.
Can you please share the whole log from the operator, so we can help you with the investigation?
I'll try and reproduce but it happens for me continuously whenever I try to create or delete topics. They always fail.
I keep getting this error over and over in a loop when trying to create a topic
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster Reconciling {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-clust
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster Reconciled {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-cluste
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster Reconciling {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-clust
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster Reconciled {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-cluste
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster Reconciling {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-clust
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster Reconciled {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-cluste
manager 2020-04-28T01:19:42.718Z DEBUG controllers.KafkaCluster Reconciling {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-clust
manager 2020-04-28T01:19:42.719Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.719Z DEBUG controllers.KafkaCluster Reconciled {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-cluste
manager 2020-04-28T01:19:42.719Z DEBUG controllers.KafkaCluster Reconciling {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-clust
manager 2020-04-28T01:19:42.728Z INFO controllers.KafkaCluster Kafka cluster state updated {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafk
manager 2020-04-28T01:19:42.728Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.728Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.729Z DEBUG controllers.KafkaCluster searching with label because name is empty {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Requ
manager 2020-04-28T01:19:42.729Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.729Z DEBUG controllers.KafkaCluster searching with label because name is empty {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Requ
manager 2020-04-28T01:19:42.729Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.729Z DEBUG controllers.KafkaCluster searching with label because name is empty {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Requ
manager 2020-04-28T01:19:42.729Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.730Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.730Z DEBUG controllers.KafkaCluster searching with label because name is empty {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Requ
manager 2020-04-28T01:19:42.755Z INFO controllers.KafkaCluster Kafka cluster state updated {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafk
manager 2020-04-28T01:19:42.759Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.813Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.813Z DEBUG controllers.KafkaCluster searching with label because name is empty {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Requ
manager 2020-04-28T01:19:42.845Z INFO controllers.KafkaCluster Kafka cluster state updated {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafk
manager 2020-04-28T01:19:42.848Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.905Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:42.905Z DEBUG controllers.KafkaCluster searching with label because name is empty {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Requ
manager 2020-04-28T01:19:42.934Z INFO controllers.KafkaCluster Kafka cluster state updated {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafk
manager 2020-04-28T01:19:42.938Z DEBUG controllers.KafkaCluster resource is in sync {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knati
manager 2020-04-28T01:19:43.031Z DEBUG controllers.KafkaCluster Reconciled {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-cluste
manager 2020-04-28T01:19:43.032Z DEBUG controllers.KafkaCluster Reconciling {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-clust
manager 2020-04-28T01:19:43.051Z INFO controllers.KafkaCluster CR status updated {"Request.Namespace": "2269-kafka-knative/kafka-knative-cluster", "Request.Name": "kafka-knative-
manager 2020-04-28T01:19:43.051Z INFO controllers.KafkaCluster could not create cruise control topic: Internal error occurred: failed calling webhook "kafkatopics.kafka.banzaicloud.
manager 2020-04-28T01:19:43.051Z ERROR controller-runtime.controller Reconciler error {"controller": "KafkaCluster", "request": "2269-kafka-knative/kafka-knative-cluster", "error
manager github.com/go-logr/zapr.(*zapLogger).Error
manager /go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128
manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.2/pkg/internal/controller/controller.go:258
manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.2/pkg/internal/controller/controller.go:232
manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.2/pkg/internal/controller/controller.go:211
manager k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
manager /go/pkg/mod/k8s.io/apimachinery@v0.17.3/pkg/util/wait/wait.go:152
manager k8s.io/apimachinery/pkg/util/wait.JitterUntil
manager /go/pkg/mod/k8s.io/apimachinery@v0.17.3/pkg/util/wait/wait.go:153
manager k8s.io/apimachinery/pkg/util/wait.Until
manager /go/pkg/mod/k8s.io/apimachinery@v0.17.3/pkg/util/wait/wait.go:88
{"level":"error","ts":"2020-11-16T03:29:39.941Z","logger":"controller","msg":"Reconciler error","reconcilerGroup":"kafka.banzaicloud.io","reconcilerKind":"KafkaCluster","controller":"KafkaCluster","name":"kafka","namespace":"default","error":"could not create cruise control topic: Internal error occurred: failed calling webhook \"kafkatopics.kafka.banzaicloud.io\": Post https://webhook-service.system.svc:443/validate?timeout=30s: service \"webhook-service\" not found","stacktrace":"github.com/go-logr/zapr.(zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller)
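For readers following the "service not found" error above: when a webhook configuration points at a Service rather than a URL, the API server dials an address built from the Service name, namespace, port, and path. The sketch below only illustrates that naming scheme; `serviceRef` and `webhookURL` are hypothetical names, not part of Kubernetes or the operator.

```go
package main

import (
	"fmt"
	"net/url"
)

// serviceRef is a hypothetical stand-in for the clientConfig.service
// fields of a ValidatingWebhookConfiguration entry.
type serviceRef struct {
	Name      string
	Namespace string
	Port      int
	Path      string
}

// webhookURL builds the in-cluster address that corresponds to a
// service reference: https://<name>.<namespace>.svc:<port><path>.
func webhookURL(s serviceRef) string {
	u := url.URL{
		Scheme: "https",
		Host:   fmt.Sprintf("%s.%s.svc:%d", s.Name, s.Namespace, s.Port),
		Path:   s.Path,
	}
	return u.String()
}

func main() {
	// Values taken from the log line above.
	ref := serviceRef{Name: "webhook-service", Namespace: "system", Port: 443, Path: "/validate"}
	fmt.Println(webhookURL(ref))
}
```

If the Service named in the webhook configuration does not exist in that namespace, the call fails before any TLS handshake, which matches the "not found" message in the log.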
@leader-us can you provide the output of the following commands:
kubectl get svc -n kafka
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io -lapp.kubernetes.io/instance=kafka-operator -o yaml
kubectl get pod -n kafka
[root@localhost ~]# kubectl get svc -n kafka
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kafka-operator-alertmanager ClusterIP 169.169.247.64
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io -lapp.kubernetes.io/instance=kafka-operator -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
kubectl get pod -n kafka
NAME                                                 READY   STATUS    RESTARTS   AGE
kafka-operator-controller-manager-7b89fc746f-87n4v   1/1     Running   0          4d20h
I found there is a webhook service in namespace system, but no related pods in that namespace:
[root@localhost ~]# kubectl -n system get svc -o yaml
apiVersion: v1
items:
This service is defined in config/manifests.yaml and service.yaml:
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  creationTimestamp: null
  name: validating-webhook-configuration
webhooks:
{"level":"error","ts":"2020-11-16T09:02:14.701Z","logger":"setup","msg":"problem running manager","error":"open /etc/webhook/certs/tls.crt: no such file or directory","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nmain.main\n\t/workspace/main.go:178\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
Latest error: it can't find the cert.
[root@localhost ~]# kubectl get all -n system
NAME                                     READY   STATUS             RESTARTS   AGE
pod/controller-manager-b977f57d5-mzwwq   0/1     CrashLoopBackOff   4          3m57s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/webhook-service ClusterIP 169.169.74.84
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/controller-manager   0/1     1            0           36m
NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/controller-manager-b977f57d5   1         1         0       36m
@leader-us your kafka-operator deployment seems to have an incorrect config. How did you deploy kafka-operator?
In the directory config/overlays/certmanager-enabled I ran the following command to install the operator:
kubectl apply -f -k .
[root@localhost certmanager-enabled]# kubectl get pods --all-namespaces
NAMESPACE      NAME                                          READY   STATUS    RESTARTS   AGE
cert-manager   cert-manager-9b8969d86-wvhvg                  1/1     Running   0          5d1h
cert-manager   cert-manager-cainjector-8545fdf87c-r8xtm      1/1     Running   0          5d1h
cert-manager   cert-manager-webhook-8c5db9fb6-8jskt          1/1     Running   0          5d1h
default        prometheus-operator-86b9f8646b-l2wwh          1/1     Running   0          5d
default        zk-with-istio-0                               1/1     Running   1          5d7h
default        zk-with-istio-1                               0/1     Running   2          3m50s
default        zk-with-istio-2                               1/1     Running   2          5d2h
default        zookeeper-operator-8fd88c877-rrzhk            1/1     Running   1          5d7h
kafka          certman-controller-manager-6957f6c9ff-6sh9j   1/1     Running   0          108s
kube-system    calico-kube-controllers-5487f898d7-vt4mx      1/1     Running   8          221d
kube-system    calico-node-9pdpj                             1/1     Running   7          221d
kube-system    coredns-68c75b6549-bnmbm                      1/1     Running   7          460d
@leader-us I'd suggest starting with a new K8s cluster and using Helm to deploy kafka-operator (https://banzaicloud.com/docs/supertubes/kafka-operator/install-kafka-operator/#kafka-operator-helm), as there might be an issue with the kustomize files.
[root@localhost samples]# kubectl apply -f example-topic.yaml
Error from server (InternalError): error when creating "example-topic.yaml": Internal error occurred: failed calling webhook "kafkatopics.kafka.banzaicloud.io": Post https://kafka-operator-webhook-service.kafka.svc:443/validate?timeout=30s: x509: certificate is valid for .kafka-headless.kafka.svc.cluster.local, kafka-headless, .kafka-headless, kafka-headless.kafka, not kafka-operator-webhook-service.kafka.svc
@leader-us can you describe the exact steps you followed to deploy kafka-operator using Helm?
### I followed your Helm install steps, but hit the error again
{"level":"info","ts":"2020-11-17T02:18:32.635Z","logger":"controllers.KafkaCluster","msg":"could not create cruise control topic: Internal error occurred: failed calling webhook \"kafkatopics.kafka.banzaicloud.io\": Post https://kafka-operator-webhook-service.kafka.svc:443/validate?timeout=30s: service \"kafka-operator-webhook-service\" not found","Request.Namespace":"default/kafka","Request.Name":"kafka"}
Install steps:
helm repo add banzaicloud-stable https://kubernetes-charts.banzaicloud.com/
helm install kafka-operator --namespace=kafka banzaicloud-stable/kafka-operator
kubectl create -n kafka -f config/samples/simplekafkacluster.yaml
kubectl create -n kafka -f config/samples/kafkacluster-prometheus.yaml
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager ClusterIP 169.169.61.209
NAME                                       READY   STATUS             RESTARTS   AGE
kafka-operator-operator-84c748cb5c-pk8mp   2/2     Running            0          26m
prometheus-kafka-prometheus-0              1/2     ImagePullBackOff   0          15m
NAME                                   READY   STATUS    RESTARTS   AGE
kafka-0-2z2tm                          1/1     Running   0          21m
kafka-1-8v6f2                          1/1     Running   0          21m
kafka-2-frfbg                          1/1     Running   0          21m
prometheus-operator-86b9f8646b-l2wwh   1/1     Running   0          5d17h
zk-with-istio-0                        1/1     Running   1          6d
zk-with-istio-1                        1/1     Running   48         15h
zk-with-istio-2                        1/1     Running   2          5d19h
zookeeper-operator-8fd88c877-rrzhk     1/1     Running   1          6d1h
[2020-11-17 02:37:19,716] WARN [Producer clientId=CruiseControlMetricsReporter] Error while fetching metadata with correlation id 12399 : {__CruiseControlMetrics=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[2020-11-17 02:37:19,817] WARN [Producer clientId=CruiseControlMetricsReporter] Error while fetching metadata with correlation id 12400 : {__CruiseControlMetrics=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[root@localhost ~]# ^C
I created the missing webhook service:
apiVersion: v1
kind: Service
metadata:
  name: kafka-operator-webhook-service
  namespace: kafka
spec:
  ports:
But now I get a cert error!
{"level":"info","ts":"2020-11-17T02:48:22.129Z","logger":"controllers.KafkaCluster","msg":"could not create cruise control topic: Internal error occurred: failed calling webhook \"kafkatopics.kafka.banzaicloud.io\": Post https://kafka-operator-webhook-service.kafka.svc:443/validate?timeout=30s: x509: certificate is valid for kafka-operator-operator.kafka.svc.cluster.local, kafka-operator-operator.kafka.svc, not kafka-operator-webhook-service.kafka.svc","Request.Namespace":"default/kafka","Request.Name":"kafka"}
configurationState: ConfigInSync
gracefulActionState:
  cruiseControlState: GracefulUpscaleSucceeded
  errorMessage: CruiseControl not yet ready
rackAwarenessState: ""
cruiseControlTopicStatus: CruiseControlTopicNotReady
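The x509 errors in this thread come from hostname verification: the name the API server dials must appear among the certificate's subject alternative names. The following is a minimal sketch that reproduces the same kind of mismatch with a throwaway self-signed certificate. The helper `selfSignedCert` is hypothetical, and the certificate is generated only for illustration; it is not how the operator issues its certificates.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert (hypothetical helper) returns a short-lived certificate
// whose DNS subject alternative names are exactly dnsNames.
func selfSignedCert(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "illustration-only"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// SANs copied from the error message above.
	cert, err := selfSignedCert([]string{
		"kafka-operator-operator.kafka.svc.cluster.local",
		"kafka-operator-operator.kafka.svc",
	})
	if err != nil {
		panic(err)
	}
	for _, host := range []string{
		"kafka-operator-operator.kafka.svc",        // listed in the SANs
		"kafka-operator-webhook-service.kafka.svc", // name of the hand-made Service
	} {
		// VerifyHostname returns nil when host matches a SAN.
		fmt.Printf("%s -> %v\n", host, cert.VerifyHostname(host))
	}
}
```

Under these assumptions, a Service created by hand under a name the certificate does not list will fail verification even though the connection itself succeeds.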
Closing this since it is stale for a while, please reopen if it reoccurs.
Describe the bug
There is a known bug in Kubernetes (well, in golang actually) that causes a lot of instability with HTTP/2 webhooks. See the related issue for more details: https://github.com/kubernetes/kubernetes/issues/80313
Steps to reproduce the issue:
Install Kafka operator.
Reboot the node the Kafka webhook is running on.
It takes EKS about 15-20 minutes to recover and be able to use the Kafka webhook again.
Expected behavior
Kubernetes/Webhook should recover in a few seconds, not several minutes.
Additional context
An easy fix would be to run the following command.
kubectl set env -n kafka deployment/kafka-knative-operator-kafka-operator-operator GODEBUG=http2server=0
This command stops any Go code in the deployment from using HTTP/2 for its server. This works and fixes many other webhooks, like Istio and Knative, but when I run it on this operator I start getting errors on every webhook invocation.
The error is
Unexpected EOF
Please update the code to allow disabling HTTP/2 on webhooks.
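As a rough illustration of what such an option could look like in Go: the net/http documentation states that a server can opt out of HTTP/2 by setting `Server.TLSNextProto` to a non-nil, empty map. The sketch below applies that to a plain `http.Server`. The function `newWebhookServer` and its `disableHTTP2` parameter are hypothetical; the operator builds its webhook server through controller-runtime, so the actual wiring would differ.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newWebhookServer (hypothetical) returns an http.Server for addr.
// When disableHTTP2 is true, TLSNextProto is set to a non-nil, empty map,
// which net/http documents as the way to turn off HTTP/2 for a server.
func newWebhookServer(addr string, handler http.Handler, disableHTTP2 bool) *http.Server {
	srv := &http.Server{Addr: addr, Handler: handler}
	if disableHTTP2 {
		srv.TLSNextProto = map[string]func(*http.Server, *tls.Conn, http.Handler){}
	}
	return srv
}

func main() {
	// The address is an assumption made for this example only.
	srv := newWebhookServer(":9443", http.NotFoundHandler(), true)
	fmt.Println("HTTP/2 disabled:", srv.TLSNextProto != nil && len(srv.TLSNextProto) == 0)
}
```

Unlike the GODEBUG variable, a per-server switch like this affects only the webhook listener and leaves other servers and clients in the same process untouched.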