chaosblade-io / chaosblade-operator

chaosblade operator for kubernetes experiments
Apache License 2.0

Can't install chaosblade-operator-1.5.0 #135

Open YUJIATAOOOO opened 2 years ago

YUJIATAOOOO commented 2 years ago

Issue Description

Type: bug

Describe what happened (or what feature you want)

I installed chaosblade-operator v1.5.0 once before and then uninstalled it with:

helm del chaosblade-operator -n kube-system
kubectl delete deployment chaosblade-operator -n kube-system
kubectl delete crd chaosblades.chaosblade.io

When I tried to install chaosblade-operator v1.5.0 again, the installation failed. I tried it in two environments and got the same error in both.

kubectl get deploy chaosblade-operator -n kube-system -o json

result

"status": {
  "conditions": [
    {
      "lastTransitionTime": "2022-02-02T06:25:28Z",
      "lastUpdateTime": "2022-02-02T06:25:28Z",
      "message": "Created new replica set \"chaosblade-operator-5ccf675967\"",
      "reason": "NewReplicaSetCreated",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2022-02-02T06:25:28Z",
      "lastUpdateTime": "2022-02-02T06:25:28Z",
      "message": "Deployment does not have minimum availability.",
      "reason": "MinimumReplicasUnavailable",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2022-02-02T06:25:29Z",
      "lastUpdateTime": "2022-02-02T06:25:29Z",
      "message": "Internal error occurred: failed calling webhook \"chaosblade-operator.kube-system.svc\": failed to call webhook: Post \"https://chaosblade-webhook-server.kube-system.svc:443/mutating-pods?timeout=10s\": dial tcp 10.101.171.120:443: connect: connection refused",
      "reason": "FailedCreate",
      "status": "True",
      "type": "ReplicaFailure"
    }
  ],
  "observedGeneration": 1,
  "unavailableReplicas": 1
}

kubectl describe replicaset chaosblade-operator-67779995db -n kube-system

result

Name:           chaosblade-operator-6db889f86
Namespace:      kube-system
Selector:       name=chaosblade-operator,pod-template-hash=6db889f86
Labels:         name=chaosblade-operator
                part-of=chaosblade
                pod-template-hash=6db889f86
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
                meta.helm.sh/release-name: chaosblade-operator
                meta.helm.sh/release-namespace: kube-system
Controlled By:  Deployment/chaosblade-operator
Replicas:       0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           name=chaosblade-operator
                    part-of=chaosblade
                    pod-template-hash=6db889f86
  Service Account:  chaosblade
  Containers:
   chaosblade-operator:
    Image:      chaosbladeio/chaosblade-operator:1.5.0
    Port:       9443/TCP
    Host Port:  0/TCP
    Command:
      chaosblade-operator
    Args:
      --chaosblade-image-repository=chaosbladeio/chaosblade-tool
      --chaosblade-version=1.5.0
      --chaosblade-image-pull-policy=IfNotPresent
      --log-level=info
      --webhook-enable
      --daemonset-enable
      --remove-blade-interval=72h
      --chaosblade-namespace=kube-system
    Environment:
      WATCH_NAMESPACE:
      POD_NAME:         (v1:metadata.name)
      OPERATOR_NAME:    chaosblade-operator
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
  Volumes:
   cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  chaosblade-webhook-server-cert
    Optional:    false
Conditions:
  Type            Status  Reason
  ReplicaFailure  True    FailedCreate
Events:
  Type     Reason        Age                 From                   Message
  Warning  FailedCreate  19s (x14 over 60s)  replicaset-controller  Error creating: Internal error occurred: failed calling webhook "chaosblade-operator.kube-system.svc": failed to call webhook: Post "https://chaosblade-webhook-server.kube-system.svc:443/mutating-pods?timeout=10s": dial tcp 10.104.96.126:443: connect: connection refused
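The `connection refused` error in the events suggests that nothing is listening behind the webhook Service yet. A minimal sketch for checking that, assuming the Service is named `chaosblade-webhook-server` (taken from the URL in the error message); the function names `webhook_endpoint_ips` and `webhook_has_backend` are hypothetical helpers, not part of chaosblade:

```shell
# Print the pod IPs currently backing the webhook Service.
# Empty output means no ready backend, which would be consistent
# with the "connection refused" error.
# Assumption: the Service name comes from the URL in the error message.
webhook_endpoint_ips() {
  local namespace="${1:-kube-system}"
  kubectl get endpoints chaosblade-webhook-server \
    -n "$namespace" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Succeed only if at least one endpoint IP exists.
webhook_has_backend() {
  [ -n "$(webhook_endpoint_ips "$1")" ]
}
```

For example, `webhook_has_backend kube-system || echo "no pod is serving the webhook"` would report whether any pod is serving the webhook in that namespace.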

It also blocks all other Kubernetes YAML manifests from being submitted, unless I uninstall chaosblade-operator-1.5.0.

Describe what you expected to happen

chaosblade-operator installs successfully.

How to reproduce it (as minimally and precisely as possible)

Reinstall chaosblade-operator v1.5.0:

helm --debug install chaosblade-operator chaosblade-operator-1.5.0.tgz --namespace kube-system --set webhook.enable=true

Tell us your environment

  1. Kubernetes v1.22.2
  2. helm v3.7.0
  3. platform: linux/amd64

Anything else we need to know?

I don't know whether I'm the only one with this problem. I previously used chaosblade-operator v1.2.0 on Kubernetes v1.21.0 and did not encounter anything similar.

ZXYxc commented 2 years ago

I ran into this too. Did you resolve it?

There are no pods or images in my cluster, and the events look normal. When I run:

kubectl get deploy chaosblade-operator -n chaosblade -o json

it returns:

{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "annotations": {
      "deployment.kubernetes.io/revision": "1",
      "meta.helm.sh/release-name": "chaosblade-operator",
      "meta.helm.sh/release-namespace": "chaosblade"
    },
    "creationTimestamp": "2022-02-10T02:26:43Z",
    "generation": 1,
    "labels": {
      "app.kubernetes.io/managed-by": "Helm"
    },
    "name": "chaosblade-operator",
    "namespace": "chaosblade",
    "resourceVersion": "18542",
    "uid": "d7b7d5bf-c8ca-4f5c-8b2a-7705864fb0d8"
  },
  "spec": {
    "progressDeadlineSeconds": 600,
    "replicas": 1,
    "revisionHistoryLimit": 10,
    "selector": {
      "matchLabels": {
        "name": "chaosblade-operator"
      }
    },
    "strategy": {
      "rollingUpdate": {
        "maxSurge": "25%",
        "maxUnavailable": "25%"
      },
      "type": "RollingUpdate"
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "name": "chaosblade-operator",
          "part-of": "chaosblade"
        }
      },
      "spec": {
        "containers": [
          {
            "args": [
              "--chaosblade-image-repository=chaosbladeio/chaosblade-tool",
              "--chaosblade-version=1.5.0",
              "--chaosblade-image-pull-policy=IfNotPresent",
              "--log-level=info",
              "--webhook-enable",
              "--daemonset-enable",
              "--remove-blade-interval=72h",
              "--chaosblade-namespace=chaosblade"
            ],
            "command": [
              "chaosblade-operator"
            ],
            "env": [
              {
                "name": "WATCH_NAMESPACE"
              },
              {
                "name": "POD_NAME",
                "valueFrom": {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.name"
                  }
                }
              },
              {
                "name": "OPERATOR_NAME",
                "value": "chaosblade-operator"
              }
            ],
            "image": "chaosbladeio/chaosblade-operator:1.5.0",
            "imagePullPolicy": "IfNotPresent",
            "name": "chaosblade-operator",
            "ports": [
              {
                "containerPort": 9443,
                "protocol": "TCP"
              }
            ],
            "resources": {},
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "volumeMounts": [
              {
                "mountPath": "/tmp/k8s-webhook-server/serving-certs",
                "name": "cert",
                "readOnly": true
              }
            ]
          }
        ],
        "dnsPolicy": "ClusterFirst",
        "restartPolicy": "Always",
        "schedulerName": "default-scheduler",
        "securityContext": {},
        "serviceAccount": "chaosblade",
        "serviceAccountName": "chaosblade",
        "terminationGracePeriodSeconds": 30,
        "volumes": [
          {
            "name": "cert",
            "secret": {
              "defaultMode": 420,
              "secretName": "chaosblade-webhook-server-cert"
            }
          }
        ]
      }
    }
  },
  "status": {
    "conditions": [
      {
        "lastTransitionTime": "2022-02-10T02:26:43Z",
        "lastUpdateTime": "2022-02-10T02:26:43Z",
        "message": "Deployment does not have minimum availability.",
        "reason": "MinimumReplicasUnavailable",
        "status": "False",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2022-02-10T02:26:44Z",
        "lastUpdateTime": "2022-02-10T02:26:44Z",
        "message": "Internal error occurred: failed calling webhook \"chaosblade-operator.chaosblade.svc\": Post \"https://chaosblade-webhook-server.chaosblade.svc:443/mutating-pods?timeout=10s\": dial tcp 10.1.255.232:443: connect: connection refused",
        "reason": "FailedCreate",
        "status": "True",
        "type": "ReplicaFailure"
      },
      {
        "lastTransitionTime": "2022-02-10T02:36:44Z",
        "lastUpdateTime": "2022-02-10T02:36:44Z",
        "message": "ReplicaSet \"chaosblade-operator-7f7b79fcc5\" has timed out progressing.",
        "reason": "ProgressDeadlineExceeded",
        "status": "False",
        "type": "Progressing"
      }
    ],
    "observedGeneration": 1,
    "unavailableReplicas": 1
  }
}

My environment is: Kubernetes v1.20.0, helm v3.8.0, platform: linux/amd64.

Actually, when I used chaosblade-operator-1.4.0-v3.tgz instead, it worked.

YUJIATAOOOO commented 2 years ago

Maybe the root cause is that the pod was not ready but had already registered with the API server. I'm just guessing. https://imroc.cc/post/201912/kubernetes-no-route-to-host/#%E9%97%AE%E9%A2%98%E5%8F%8D%E9%A6%88
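One way to test this guess is to look at how the webhook is registered. The sketch below assumes the chart registers a MutatingWebhookConfiguration (the `/mutating-pods` path in the error suggests so) whose name contains `chaosblade`; the exact resource name is not confirmed in this thread, and `chaosblade_webhook_configs` is a hypothetical helper:

```shell
# List MutatingWebhookConfigurations whose name mentions chaosblade,
# printing each name followed by the failurePolicy of its webhooks.
# Assumption: the registration's name contains "chaosblade".
chaosblade_webhook_configs() {
  kubectl get mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.webhooks[*].failurePolicy}{"\n"}{end}' \
    | grep -i chaosblade
}
```

If a registration shows up with `failurePolicy` set to `Fail` while no operator pod is running, pod creation would be rejected until the webhook server is reachable, which would match the `FailedCreate` events reported here. That is still only a hypothesis until someone confirms it on an affected cluster.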