Azure / ARO-RP

Azure Red Hat OpenShift RP
https://azure.microsoft.com/products/openshift/
Apache License 2.0

Install failed. API server failure #610

Open mjudeikis opened 4 years ago

mjudeikis commented 4 years ago
{
    "type": "Degraded",
    "status": "True",
    "lastTransitionTime": "2020-04-26T22:37:03Z",
    "reason": "InstallerPodContainerWaiting_CreateContainerError",
    "message": "InstallerPodContainerWaitingDegraded: Pod \"installer-5-v4-e2e-v9146067c-5nlrm-master-1\" on node \"v4-e2e-v9146067c-5nlrm-master-1\" container \"installer\" is waiting for 29m21.297301748s because \"the container name \\\"k8s_installer_installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0\\\" is already in use by \\\"53aad5c980be1c263350bd4a1e57c36b6fae745958a492a748bc887549181972\\\". You have to remove that container to be able to reuse that name.: that name is already in use\""
}
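For triage, the conflicting container name and the ID of the container that already holds it can be pulled out of the condition above. This is only a sketch: the regular expression is an assumption based on the wording of this one message, and `extract_conflict` is a hypothetical helper, not part of any OpenShift tooling.

```python
import json
import re

# The Degraded condition exactly as pasted in this issue.
CONDITION_JSON = r'''
{
    "type": "Degraded",
    "status": "True",
    "lastTransitionTime": "2020-04-26T22:37:03Z",
    "reason": "InstallerPodContainerWaiting_CreateContainerError",
    "message": "InstallerPodContainerWaitingDegraded: Pod \"installer-5-v4-e2e-v9146067c-5nlrm-master-1\" on node \"v4-e2e-v9146067c-5nlrm-master-1\" container \"installer\" is waiting for 29m21.297301748s because \"the container name \\\"k8s_installer_installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0\\\" is already in use by \\\"53aad5c980be1c263350bd4a1e57c36b6fae745958a492a748bc887549181972\\\". You have to remove that container to be able to reuse that name.: that name is already in use\""
}
'''

# Assumed message shape: after JSON decoding, the inner quotes are still
# backslash-escaped, so the pattern matches a literal backslash before each quote.
CONFLICT = re.compile(
    r'container name \\"(?P<name>[^"\\]+)\\" is already in use by \\"(?P<id>[0-9a-f]+)\\"'
)


def extract_conflict(condition: dict) -> tuple[str, str]:
    """Return (container name, ID of the container holding that name)."""
    match = CONFLICT.search(condition["message"])
    if match is None:
        raise ValueError("message does not look like a name-in-use error")
    return match["name"], match["id"]


condition = json.loads(CONDITION_JSON)
conflict_name, holder_id = extract_conflict(condition)
print(conflict_name)
print(holder_id)
```

The two printed values are the name and container ID to search for in the node's runtime logs.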

https://jarvis-west.dc.ad.msft.net/CE5E7269

olga-mir commented 4 years ago

/assign

olga-mir commented 4 years ago

I believe that this is the root cause: https://github.com/cri-o/cri-o/commit/e785dd2fdcc9531204436d81442f0b55213ed5fc

This log (https://jarvis-west.dc.ad.msft.net/2B7C673D) shows the first occurrence of the "name is reserved" error for k8s_installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0

UTC 04-27-2020 08:33:55: E0426 22:33:55.441694 1438 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver(bf18795f-ee77-40ab-9bf1-03538175f319)" failed: rpc error: code = Unknown desc = error reserving pod name k8s_installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0 for id b8cce0cc58ca6c075fa07c15d57a7c31e0d0cbb705ca8aa78ecbb1f167195801: name is reserved

About 13 seconds before this error (22:33:42.495 versus 22:33:55.441 in the excerpts quoted here) there is a CreatePodSandboxError due to "context deadline exceeded"; log: https://jarvis-west.dc.ad.msft.net/C1CA7B02

E0426 22:33:42.495083 1438 pod_workers.go:191] Error syncing pod bf18795f-ee77-40ab-9bf1-03538175f319 ("installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver(bf18795f-ee77-40ab-9bf1-03538175f319)"), skipping: failed to "CreatePodSandbox" for "installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver(bf18795f-ee77-40ab-9bf1-03538175f319)" with CreatePodSandboxError: "CreatePodSandbox for pod \"installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver(bf18795f-ee77-40ab-9bf1-03538175f319)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
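The gap between the two kubelet lines quoted above can be checked from their klog headers (`Lmmdd hh:mm:ss.uuuuuu pid file:line]`). A minimal sketch follows; the year 2020 is an assumption taken from the issue date because klog headers carry no year, and `klog_time` is a hypothetical helper name.

```python
import re
from datetime import datetime

# klog header: severity letter, mmdd, time with microseconds, pid, file:line]
KLOG_HEADER = re.compile(
    r"\b(?P<severity>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+)\s+(?P<source>[^\s:]+:\d+)\]"
)

ASSUMED_YEAR = "2020"  # assumption: not present in the klog header itself


def klog_time(line: str) -> datetime:
    """Parse the timestamp out of the first klog header found in `line`."""
    match = KLOG_HEADER.search(line)
    if match is None:
        raise ValueError(f"no klog header in: {line!r}")
    stamp = f"{ASSUMED_YEAR}{match['mmdd']} {match['time']}"
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


# Header portions of the two log lines quoted in this comment (messages omitted).
NAME_RESERVED = "UTC 04-27-2020 08:33:55: E0426 22:33:55.441694 1438 kuberuntime_sandbox.go:68]"
DEADLINE_EXCEEDED = "E0426 22:33:42.495083 1438 pod_workers.go:191]"

gap = klog_time(NAME_RESERVED) - klog_time(DEADLINE_EXCEEDED)
print(f"{gap.total_seconds():.3f} seconds between the two errors")
```

The helper ignores any prefix added by the log viewer (such as the `UTC 04-27-2020 08:33:55:` column) and reads only the klog header.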

Note that these logs do not use exactly the same name as the message in the description of this issue. The pod is referenced in the logs under slightly different names, but the ID and node name make me think this is the same entity. The naming discrepancy is probably because different components (e.g. kubelet, cri-o) construct the name slightly differently.

--- in the error statement, in this issue description:
k8s_installer_installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0
--- RunPodSandbox from runtime service failed ("name is reserved") [remote_runtime.go]:
          k8s_installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0
--- crio alias:
      k8s_POD_installer-5-v4-e2e-v9146067c-5nlrm-master-1_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0
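The comparison above can be made mechanical. The sketch below assumes each name is a `k8s_` prefix followed by underscore-separated fields ending in pod name, namespace, UID and attempt, with an optional extra field (container name or `POD`) in front. That layout is inferred from the three names in this comment, not taken from kubelet or cri-o source, and `parse_runtime_name` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class RuntimeName(NamedTuple):
    kind: Optional[str]  # container name, "POD", or None when absent
    pod: str
    namespace: str
    uid: str
    attempt: str


def parse_runtime_name(name: str) -> RuntimeName:
    """Split a k8s_* runtime name into its assumed fields."""
    parts = name.strip().split("_")
    if len(parts) < 5 or parts[0] != "k8s":
        raise ValueError(f"unexpected name layout: {name!r}")
    # The last four fields are assumed to be pod, namespace, UID, attempt.
    pod, namespace, uid, attempt = parts[-4:]
    kind = "_".join(parts[1:-4]) or None
    return RuntimeName(kind, pod, namespace, uid, attempt)


def same_pod(a: str, b: str) -> bool:
    """True when two names agree on everything except the leading kind field."""
    return parse_runtime_name(a)[1:] == parse_runtime_name(b)[1:]


SUFFIX = (
    "installer-5-v4-e2e-v9146067c-5nlrm-master-1"
    "_openshift-kube-apiserver_bf18795f-ee77-40ab-9bf1-03538175f319_0"
)
issue_name = "k8s_installer_" + SUFFIX    # error statement in the issue description
sandbox_name = "k8s_" + SUFFIX            # RunPodSandbox "name is reserved"
crio_alias = "k8s_POD_" + SUFFIX          # crio alias

for label, name in [("issue", issue_name), ("sandbox", sandbox_name), ("alias", crio_alias)]:
    print(label, parse_runtime_name(name).kind)
print(same_pod(issue_name, sandbox_name) and same_pod(sandbox_name, crio_alias))
```

Under that assumption, the three names differ only in the leading field and share the same pod name, namespace, UID and attempt counter.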