Deleting an AppInst on an Azure-based cloudlet reports a failure, even though the AppInst is actually removed correctly from the cluster. The problem appears to be in the pod state check that runs while we are waiting for the pod to be removed.
The workaround is to delete the AppInst again; the second attempt correctly detects that the pods are not present, and succeeds.
Here are some of the error messages and logs:
2024-11-26T06:36:50.322Z INFO 4318ce34abc0fba2 k8smgmt/appinst.go:101 pod is running {"podName": "us-jon-test-k8s10jondevorg10-deployment-5fdbd5b967-vfm9b"}
2024-11-26T06:36:51.322Z INFO 4318ce34abc0fba2 k8smgmt/appinst.go:64 check pods status {"namespace": "default", "selector": "mex-app=us-jon-test-k8s10jondevorg10-deployment"}
2024-11-26T06:36:52.128Z INFO 4318ce34abc0fba2 crmutil/controller-data.go:577 can't delete app inst {"error": "Delete App Inst failed: Run container failed, pod state: Failed - Name:
rpc error: code = Unknown desc = Delete App Inst failed: DELETE
https://console.cloud.edgexr.org/operatorplatform/federation/v1/63719bab-ce31-4e22-90d6-7d10bac92352/application/lcm/app/us-jon-test-k8s10jondevorg/instance/fedtest/zone/us-azure-westus
failed: Delete App Inst failed: Run container failed, pod state: Failed - No
resources found in default namespace.
rpc error: code = Unknown desc = Delete App Inst failed: DELETE
https://console.cloud.edgexr.org/operatorplatform/federation/v1/269df0a6-6287-4bbc-8d74-26cae09ea268/application/lcm/app/us-jon-test-k8s10jondevorg/instance/fedtestinst/zone/us-azure-westus
failed: Delete App Inst failed: Run container failed, pod state: Failed -
Name: us-jon-test-k8s10jondevorg10-deployment-5fdbd5b967-vfm9b
Namespace: default
Priority: 0
Service Account: default
Node: aks-agentpool-64407650-vmss000000/10.224.0.4
Start Time: Tue, 26 Nov 2024 06:28:23 +0000
Labels: mex-app=us-jon-test-k8s10jondevorg10-deployment
mexAppInstName=fedtestinst
mexAppInstOrg=hostfed
mexDeployGen=kubernetes-basic
pod-template-hash=5fdbd5b967
run=us-jon-test-k8s10jondevorg1.0
Annotations: <none>
Status: Terminating (lasts 1s)
Termination Grace Period: 30s
IP: 10.244.0.11
IPs:
IP: 10.244.0.11
Controlled By: ReplicaSet/us-jon-test-k8s10jondevorg10-deployment-5fdbd5b967
Containers:
us-jon-test-k8s10jondevorg10:
Container ID: containerd://d9d09fd3fa61ee179496679a522b9f8df6322c3d694cb37fe9e1e0c7924d0f0f
Image: docker.io/hashicorp/http-echo:0.2.3
Image ID: docker.io/hashicorp/http-echo@sha256:ba27d460cd1f22a1a4331bdf74f4fccbc025552357e8a3249c40ae216275de96
Port: 5678/TCP
Host Port: 0/TCP
Args:
-text="hello to edgexr"
State: Terminated
Reason: Error
Exit Code: 137
Started: Tue, 26 Nov 2024 06:28:25 +0000
Finished: Tue, 26 Nov 2024 06:36:51 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nrlgr (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-nrlgr:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m30s default-scheduler Successfully assigned default/us-jon-test-k8s10jondevorg10-deployment-5fdbd5b967-vfm9b to aks-agentpool-64407650-vmss000000
Normal Pulling 8m29s kubelet Pulling image "docker.io/hashicorp/http-echo:0.2.3"
Normal Pulled 8m27s kubelet Successfully pulled image "docker.io/hashicorp/http-echo:0.2.3" in 1.813s (1.813s including waiting)
Normal Created 8m27s kubelet Created container us-jon-test-k8s10jondevorg10
Normal Started 8m27s kubelet Started container us-jon-test-k8s10jondevorg10
Warning FailedToRetrieveImagePullSecret 38s (x9 over 8m29s) kubelet Unable to retrieve some image pull secrets (docker.io); attempting to pull the image may not succeed.
Normal Killing 31s kubelet Stopping container us-jon-test-k8s10jondevorg10
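The logs above suggest the delete path fails when the pod check sees either "No resources found" or a pod whose Status is Terminating. One possible direction for a fix is to treat both as "pod is gone or going away" while waiting on a delete. The following is only a minimal sketch of that idea: podsGone is a hypothetical helper, not the actual code in k8smgmt/appinst.go, and it assumes the check works on the text output of kubectl as shown in the error messages.

```go
package main

import (
	"fmt"
	"strings"
)

// podsGone is a hypothetical helper (not the real k8smgmt code). Given the
// text output of a kubectl pod query made while waiting for a delete, it
// reports whether the pods are already gone or are in the process of
// terminating, so the caller can avoid reporting a delete failure.
func podsGone(kubectlOut string) bool {
	out := strings.TrimSpace(kubectlOut)
	// Empty output or "No resources found ..." means nothing matched the selector.
	if out == "" || strings.HasPrefix(out, "No resources found") {
		return true
	}
	// In describe output, a pod being deleted shows "Status: Terminating (...)".
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "Status:" && fields[1] == "Terminating" {
			return true
		}
	}
	return false
}

func main() {
	// Sample inputs modeled on the messages in this report.
	samples := []string{
		"No resources found in default namespace.",
		"Name: us-jon-test-k8s10jondevorg10-deployment-5fdbd5b967-vfm9b\nStatus: Terminating (lasts 1s)",
		"Name: some-pod\nStatus: Running",
	}
	for _, s := range samples {
		fmt.Println(podsGone(s))
	}
}
```

With a check along these lines, the delete wait loop would only report an error if the pod is still present and not terminating once the wait times out. Whether exit code 137 on a terminating container should ever count as a failure during delete is a separate question this sketch does not address.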