Open jainpulkit22 opened 3 weeks ago
Duplicate of #5753
> Duplicate of #5753
This is a different issue. This one is about the implementation of context-based cleanup of clusters: the issue you mentioned has already been taken care of, and this is a bug in the implementation of the fix for that issue. It is also not related to cleanup of the Antrea installation; it is about deletion of the cluster, i.e. the cleanup of the testbed that happens before the test starts.
**Describe the bug**
The CI jobs fail because of a panic in the cleanup of existing kind clusters. In the current implementation of the cleanup function for kind clusters, the code tries to get the creation timestamp of every available kind cluster using the command:
```
kubectl get nodes --context kind-$kind_cluster_name -o json -l node-role.kubernetes.io/control-plane | \
  jq -r '.items[0].metadata.creationTimestamp'
```
Sometimes another job running on the same VM has just started and its cluster creation is still in progress, so the kubeconfig context is not ready yet; when a second job then tries to create a cluster, it gets stuck at this step, panics, and fails. This is not limited to parallel job runs: if a job is aborted during the cluster creation phase, the context of that kind cluster never becomes available, and any new job that later runs on this testbed and executes the cleanup function will fail, because it tries to fetch the context of every cluster listed by `kind get clusters` using the command above and panics, causing the job to fail.

**To Reproduce**
Trigger two kind jobs at the same time on the same VM, or trigger one job, abort it as soon as cluster creation starts, and then trigger a new job on the same testbed. In both cases the second job fails because of the panic.
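For illustration, here is a minimal sketch of the failure mode, assuming the cleanup script runs with `set -eo pipefail` (a common CI setup; the actual clean_kind implementation may differ). The cluster name and variable names below are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: shows why the timestamp lookup can abort the whole job when the
# kind cluster's kubeconfig context is missing or not yet usable.
set -eo pipefail

# A cluster reported by `kind get clusters` whose creation was aborted or is
# still in progress, so the context "kind-$kind_cluster_name" is not usable.
kind_cluster_name="half-created-cluster"

# kubectl exits non-zero (context not found / API server unreachable); with
# pipefail the whole pipeline fails, and with `set -e` the script stops here,
# so the CI job fails before it ever gets to create its own cluster.
creationTimestamp=$(kubectl get nodes --context "kind-$kind_cluster_name" -o json \
    -l node-role.kubernetes.io/control-plane | \
  jq -r '.items[0].metadata.creationTimestamp')

echo "never reached: ${creationTimestamp}"
```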
**Expected**
The jobs should not fail, and cluster creation should be successful.
**Actual behavior**
The job fails.
**Additional context**
Reference to the current implementation of the cleanup function: clean_kind
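One way the cleanup could avoid the panic is to check whether the kubeconfig context for a listed cluster actually exists before querying its creation timestamp, and skip that cluster otherwise. The snippet below is only a sketch of that idea, not the actual clean_kind code; the loop structure and messages are assumptions.

```bash
# Sketch of a defensive variant (not the actual clean_kind implementation).
for kind_cluster_name in $(kind get clusters); do
  # Skip clusters whose context is not (yet) available, e.g. clusters still
  # being created by a parallel job or left over from an aborted job.
  if ! kubectl config get-contexts "kind-$kind_cluster_name" > /dev/null 2>&1; then
    echo "Context for cluster ${kind_cluster_name} not ready, skipping it for now"
    continue
  fi
  creationTimestamp=$(kubectl get nodes --context "kind-$kind_cluster_name" -o json \
      -l node-role.kubernetes.io/control-plane | \
    jq -r '.items[0].metadata.creationTimestamp')
  echo "Cluster ${kind_cluster_name} was created at ${creationTimestamp}"
  # ... age-based deletion logic would follow here ...
done
```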