KatharaFramework / Kathara

A lightweight container-based network emulation system.
https://www.kathara.org/
GNU General Public License v3.0

Kubernetes startup watch may never terminate if there is a Pod error #258

Skazza94 closed 10 months ago

Skazza94 commented 11 months ago

The current implementation of the KubernetesMachine._wait_machines_startup method loops over the watch events returned by list_namespaced_pod. In some cases, such as critical Pod errors (for example, CNI errors), no further events are generated.

Consequently, the for loop never exits: it stays blocked waiting for an event that never arrives, and the program hangs indefinitely.

To fix this, the loop needs a mechanism that breaks it when no event arrives within a defined threshold. Our approach uses threading.Timer to set a 3-minute timer that is reset each time a new event is received. If no event arrives within the 3-minute interval, the callback fires, signals an error, and terminates the program.
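
The idea can be sketched as below. This is a minimal illustration, not the actual Kathara code: `IdleWatchdog`, `consume_events` and the `on_timeout` parameter are hypothetical names, and a plain Python iterable stands in for the Kubernetes watch stream.

```python
import threading
import time
from typing import Callable, Iterable, List, Optional


class IdleWatchdog:
    """Hypothetical helper: call `on_timeout` if `reset()` is not
    called again within `timeout` seconds."""

    def __init__(self, timeout: float, on_timeout: Callable[[], None]) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def reset(self) -> None:
        # threading.Timer cannot be restarted, so cancel the old one
        # and start a fresh one on every reset.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._on_timeout)
            self._timer.daemon = True  # do not keep the process alive
            self._timer.start()

    def cancel(self) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


def consume_events(
    events: Iterable[str], timeout: float, on_timeout: Callable[[], None]
) -> List[str]:
    """Iterate over `events`, re-arming the watchdog on each one.

    `events` stands in for the watch stream; in the scenario described
    above, `timeout` would be 180 seconds.
    """
    watchdog = IdleWatchdog(timeout, on_timeout)
    watchdog.reset()  # arm before the first event
    seen: List[str] = []
    try:
        for event in events:
            watchdog.reset()  # a new event arrived: restart the countdown
            seen.append(event)
    finally:
        watchdog.cancel()  # loop ended normally: no timeout needed
    return seen
```

Note that the timer callback runs in a separate thread, so it cannot raise an exception inside the blocked loop. How it signals the error and terminates the program is left to the caller in this sketch.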