foundation-model-stack / multi-nic-cni

https://foundation-model-stack.github.io/multi-nic-cni/
Apache License 2.0

need healthz check logic to detect failure #127

Open sunya-ch opened 1 year ago

sunya-ch commented 1 year ago

Is your feature request related to a problem? Please describe.

The healthz handler needs logic to detect failures that cannot be recovered.

sunya-ch commented 1 year ago

After a failure is detected, the handler needs to keep waiting for at least two consecutive reconcile loops of hostinterface (2*vars.UrgentReconcileTime) before reporting it, to make sure the failure will not be recovered.

healthz period = 20s

Failure condition: