Based on recent analysis on a larger sensor fleet, it is still possible in certain situations (apparently during the removal of certain interfaces / GRE tunnels in conjunction with other actions running in parallel, e.g. a rotation) for individual interfaces to trigger a three-point lock timeout. Unfortunately, these situations are not yet handled properly, so such cases can lead to a complete deadlock of the `CaptureManager`. The following mitigation items are to be added:
- [x] Ensure that all three-point lock failures lead to the immediate closure of the respective interface (and that it is brought back up during the next scheduled config re-assessment)
- [x] Decouple the global `CaptureManager` lock used for the interface `Close()` functionality from individual interfaces (to minimize the risk of a global deadlock)