jrodrigu-canonical closed 2 weeks ago
How do we differentiate between (1) "an instance was shut down gracefully" and (2) "an instance was supposed to be up but died"? If that's not possible, it's probably better to keep this as is.
(For the latter case, it probably makes sense to keep it as CRITICAL anyway.)
Hi Pon, as discussed in MM, maybe we could retrieve the status of the VM that the port is attached to (e.g. openstack server show <id>)?
There should be a different value in the status fields that points to the reason why the VM is down (OS-EXT-STS:power_state, OS-EXT-STS:task_state, OS-EXT-STS:vm_state, ...)
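A minimal sketch of how those status fields could drive the alert level, assuming the server details are available as a dict (for example, parsed from the JSON output of openstack server show). The function name, the set of "expected down" states, and the mapping to severities are hypothetical illustrations, not the plugin's actual logic:

```python
# Nagios exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumption: vm_state values that mean the instance is down on purpose.
EXPECTED_DOWN_VM_STATES = {"stopped", "shelved", "shelved_offloaded"}


def severity_for_down_port(server):
    """Pick a severity for a DOWN port from its server's status fields.

    `server` is a dict holding the OS-EXT-STS:* fields, or None when the
    port is not attached to any server (hypothetical convention).
    """
    if server is None:
        # Nothing explains why the port is down, so keep it CRITICAL.
        return CRITICAL

    vm_state = server.get("OS-EXT-STS:vm_state")
    task_state = server.get("OS-EXT-STS:task_state")

    # The instance was stopped on purpose, or is being powered off now.
    if vm_state in EXPECTED_DOWN_VM_STATES or task_state == "powering-off":
        return WARNING

    # The instance should be up (or is in an error state) but its port is DOWN.
    return CRITICAL
```

With this mapping, a server reporting vm_state "stopped" yields WARNING, while one reporting "active" or "error" with a DOWN port stays CRITICAL.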
Similarly to LP#2021509, when an instance is shut down, its allocated port in OVN will be DOWN, and a CRITICAL alert will be triggered by check_ports.cfg (/usr/local/lib/nagios/plugins/check_resources.py port --all). An instance being shut down with its allocated port DOWN is common in day-to-day operations and should not be considered CRITICAL. A WARNING alert would probably be more appropriate.

This issue differs from the bugfix for LP#2021509: in that bugfix the port.binding_vif_type must be "unbound", while in the described situation binding_vif_type is always set to "ovs". Therefore, the port is not skipped and triggers the CRITICAL alert.
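To illustrate the difference between the two cases, here is a rough sketch of a skip rule keyed on binding_vif_type. It is not the actual code from check_resources.py. The helper names and the dict-based port representation are assumptions made for the example:

```python
def should_skip_port(port):
    """Skip rule as described for LP#2021509: only unbound ports are ignored."""
    return port.get("binding_vif_type") == "unbound"


def down_ports_to_alert(ports):
    """Return the IDs of DOWN ports that the skip rule does not filter out."""
    return [
        port["id"]
        for port in ports
        if port.get("status") == "DOWN" and not should_skip_port(port)
    ]


# Hypothetical sample data: one unbound port, one port of a shut-down instance.
ports = [
    {"id": "port-a", "status": "DOWN", "binding_vif_type": "unbound"},
    {"id": "port-b", "status": "DOWN", "binding_vif_type": "ovs"},
    {"id": "port-c", "status": "ACTIVE", "binding_vif_type": "ovs"},
]
alerting = down_ports_to_alert(ports)
```

Here port-a is skipped because it is unbound, whereas port-b, which belongs to a shut-down instance and keeps binding_vif_type "ovs", still ends up in the alerting list.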