CiscoDevNet / Hyperflex-Hypercheck

Perform pro-active self checks on your Hyperflex cluster to ensure stability and resiliency
MIT License

Fix vmotion ping #31

Open jgc234 opened 4 years ago

jgc234 commented 4 years ago

This is a simple proposal to fix the vmotion ping failure. The first few vmotion pings often fail on a cluster node that has been idle; if you run the HXTool.py script again, it works fine. Assuming it takes a moment to populate the ARP cache, the aggressive timeout and the short ping count may cause a packet or two to be dropped. The parser that checks the ping response expects 0% packet loss, so the test fails. Rather than rewrite the pingstatus() parser, it's easier to run the test twice: once with a normal timeout and normal MTU to populate the ARP cache, then again as the existing test. That said, I haven't conclusively proven it is an ARP problem: esxcli network ip neighbor list doesn't show the vmotion interfaces or addresses for me (even in the default stack). The other potential solution is to remove the excessively low timeout, but I assume there was a reason for it (maybe to prune out long WAN links?).
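For illustration, the "run it twice" idea can be sketched roughly as below. This is not the actual HXTool.py code: `ping_with_warmup`, `packet_loss_percent`, and the `run_ping` callable are hypothetical names, and the sketch assumes the ping output contains a summary of the form "N% packet loss". The warm-up result is deliberately ignored; only the second, strict ping is judged.

```python
import re


def packet_loss_percent(ping_output):
    """Return the packet-loss percentage found in ping output, or None."""
    # Assumes a summary fragment such as "0% packet loss" is present.
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


def ping_with_warmup(run_ping, warmup_args, strict_args):
    """Run a forgiving warm-up ping, then the strict ping that is judged.

    run_ping is a hypothetical callable that takes an argument list and
    returns the command's text output (for example, a wrapper around the
    remote command execution the tool already performs).
    """
    # Warm-up: normal timeout and MTU; any loss here is not counted.
    run_ping(warmup_args)
    # Strict test: the existing aggressive ping, which must show 0% loss.
    loss = packet_loss_percent(run_ping(strict_args))
    return loss == 0
```

If the first-ping failures really are a cache warm-up effect, the strict test should then pass on the first run of the script; if they are not, this changes nothing and the test still fails as before.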