CiscoDevNet / Hyperflex-Hypercheck

Perform pro-active self checks on your Hyperflex cluster to ensure stability and resiliency
MIT License

add arping check to verify that proxy arp is disabled on default gateway #33

Open gemirand opened 4 years ago

gemirand commented 4 years ago

Proxy ARP enabled on the default gateway can cause a HyperFlex upgrade to fail.

Example of symptoms:

HX Connect shows the upgrade error "Upgrade is not supported when cluster is unhealthy or cluster cannot tolerate another node failure." On the CLI, "stcli cluster storage-summary --detail" shows an issue like:

    storage cluster manager is not configured on x.x.x.x

The logs show matching symptoms:

* /var/log/springpath/exhibitor.log shows zk (ZooKeeper) going down
* /var/log/zookeeper/zookeeper.log shows timeouts and the error "caught end of stream exception EndOfStreamException: Unable to read additional data from client"
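As a rough aid for spotting the ZooKeeper symptom, the sketch below counts log lines containing the EndOfStreamException text quoted above. The function name `count_marker_lines` is hypothetical and not part of Hypercheck; the log path is the one named in this issue, and a non-zero count alone does not prove proxy ARP is the cause.

```python
def count_marker_lines(lines, marker="EndOfStreamException"):
    """Count lines that contain the marker string.

    `lines` is any iterable of strings, for example an open log file.
    The default marker is the exception name quoted in this issue.
    """
    return sum(1 for line in lines if marker in line)


# Example usage on an SCVM (path taken from this issue, typically needs root):
#
#   with open("/var/log/zookeeper/zookeeper.log", errors="replace") as log:
#       print(count_marker_lines(log))
```

Taking an iterable rather than a path keeps the helper easy to try against pasted log excerpts.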

* Steps to check (eth0 is the management interface, eth1 the storage interface):

Example SCVM1: eth0 10.10.10.101, eth1 192.168.10.101
Example SCVM2: eth0 10.10.10.102, eth1 192.168.10.102

a) SSH into one of the SCVMs, say SCVM1 (eth0 10.10.10.101).

b) Issue an arping sourced from the management interface, with the storage IP of another SCVM (say SCVM2) as the target, such as:

    arping -I eth0 192.168.10.102

--> Example output that is a PROBLEM:

    ARPING 192.168.10.102 from 10.10.10.101 eth0
    Unicast reply from 192.168.10.102 [xx:xx:xx:xx:xx:xx] #.##ms
    Unicast reply from 192.168.10.102 [xx:xx:xx:xx:xx:xx] #.##ms
    Unicast reply from 192.168.10.102 [xx:xx:xx:xx:xx:xx] #.##ms
    Unicast reply from 192.168.10.102 [xx:xx:xx:xx:xx:xx] #.##ms
    Sent 4 probes (1 broadcast(s))
    Received 4 response(s)

--> Example output that is OKAY:

    ARPING 192.168.10.102 from 10.10.10.101 eth0
    Sent 12 probes (12 broadcast(s))
    Received 0 response(s)

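The storage IP is on a different subnet from eth0, so normally no device should answer the ARP request. If you get a MAC address in the response, it is likely from the default gateway device, which has proxy ARP enabled.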
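A minimal sketch of how this check could be automated, assuming Python 3.7+ and the iputils `arping` whose output format is shown in the examples above. The function names (`parse_arping`, `proxy_arp_suspected`, `run_arping`) are hypothetical and not existing Hypercheck code; running `arping` usually needs root privileges.

```python
import re
import subprocess


def parse_arping(output):
    """Return (response_count, unique_macs) from iputils-style arping output."""
    # Reply lines look like: "Unicast reply from <ip> [<mac>] <time>ms"
    macs = re.findall(r"reply from \S+ \[([0-9A-Fa-f:]+)\]", output)
    summary = re.search(r"Received (\d+) response", output)
    # Prefer the summary line; fall back to counting reply lines if it is missing.
    responses = int(summary.group(1)) if summary else len(macs)
    return responses, sorted(set(macs))


def proxy_arp_suspected(output):
    """Any reply to a storage IP probed from the mgmt interface is suspicious."""
    responses, _ = parse_arping(output)
    return responses > 0


def run_arping(interface, target_ip, count=4):
    """Run arping and return its stdout (-I and -c are iputils arping options)."""
    proc = subprocess.run(
        ["arping", "-I", interface, "-c", str(count), target_ip],
        capture_output=True,
        text=True,
    )
    # arping exits non-zero when nothing replies, so the exit code is not
    # treated as an error here; only the printed output is inspected.
    return proc.stdout


# Example usage from SCVM1, using the addresses in this issue:
#
#   out = run_arping("eth0", "192.168.10.102")
#   if proxy_arp_suspected(out):
#       print("Reply received, check proxy ARP on the default gateway:",
#             parse_arping(out)[1])
```

Keeping the parsing separate from the subprocess call lets the logic be exercised against captured output without touching the network.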