0-complexity / openvcloud

OpenvCloud

Need to make this healthcheck [ volumedriver_check.py ] smarter in case a volumedriver is actually down #1348

Closed hossnys closed 6 years ago

hossnys commented 6 years ago

Detailed description

After a discussion between the OVS guys, Jo, and me, we found the following: `qemu-img info` on a volume should (and does) work through any volumedriver, not only the one that is active for that volume. However, when a volumedriver has died and you try to connect to its ip:port, `qemu-img info` fails, because the command does not know where the failover volumedrivers are located.

The edge client works as follows:

The edge client connects to the volumedriver. When that volumedriver becomes inactive or goes down, the edge client already knows where the other candidates are and fails over to one of them.

A new connection to a volumedriver that is already down doesn't have this information. It cannot fail over, because it never received the extra information about the other nodes.
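
One way the healthcheck could work around this is to try each known volumedriver endpoint in turn, rather than only the one registered for the volume. The following is a minimal sketch, not the actual volumedriver_check.py code: the function names, the `(ip, port)` endpoint list, and the `openvstorage+tcp:ip:port/volume` URL form are assumptions made for illustration.

```python
import subprocess


def qemu_img_probe(url, timeout=10):
    """Return True if `qemu-img info` succeeds for the given volume URL."""
    try:
        result = subprocess.run(
            ["qemu-img", "info", url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # qemu-img is missing, or the volumedriver did not answer in time.
        return False
    return result.returncode == 0


def check_volume(volume, endpoints, probe=qemu_img_probe):
    """Probe the volume through each endpoint until one answers.

    Returns (reachable_endpoint_or_None, endpoints_that_failed).
    The URL scheme below is an assumption, not taken from the issue.
    """
    failed = []
    for ip, port in endpoints:
        url = "openvstorage+tcp:{}:{}/{}".format(ip, port, volume)
        if probe(url):
            return (ip, port), failed
        failed.append((ip, port))
    return None, failed
```

With this shape, a volume is only reported as unreachable when every endpoint fails, and the `failed` list still tells the check which volumedrivers appear to be down. The `probe` argument is injectable so the logic can be exercised without a running cluster.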

Relevant stacktraces

See the attached file stacktrace.txt.

Installation information

JumpScale

Core: branch: production (8a77511) 1/23/2018, 3:15:23 PM
Portal: branch: production (ada4a21) 1/30/2018, 7:44:31 PM

OpenvCloud

Core: branch: production (a5c4eee) 2/14/2018, 1:32:38 PM
G8VDC: branch: production (989668b) 12/27/2017, 4:19:04 PM
Selfhealing: branch: production (3bc2f9b) 2/7/2018, 1:58:11 PM

alichaddad commented 6 years ago

The disk listing in the script needs to be more efficient; there is no need to list unused or deleted disks.
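
As a rough sketch of that filtering, assuming each disk is a dict with hypothetical `status` and `attached` fields (the real disk model may use different names and values):

```python
def disks_to_check(disks, skip_statuses=("DESTROYED", "DELETED")):
    """Keep only disks that are in use and not deleted.

    `status` and `attached` are hypothetical field names used for
    illustration; adapt them to the actual disk model.
    """
    return [
        disk
        for disk in disks
        if disk.get("status") not in skip_statuses and disk.get("attached")
    ]
```

Filtering before running `qemu-img info` means the check never spends time on volumes that no longer matter.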