When a disk dies or is removed, the remaining disks can get a different kernel name (sda, ...) the next time the computer boots.
When this happens, ASDs fail to start and trigger the following health check errors:
![image](https://user-images.githubusercontent.com/15246294/46738166-c7decf80-cc9d-11e8-917b-f141ac564dc1.png)
When this health check (HC) is in error, check whether it is caused by a disk device name change:
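1. Get a list of backends with ASDs in error:
![image](https://user-images.githubusercontent.com/15246294/46740229-9a485500-cca2-11e8-9ba0-fb57ad2b5bcc.png)
2. Get the ASD guid to restart:
![image](https://user-images.githubusercontent.com/15246294/46742006-78e96800-cca6-11e8-8471-ee378385e286.png)
3. Restart the ASD:
![image](https://user-images.githubusercontent.com/15246294/46742146-b948e600-cca6-11e8-95de-93d6ac668b70.png)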
To get the node guid needed for restarting the ASD, list the nodes (https://ovs-be-g8-4.gig.tech/api/alba/nodes/?sort=ip&contents=node_id%2C_relations&discover=false&timestamp=1539180202498) and look up the node's guid using the disk guid.
4. Analyze the result of restarting the ASD
The previous call returned a task guid. Using this task guid, poll for the result with the following call: https://ovs-be-g8-4.gig.tech/api/tasks/cd1e6539-6ee2-42df-9a71-28c50836159c/?timestamp=1539182979530
If the response contains `UNIQUE constraint failed: disk.name`, as in the response below, run the healing code in step 5.
![image](https://user-images.githubusercontent.com/15246294/46745397-3f682b00-ccad-11e8-98cc-71d342f03603.png)
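5. Heal the ASDs with the following piece of Python:

       from source.dal.lists.disklist import DiskList

       # Rename every stored disk so that the old names are freed and no longer
       # clash with the kernel names assigned after the reboot.
       for disk in DiskList.get_disks():
           disk.name = '{}_new'.format(disk.name)
           disk.save()
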
6. Restart the asd-manager:

       systemctl restart asd-manager

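7. Retrigger the health check
Make sure, though, that this does not turn into an endless loop: if the health check still fails after healing, stop and investigate instead of repeating the steps.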
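
Step 4 can be scripted. The following is a minimal sketch under stated assumptions, not the framework's own client. The names `contains_disk_name_conflict` and `wait_for_task` are hypothetical, `fetch` stands in for whatever authenticated GET you already use against the tasks URL from step 4, and the `ready` field used to detect a finished task is an assumption about the response layout. The marker check scans the whole serialized response, so it does not depend on which field carries the error text.

```python
import json
import time

MARKER = "UNIQUE constraint failed: disk.name"


def contains_disk_name_conflict(task_response):
    """Return True if the marker appears anywhere in the task response."""
    # Serialize the whole payload so the check does not rely on a field layout.
    return MARKER in json.dumps(task_response)


def wait_for_task(fetch, task_guid, interval=1.0, max_polls=30, sleep=time.sleep):
    """Poll fetch(task_guid) until the task reports ready or max_polls is hit.

    Returns the last response, or None if the task never became ready.
    """
    for _ in range(max_polls):
        response = fetch(task_guid)
        # Assumption: a finished task is flagged with a truthy 'ready' field.
        if response.get("ready"):
            return response
        sleep(interval)
    return None


# Made-up responses standing in for the real API, for illustration only.
fake_responses = iter([
    {"ready": False},
    {"ready": True, "result": "IntegrityError: UNIQUE constraint failed: disk.name"},
])
result = wait_for_task(
    lambda guid: next(fake_responses),
    "cd1e6539-6ee2-42df-9a71-28c50836159c",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
needs_healing = result is not None and contains_disk_name_conflict(result)
print(needs_healing)
```

If `needs_healing` is true, continue with step 5. A `None` result means the task did not finish within `max_polls` attempts and needs a manual look.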
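
The endless-loop warning in step 7 can be enforced with a bounded retry. This is a minimal sketch, assuming you wrap your own tooling in two callables. Both are hypothetical: `check` returns True when the health check passes, and `heal` performs steps 5 and 6. The default of a single healing attempt is a judgment call, not something the procedure prescribes.

```python
def heal_with_limit(check, heal, max_attempts=1):
    """Run heal() at most max_attempts times, re-checking after each attempt.

    Returns True once check() passes, False if the limit is reached.
    """
    if check():
        return True  # health check already passes, nothing to heal
    for _ in range(max_attempts):
        heal()
        if check():
            return True
    # Still failing after the allowed attempts: stop and escalate.
    return False
```

A `False` return value is the signal to stop automating and investigate by hand.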