:lady_beetle: Bug: update_status_check.yml does not survive a node reboot during upgrade

Describe the bug

My playbook for multi-node cluster updates uses this task to reuse the status check from the version_update_single_node role:

    - name: use update check from sns update role  # has inner and outer retry loops
      ansible.builtin.import_role:
        name: scale_computing.hypercore.version_update_single_node
        tasks_from: update_status_check.yml

However, the check does not appear to survive the period when the node is actually down: there is no update response at all during a reboot. Do I need ignore_unreachable: true in the task above, or some kind of retry there? Or should the role be handling this? (Note: the upgrade is still running when the error below is thrown.)
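For reference, the first option in the question would look roughly like the sketch below. With a static import_role, task keywords are inherited by the imported tasks, so ignore_unreachable would apply to every task in update_status_check.yml. Whether it helps here is doubtful: the play recap in the log reports unreachable=0, which suggests the status check fails as an ordinary task failure, not as an unreachable host.

```yaml
# Sketch only: same task as above with ignore_unreachable added.
# The keyword is inherited by the tasks that import_role pulls in.
- name: use update check from sns update role
  ansible.builtin.import_role:
    name: scale_computing.hypercore.version_update_single_node
    tasks_from: update_status_check.yml
  ignore_unreachable: true
```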

TASK [hypercore_version : apply desired version to cluster or SNS] ***
changed: [veb120a-01.lab.local]
Friday 18 August 2023 07:09:42 -0400 (0:00:04.309) 0:00:25.398 *

TASK [scale_computing.hypercore.version_update_single_node : Increment version_update_single_node_retry_count] ***
ok: [veb120a-01.lab.local]
Friday 18 August 2023 07:09:42 -0400 (0:00:00.063) 0:00:25.462 *

TASK [scale_computing.hypercore.version_update_single_node : Pause before checking update status - checks will report FAILED-RETRYING until update COMPLETE/TERMINATED] *
ok: [veb120a-01.lab.local -> localhost]
Friday 18 August 2023 07:10:43 -0400 (0:01:00.866) 0:01:26.329 ***
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (100 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (99 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (98 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (97 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (96 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (95 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (94 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (93 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (92 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (91 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (90 retries left).
FAILED - RETRYING: [veb120a-01.lab.local]: Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED (89 retries left).

TASK [scale_computing.hypercore.version_update_single_node : Check update status - will report FAILED-RETRYING until update COMPLETE/TERMINATED] *****
fatal: [veb120a-01.lab.local]: FAILED! => {"msg": "The conditional check 'version_update_single_node_update_status.record != None and (\n version_update_single_node_update_status.record.update_status == \"COMPLETE\" or\n version_update_single_node_update_status.record.update_status == \"TERMINATING\"\n)' failed. The error was: error while evaluating conditional (version_update_single_node_update_status.record != None and (\n version_update_single_node_update_status.record.update_status == \"COMPLETE\" or\n version_update_single_node_update_status.record.update_status == \"TERMINATING\"\n)): 'dict object' has no attribute 'record'"}
...ignoring
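The error above says the registered result had no record key at all, so evaluating the until condition raised an error and ended the retry loop with 89 retries left. A minimal sketch of a condition that tolerates a missing record is shown below. The conditional text and the register variable come from the error message; the module name scale_computing.hypercore.version_update_status_info and the delay value are assumptions, not copied from the role, and the module is assumed to pick up cluster credentials from the environment.

```yaml
# Sketch only: module name and delay are assumptions.
# "is defined" guards against results that carry no "record" key,
# e.g. when the API does not answer while the node reboots.
- name: Check update status, tolerating responses without a record
  scale_computing.hypercore.version_update_status_info:
  register: version_update_single_node_update_status
  until: >-
    version_update_single_node_update_status.record is defined and
    version_update_single_node_update_status.record is not none and
    version_update_single_node_update_status.record.update_status in ["COMPLETE", "TERMINATING"]
  retries: 100
  delay: 30
  ignore_errors: true
```

With this guard, a response without record simply evaluates to false, so the task would keep retrying through the reboot instead of failing on the conditional itself.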

PLAY RECAP ***
veb120a-01.lab.local : ok=13 changed=1 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1

To Reproduce

Call this role: https://github.com/ddemlow/ansible_edge_playbooks/blob/master/roles/hypercore_version/tasks/main.yml

Expected behavior

Update monitoring should continue through the entire cluster update, even while a node reboots.

System Info (please complete the following information):

OS: [e.g. iOS]
HyperCore Version: 9.2
Ansible Version:
Collection Version: current main

ScaleComputing / HyperCoreAnsibleCollection