Open karniemi opened 1 year ago
Is "vm.summary.runtime.dasVmProtection" somehow re-evaluated on each access? Python-wise, that might explain how it's possible to get this error. It would not yet explain, though, why vCenter would sometimes return a different value for the object.
I suppose this might explain the problem:
The latest occurrence was when a VM was being removed (state: absent) with vmware_guest, using a task like this:
- name: delete the VM
  vmware_guest:
    hostname: "{{ vcenter.hostname }}"
    username: "{{ vcenter.username }}"
    password: "{{ vcenter.password }}"
    validate_certs: False
    datacenter: "{{ vcenter_datacenter }}"
    name:  # workaround for ansible/ansible:#32901
    uuid: "{{ result.instance.hw_product_uuid }}"
    state: absent
    force: yes  # i.e. power off before delete; see github:ansible/ansible:#37000
According to the post in that link, the availability of dasVmProtection depends on the power state of the VM. We use "force" with the vmware_guest module to power off VMs before deleting them. So it's a good hypothesis that, at some random point during the power-off, the if-statement sees dasVmProtection, but inside the if-block (if dasVmProtection is re-evaluated on each access) the value is already None.
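The hypothesis can be illustrated without vCenter. This is only a sketch: FakeRuntime and FakeProtection are hypothetical stand-ins, and it assumes the two lines in vmware.py check the attribute and then read it a second time. The property below hands out a different value on each access, which mimics a value that is re-fetched from the server every time.

```python
class FakeProtection:
    """Hypothetical stand-in for the DasProtectionState data object."""
    dasProtected = True


class FakeRuntime:
    """Hypothetical stand-in for a runtime object whose attribute is
    re-evaluated on every access (one value per access)."""

    def __init__(self, values):
        self._values = iter(values)

    @property
    def dasVmProtection(self):
        return next(self._values)


def racy_read(runtime):
    """Check-then-use pattern: the attribute is evaluated twice."""
    facts = {}
    if runtime.dasVmProtection is not None:  # first evaluation
        # second evaluation; may already be None
        facts["hw_guest_ha_state"] = runtime.dasVmProtection.dasProtected
    return facts


error_message = None
try:
    # First access sees the object, second access sees None.
    racy_read(FakeRuntime([FakeProtection(), None]))
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If the attribute really is re-evaluated like this, the if-statement passing does not guarantee anything about the value read inside the block.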
I just tested the hypothesis above by running vmware_guest_facts in a loop like this:
while true;do ansible-playbook -i mylab vcenter_vm_facts.yml |egrep "hw_guest_ha_state|hw_power_status";sleep .5;done
...and then powered off the vm via vcenter.
"hw_guest_ha_state": true, "hw_power_status": "poweredOn",
"hw_guest_ha_state": true, "hw_power_status": "poweredOff",
"hw_guest_ha_state": true, "hw_power_status": "poweredOff",
"hw_guest_ha_state": true, "hw_power_status": "poweredOff",
"hw_guest_ha_state": true, "hw_power_status": "poweredOff",
"hw_guest_ha_state": true, "hw_power_status": "poweredOff",
"hw_guest_ha_state": true, "hw_power_status": "poweredOff",
"hw_guest_ha_state": true, "hw_power_status": "poweredOff",
"hw_guest_ha_state": null, "hw_power_status": "poweredOff",
"hw_guest_ha_state": null, "hw_power_status": "poweredOff",
So: when hw_power_status goes to poweredOff, hw_guest_ha_state is still available via the API for a long time, and only after a pretty long time does it turn null. I'd like to see this as proof of the hypothesis: dasVmProtection becomes unavailable at some random time after power-off, which causes the sporadic error in those two lines of code mentioned earlier.
@karniemi Yes, you're right. The VirtualMachineRuntimeInfo data object of the VirtualMachine managed object contains the data object dasVmProtection [vim.vm.RuntimeInfo.DasProtectionState] (which in turn has the boolean property dasProtected). For powered-off VMs, dasVmProtection in RuntimeInfo is null (unset), and accessing a property of a non-existent object makes the script throw an error.
I could do a check on the object type for this field, but I'm afraid this fix will go to the main branch, and will not be available for version 2 of the collection.
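A check of that kind could look roughly like the sketch below. It is not the collection's actual code: read_ha_state is a hypothetical helper, and SimpleNamespace objects stand in for the pyVmomi data objects. The point is that the attribute is accessed exactly once and the result is kept in a local variable, so a value that disappears between two accesses can no longer cause the AttributeError.

```python
from types import SimpleNamespace


def read_ha_state(runtime):
    """Return dasProtected, or None when dasVmProtection is unset.

    The attribute is read a single time; the local variable cannot
    change between the None check and the property access.
    """
    protection = getattr(runtime, "dasVmProtection", None)
    if protection is None:
        return None
    return protection.dasProtected


# Hypothetical stand-ins for a protected VM and a powered-off VM.
protected = SimpleNamespace(
    dasVmProtection=SimpleNamespace(dasProtected=True)
)
powered_off = SimpleNamespace(dasVmProtection=None)

print(read_ha_state(protected))
print(read_ha_state(powered_off))
```

Returning None for the unset case matches the null value of hw_guest_ha_state seen in the facts output for the powered-off VM.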
@mariolenz Can we add a fix to the 2.x branch of this collection or not?
Leaning back a bit. I think this specific issue brings up two greater issues:
SUMMARY
Ansible modules sporadically fail with:
AttributeError: 'NoneType' object has no attribute 'dasProtected'
The problem happens every few months in our continuous integration builds. We are running tens of builds per day, and each of the builds is executing ansible modules tens/hundreds of times. Full stack trace:
We are still running ansible-2.9.27-1.el7, but the piece of code which causes the problem is still the same: https://github.com/ansible-collections/community.vmware/blob/9f06033bd87611d2d97323ae26dc2eb16c4064bb/plugins/module_utils/vmware.py#L485-L486
Those two lines of code should never be able to produce this error: the if-statement should skip the block if dasVmProtection is None. Still, sometimes the block gets executed and then fails.
ISSUE TYPE
COMPONENT NAME
module_utils/vmware.py
ANSIBLE VERSION
COLLECTION VERSION
CONFIGURATION
OS / ENVIRONMENT
vCenter 7 python2-pyvmomi-7.0.1-2.el7
STEPS TO REPRODUCE
The problem happens every few months in our continuous integration builds. We are running tens of builds per day, and each of the builds is executing ansible modules tens/hundreds of times. We have no exact steps to reproduce because the problem is sporadic and happens infrequently.
EXPECTED RESULTS
vmware modules should get facts successfully from vmware.py:gather_vm_facts()
ACTUAL RESULTS
vmware.py:gather_vm_facts() sometimes fails with AttributeError: 'NoneType' object has no attribute 'dasProtected', even though, looking at the code, this should not even be possible.