ansible-collections / community.vmware

Ansible Collection for VMware
GNU General Public License v3.0

vmware.py: AttributeError: 'NoneType' object has no attribute 'dasProtected' #1820

Open karniemi opened 1 year ago

karniemi commented 1 year ago
SUMMARY

Ansible modules sporadically fail with: AttributeError: 'NoneType' object has no attribute 'dasProtected'. The problem happens every few months in our continuous integration builds. We run tens of builds per day, and each build executes ansible modules tens or hundreds of times.

Full stack trace:

2023-07-14 21:29:50.714  MSG:
2023-07-14 21:29:50.714  
2023-07-14 21:29:50.714  MODULE FAILURE
2023-07-14 21:29:50.714  See stdout/stderr for the exact error
2023-07-14 21:29:50.714  
2023-07-14 21:29:50.714  
2023-07-14 21:29:50.714  MODULE_STDERR:
2023-07-14 21:29:50.714  
2023-07-14 21:29:50.714  Traceback (most recent call last):
2023-07-14 21:29:50.714    File "<stdin>", line 102, in <module>
2023-07-14 21:29:50.714    File "<stdin>", line 94, in _ansiballz_main
2023-07-14 21:29:50.714    File "<stdin>", line 40, in invoke_module
2023-07-14 21:29:50.714    File "/usr/lib64/python2.7/runpy.py", line 176, in run_module
2023-07-14 21:29:50.714      fname, loader, pkg_name)
2023-07-14 21:29:50.714    File "/usr/lib64/python2.7/runpy.py", line 82, in _run_module_code
2023-07-14 21:29:50.714      mod_name, mod_fname, mod_loader, pkg_name)
2023-07-14 21:29:50.714    File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
2023-07-14 21:29:50.714      exec code in run_globals
2023-07-14 21:29:50.714    File "/tmp/ansible_vmware_guest_payload_sGAKeD/ansible_vmware_guest_payload.zip/ansible/modules/cloud/vmware/vmware_guest.py", line 2834, in <module>
2023-07-14 21:29:50.714    File "/tmp/ansible_vmware_guest_payload_sGAKeD/ansible_vmware_guest_payload.zip/ansible/modules/cloud/vmware/vmware_guest.py", line 2776, in main
2023-07-14 21:29:50.714    File "/tmp/ansible_vmware_guest_payload_sGAKeD/ansible_vmware_guest_payload.zip/ansible/module_utils/vmware.py", line 805, in set_vm_power_state
2023-07-14 21:29:50.714    File "/tmp/ansible_vmware_guest_payload_sGAKeD/ansible_vmware_guest_payload.zip/ansible/module_utils/vmware.py", line 324, in gather_vm_facts
2023-07-14 21:29:50.714  AttributeError: 'NoneType' object has no attribute 'dasProtected'

We are still running ansible-2.9.27-1.el7, but the piece of code which causes the problem is still the same: https://github.com/ansible-collections/community.vmware/blob/9f06033bd87611d2d97323ae26dc2eb16c4064bb/plugins/module_utils/vmware.py#L485-L486

Those two lines of code should never be able to produce this error. The if-statement should skip the block when dasVmProtection is None, yet sometimes the block is executed anyway and then fails.

ISSUE TYPE
COMPONENT NAME

module_utils/vmware.py

ANSIBLE VERSION
[root@01fb50763f87 /]# ansible --version
ansible 2.9.27
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Nov 20 2015, 02:00:19) [GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]
COLLECTION VERSION
N/A
CONFIGURATION
ANSIBLE_PIPELINING(env: ANSIBLE_SSH_PIPELINING) = True
ANSIBLE_SSH_CONTROL_PATH_DIR(env: ANSIBLE_SSH_CONTROL_PATH_DIR) = /tmp
ANY_ERRORS_FATAL(env: ANSIBLE_ANY_ERRORS_FATAL) = True
DEFAULT_STDOUT_CALLBACK(env: ANSIBLE_STDOUT_CALLBACK) = debug
DEFAULT_TIMEOUT(env: ANSIBLE_TIMEOUT) = 180
DEFAULT_TRANSPORT(env: ANSIBLE_TRANSPORT) = paramiko
HOST_KEY_CHECKING(env: ANSIBLE_HOST_KEY_CHECKING) = False
OS / ENVIRONMENT

vCenter 7, python2-pyvmomi-7.0.1-2.el7

STEPS TO REPRODUCE

The problem happens every few months in our continuous integration builds. We are running tens of builds per day, and each of the builds is executing ansible modules tens/hundreds of times. We have no exact steps to reproduce because the problem is sporadic and happens infrequently.

EXPECTED RESULTS

vmware modules should get facts successfully from vmware.py:gather_vm_facts()

ACTUAL RESULTS

vmware.py:gather_vm_facts() sometimes fails with AttributeError: 'NoneType' object has no attribute 'dasProtected', even though, looking at the code, this should not even be possible.

karniemi commented 1 year ago

Is "vm.summary.runtime.dasVmProtection" somehow re-evaluated on each access? Python-wise, that might explain how it's possible to get this error. It would not yet explain, though, why vCenter would sometimes return a different value for the object.
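To make the question concrete, here is a minimal, self-contained sketch of how a check-then-use pattern can fail when an attribute is re-read on every access. `FlakyRuntime` and `Protection` are hypothetical stand-ins (not pyVmomi classes), and the pattern in `read_twice` is only an assumption about the shape of the two linked lines, not a copy of them.

```python
class Protection(object):
    """Hypothetical stand-in for a DasProtectionState-like object."""
    dasProtected = True


class FlakyRuntime(object):
    """Hypothetical stand-in for vm.summary.runtime.

    Every access to dasVmProtection returns the next queued value,
    imitating a property that is fetched again on each access.
    """

    def __init__(self, values):
        self._values = list(values)

    @property
    def dasVmProtection(self):
        return self._values.pop(0)


def read_twice(runtime):
    # Check and use are two separate accesses, so they can disagree.
    if runtime.dasVmProtection:
        return runtime.dasVmProtection.dasProtected
    return None


# First access sees a protection object, second access sees None.
runtime = FlakyRuntime([Protection(), None])
try:
    read_twice(runtime)
    outcome = "no error"
except AttributeError as exc:
    outcome = str(exc)

print(outcome)
```

If the value changes between the two accesses, the `if` passes and the attribute lookup inside the block runs against `None`, which matches the error in the traceback.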

karniemi commented 1 year ago

I suppose this might explain the problem:

The latest occurrence was when a VM was being deleted (state: absent) with vmware_guest, using a task like this:

    - name: delete the VM
      vmware_guest:
        hostname: "{{ vcenter.hostname }}"
        username: "{{ vcenter.username }}"
        password: "{{ vcenter.password }}"
        validate_certs: False
        datacenter: "{{ vcenter_datacenter  }}"
        name: # workaround for ansible/ansible:#32901 
        uuid: "{{ result.instance.hw_product_uuid }}"
        state: absent
        force: yes #ie. poweroff before delete..github:ansible/ansible:#37000

According to the post in that link, the availability of dasVmProtection depends on the power state of the VM. We use "force" with the vmware_guest module so that VMs are powered off when they are deleted. So a reasonable hypothesis is that, because of the power-off, the if-statement occasionally still sees dasVmProtection, but inside the if-block (if dasVmProtection is re-evaluated on each access) the value is already None.

karniemi commented 1 year ago

I just tested the hypothesis above by running vmware_guest_facts in a loop like this:

    while true; do ansible-playbook -i mylab vcenter_vm_facts.yml | egrep "hw_guest_ha_state|hw_power_status"; sleep .5; done

...and then powered off the vm via vcenter.

The result:

"hw_guest_ha_state": true, "hw_power_status": "poweredOn",

"hw_guest_ha_state": true, "hw_power_status": "poweredOff",

"hw_guest_ha_state": true, "hw_power_status": "poweredOff",

"hw_guest_ha_state": true, "hw_power_status": "poweredOff",

"hw_guest_ha_state": true, "hw_power_status": "poweredOff",

"hw_guest_ha_state": true, "hw_power_status": "poweredOff",

"hw_guest_ha_state": true, "hw_power_status": "poweredOff",

"hw_guest_ha_state": true, "hw_power_status": "poweredOff",

"hw_guest_ha_state": null, "hw_power_status": "poweredOff",

"hw_guest_ha_state": null, "hw_power_status": "poweredOff",


So: when hw_power_status goes to poweredOff, hw_guest_ha_state is still available via the API for a long time, and only after a fairly long delay does it turn null. I'd like to see this as proof of the hypothesis: dasVmProtection becomes unavailable at some random time after power-off, and that causes the sporadic error in the two lines of code mentioned earlier.
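The same observation can be condensed by recording only the samples where the state changes. The sketch below replays the ten samples listed above through a small helper; `watch_transitions` is a hypothetical name, and in a real run `read_state` would have to query vCenter instead of replaying a list.

```python
import time


def watch_transitions(read_state, samples, interval=0.0, sleep=time.sleep):
    """Call read_state() `samples` times; return (index, value) for each change."""
    transitions = []
    previous = object()  # sentinel that never equals a real sample
    for index in range(samples):
        current = read_state()
        if current != previous:
            transitions.append((index, current))
            previous = current
        if interval:
            sleep(interval)
    return transitions


# Replay of the samples above as (hw_guest_ha_state, hw_power_status).
observed = (
    [(True, "poweredOn")]
    + [(True, "poweredOff")] * 7
    + [(None, "poweredOff")] * 2
)
replay = iter(observed)
transitions = watch_transitions(lambda: next(replay), samples=len(observed))

for index, state in transitions:
    print(index, state)
```

With the 0.5 s sleep from the shell loop, the gap between the second and third transition is the window in which the VM is already powered off but the HA state is still reported.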

ihumster commented 1 year ago

@karniemi Yes, you're right. The VirtualMachineRuntimeInfo data object of the VirtualMachine managed object contains the object dasVmProtection [vim.vm.RuntimeInfo.DasProtectionState] (which in turn has the boolean property dasProtected). For powered-off VMs, RuntimeInfo contains a null (unset) dasVmProtection data object, and when a property of that non-existent object is accessed, the script throws an error.

ihumster commented 1 year ago

I could add a check on the object type for this field, but I'm afraid this fix would go only to the main branch and would not be available for version 2 of the collection.

@mariolenz Can we add a fix to the 2.x branch of this collection or not?
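One possible shape for such a check, as a rough sketch only: read the field once into a local variable and test that local, so the check and the use cannot see different values. `ha_state`, `Protection` and `ChangingRuntime` are hypothetical names for illustration; this is not the collection's actual fix.

```python
def ha_state(runtime):
    """Return dasProtected, or None when dasVmProtection is unset."""
    # Single read: later changes on the server cannot affect this local value.
    protection = getattr(runtime, "dasVmProtection", None)
    if protection is None:
        return None
    return protection.dasProtected


class Protection(object):
    """Hypothetical stand-in for a DasProtectionState-like object."""
    dasProtected = True


class ChangingRuntime(object):
    """Hypothetical runtime whose field is re-read on every access."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    @property
    def dasVmProtection(self):
        self.reads += 1
        return self._values.pop(0)


# The value would turn None on a second read, but only one read happens.
runtime = ChangingRuntime([Protection(), None])
result = ha_state(runtime)
print(result, runtime.reads)

# A powered-off VM whose field is already unset yields None, not an error.
unset_result = ha_state(ChangingRuntime([None]))
print(unset_result)
```

Testing against `None` explicitly, rather than relying on truthiness, also keeps the check independent of how the data object evaluates in a boolean context.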

karniemi commented 1 year ago

Leaning back a bit: I think this specific issue brings up two greater issues: