ansible-collections / ansible.windows

Windows core collection for Ansible
https://galaxy.ansible.com/ansible/windows
GNU General Public License v3.0

win_powershell: Executing Register-ScheduledJob results in memory leak & high cpu usage when job already exists #375

Closed Yannik closed 2 years ago

Yannik commented 2 years ago
SUMMARY

Running Register-ScheduledJob with an already existing -Name in a win_powershell task results in a memory leak and high CPU usage on the target host. The task never ends, and even if it is cancelled on the controller node, the PowerShell process keeps running on the target node until memory is completely exhausted.

ISSUE TYPE
COMPONENT NAME

win_powershell

ANSIBLE VERSION
ansible [core 2.11.6] 
  config file = /home/yannik/projects/luetjenburg/ansible/ansible.cfg
  configured module search path = ['/home/yannik/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/yannik/.local/lib/python3.9/site-packages/ansible
  ansible collection location = /home/yannik/projects/luet/ansible/vendor_collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.12 (main, Mar 25 2022, 00:00:00) [GCC 11.2.1 20220127 (Red Hat 11.2.1-9)]
  jinja version = 3.0.3
  libyaml = True
COLLECTION VERSION
# /home/yannik/projects/luetjenburg/ansible/vendor_collections/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 2.3.0  
ansible.utils     2.3.1  
community.network 3.0.0  
juniper.device    1.0.1  

# /usr/local/lib/python3.9/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    2.1.0  
ansible.netcommon             2.5.0  
ansible.posix                 1.3.0  
ansible.utils                 2.4.3  
ansible.windows               1.9.0  
arista.eos                    3.1.0  
awx.awx                       19.4.0 
azure.azcollection            1.11.0 
check_point.mgmt              2.2.2  
chocolatey.chocolatey         1.1.0  
cisco.aci                     2.1.0  
cisco.asa                     2.1.0  
cisco.intersight              1.0.18 
cisco.ios                     2.6.0  
cisco.iosxr                   2.6.0  
cisco.ise                     1.2.1  
cisco.meraki                  2.6.0  
cisco.mso                     1.3.0  
cisco.nso                     1.0.3  
cisco.nxos                    2.8.2  
cisco.ucs                     1.6.0  
cloud.common                  2.1.0  
cloudscale_ch.cloud           2.2.0  
community.aws                 2.2.0  
community.azure               1.1.0  
community.ciscosmb            1.0.4  
community.crypto              2.2.0  
community.digitalocean        1.15.0 
community.dns                 2.0.6  
community.docker              2.1.1  
community.fortios             1.0.0  
community.general             4.4.0  
community.google              1.0.0  
community.grafana             1.3.0  
community.hashi_vault         2.2.0  
community.hrobot              1.2.2  
community.kubernetes          2.0.1  
community.kubevirt            1.0.0  
community.libvirt             1.0.2  
community.mongodb             1.3.2  
community.mysql               2.3.3  
community.network             3.0.0  
community.okd                 2.1.0  
community.postgresql          1.6.1  
community.proxysql            1.3.1  
community.rabbitmq            1.1.0  
community.routeros            2.0.0  
community.skydive             1.0.0  
community.sops                1.2.0  
community.vmware              1.17.1 
community.windows             1.9.0  
community.zabbix              1.5.1  
containers.podman             1.9.1  
cyberark.conjur               1.1.0  
cyberark.pas                  1.0.13 
dellemc.enterprise_sonic      1.1.0  
dellemc.openmanage            4.4.0  
dellemc.os10                  1.1.1  
dellemc.os6                   1.0.7  
dellemc.os9                   1.0.4  
f5networks.f5_modules         1.14.0 
fortinet.fortimanager         2.1.4  
fortinet.fortios              2.1.3  
frr.frr                       1.0.3  
gluster.gluster               1.0.2  
google.cloud                  1.0.2  
hetzner.hcloud                1.6.0  
hpe.nimble                    1.1.4  
ibm.qradar                    1.0.3  
infinidat.infinibox           1.3.3  
infoblox.nios_modules         1.2.1  
inspur.sm                     1.3.0  
junipernetworks.junos         2.8.0  
kubernetes.core               2.2.3  
mellanox.onyx                 1.0.0  
netapp.aws                    21.7.0 
netapp.azure                  21.10.0
netapp.cloudmanager           21.13.0
netapp.elementsw              21.7.0 
netapp.ontap                  21.15.1
netapp.storagegrid            21.9.0 
netapp.um_info                21.8.0 
netapp_eseries.santricity     1.2.13 
netbox.netbox                 3.5.1  
ngine_io.cloudstack           2.2.2  
ngine_io.exoscale             1.0.0  
ngine_io.vultr                1.1.0  
openstack.cloud               1.6.0  
openvswitch.openvswitch       2.1.0  
ovirt.ovirt                   1.6.6  
purestorage.flasharray        1.12.1 
purestorage.flashblade        1.9.0  
sensu.sensu_go                1.13.0 
servicenow.servicenow         1.0.6  
splunk.es                     1.0.2  
t_systems_mms.icinga_director 1.27.0 
theforeman.foreman            2.2.0  
vyos.vyos                     2.6.0  
wti.remote                    1.0.3  
CONFIGURATION
ANSIBLE_NOCOWS(/home/yannik/projects/luet/ansible/ansible.cfg) = True
COLLECTIONS_PATHS(/home/yannik/projects/luet/ansible/ansible.cfg) = ['/home/yannik/projects/luet/ansible/vendor_collections']
DEFAULT_HOST_LIST(/home/yannik/projects/luet/ansible/ansible.cfg) = ['/home/yannik/projects/luet/ansible/hosts']
DEFAULT_LOAD_CALLBACK_PLUGINS(/home/yannik/projects/luet/ansible/ansible.cfg) = True
DEFAULT_ROLES_PATH(/home/yannik/projects/luet/ansible/ansible.cfg) = ['/home/yannik/projects/luet/ansible/vendor_roles']
DEFAULT_STDOUT_CALLBACK(/home/yannik/projects/luet/ansible/ansible.cfg) = yaml
INTERPRETER_PYTHON(/home/yannik/projects/luet/ansible/ansible.cfg) = auto_legacy_silent
OS / ENVIRONMENT

Host OS: Fedora 35
Target OS: Windows Server 2022

STEPS TO REPRODUCE
- ansible.windows.win_powershell:
    script: |
      Register-ScheduledJob -Name TestTask -ScriptBlock { $True }
    error_action: stop

Run this playbook twice. On the second run, the memory leak and high CPU usage occur.

Using -FilePath instead of -ScriptBlock has the same result.

EXPECTED RESULTS

The task should terminate with an error (as the command does when executed without Ansible), no memory leak should occur, and the task should be registered.

ACTUAL RESULTS
TASK [mgmt-vm : ansible.windows.win_powershell] *****************************************************************************************************
task path: /home/yannik/projects/luetjenburg/ansible/roles/mgmt-vm/tasks/mgmt-vm.yml:176
Using module file /usr/local/lib/python3.9/site-packages/ansible_collections/ansible/windows/plugins/modules/win_powershell.ps1
Pipelining is enabled.
<mgmt-vm> ESTABLISH WINRM CONNECTION FOR USER: administrator on PORT 5985 TO mgmt-vm
EXEC (via pipeline wrapper)

The task never terminates, and the memory usage of the powershell.exe process on the target node rises very quickly. Terminating the ansible process on the controller node does not end the process on the target node. Memory usage keeps rising until exhaustion:

[Screenshot from 2022-06-14 17-53-56]

(This is after about one minute)

Yannik commented 2 years ago

I'm a bit confused. Sometimes the leak occurs on the first execution (no task with the same name exists); sometimes it only occurs on the second execution (a task with the same name exists).

jborean93 commented 2 years ago

This is most likely related to https://github.com/ansible-collections/ansible.windows/issues/360. The output of Register-ScheduledJob probably contains a type that is either highly nested or contains a circular reference, which causes problems when serializing the output to JSON. I'll have to look into it to see which property is causing the problem and find a way to avoid it.

jborean93 commented 2 years ago

Looks like the problem was something else. When a scheduled job already exists, an error record is generated, and the job definition is part of the error record's TargetObject property. The win_powershell module serializes this property but doesn't sanitise the value like we do with other result values. The fix is to ensure the target_object field also gets sanitised, so the module doesn't get stuck trying to serialize a highly nested object or an object with circular references. The fix is in https://github.com/ansible-collections/ansible.windows/pull/386
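
For hosts that still run a collection version without that fix, a possible workaround is to avoid producing the error record in the first place. The task below is only a sketch under that assumption: it checks for an existing job before registering it, and discards the job definition that Register-ScheduledJob returns. It has not been verified against the affected version, and since the leak was also reported on a first execution, it may not cover every case. The task name and the `$name` variable are illustrative.

```yaml
# Workaround sketch for collection versions without the fix (unverified).
# Assumption: skipping the duplicate registration avoids the error record
# whose TargetObject holds the job definition.
- name: Register TestTask only when it does not exist yet
  ansible.windows.win_powershell:
    script: |
      $name = 'TestTask'
      # SilentlyContinue keeps a missing job from raising an error here
      $existing = Get-ScheduledJob -Name $name -ErrorAction SilentlyContinue
      if ($existing) {
          # Nothing to do, so report the task as unchanged
          $Ansible.Changed = $false
      }
      else {
          # Discard the returned job definition so it is not part of the module output
          $null = Register-ScheduledJob -Name $name -ScriptBlock { $True }
      }
    error_action: stop
```

Unlike the reproduction snippet, this makes the task idempotent: a second run reports no change instead of failing on the duplicate name.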