ansible-collections / ansible.windows

Windows core collection for Ansible
https://galaxy.ansible.com/ansible/windows
GNU General Public License v3.0

intermittent "unable to delete temporary file" errors #598

Closed: Yannik closed this issue 1 month ago

Yannik commented 3 months ago
SUMMARY

In addition to the WinRM connection issues discussed in #597, there is another intermittent issue that shows up more often as the host count grows. I estimate it happens roughly once in every 5000 task executions. It affects all types of ansible.windows modules, so it appears to be a general problem with how Ansible executes PowerShell on Windows.

Here are some sample errors:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: at <ScriptBlock>, <No file>: line 30
fatal: [degoe01sc021]: FAILED! => changed=false 
  msg: |-
    Unhandled exception while executing module: Failed to compile C# code:
    error CS1610: Warning as Error: Unable to delete temporary file 'c:\Users\svc_ansible_admin\AppData\Local\Temp\CSC946AF5F04C1D4088ADE659D4FDA4A174.TMP' used for default Win32 resource -- Access is denied.
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: at <ScriptBlock>, <No file>: line 11
fatal: [degoe10fs001]: FAILED! => changed=false 
  msg: |-
    internal error: failed to run exec_wrapper action module_powershell_wrapper: Failed to compile C# code:
    error CS1610: Warning as Error: Unable to delete temporary file 'c:\Users\svc_ansible_admin\AppData\Local\Temp\CSC3268DB069BBF42C09F1BAD4657BBA33F.TMP' used for default Win32 resource -- Access is denied.
ISSUE TYPE
COMPONENT NAME

winrm

ANSIBLE VERSION
ansible [core 2.16.3]
  config file = /builds/ansible/deployments/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /app/lib/python3.12/site-packages/ansible
  ansible collection location = /builds/ansible/deployments/vendor_collections
  executable location = /app/bin//ansible
  python version = 3.12.2 (main, Feb  7 2024, 22:13:24) [GCC 13.2.1 20231014] (/usr/local/bin/python)
  jinja version = 3.1.3
  libyaml = True
COLLECTION VERSION

Collection      Version
--------------- -------
ansible.windows 1.14.0 
CONFIGURATION
CACHE_PLUGIN_CONNECTION(/home/yannik/projects/xxx/ansible/ansible.cfg) = .ansible_facts
CACHE_PLUGIN_TIMEOUT(/home/yannik/projects/xxx/ansible/ansible.cfg) = 60
CALLBACKS_ENABLED(/home/yannik/projects/xxx/ansible/ansible.cfg) = ['ansible.posix.profile_tasks']
COLLECTIONS_PATHS(/home/yannik/projects/xxx/ansible/ansible.cfg) = ['/home/yannik/projects/xxx/ansible/vendor_collections']
CONFIG_FILE() = /home/yannik/projects/xxx/ansible/ansible.cfg
DEFAULT_FORKS(/home/yannik/projects/xxx/ansible/ansible.cfg) = 25
DEFAULT_HOST_LIST(/home/yannik/projects/xxx/ansible/ansible.cfg) = ['/home/yannik/projects/xxx/ansible/inventory']
DEFAULT_MANAGED_STR(/home/yannik/projects/xxx/ansible/ansible.cfg) = This file is managed by ansible and will be overwritten! Do not change it manually!
DEFAULT_ROLES_PATH(/home/yannik/projects/xxx/ansible/ansible.cfg) = ['/home/yannik/projects/xxx/ansible/vendor_roles']
DEFAULT_STDOUT_CALLBACK(/home/yannik/projects/xxx/ansible/ansible.cfg) = yaml
DEFAULT_TIMEOUT(/home/yannik/projects/xxx/ansible/ansible.cfg) = 120
DIFF_ALWAYS(/home/yannik/projects/xxx/ansible/ansible.cfg) = True
EDITOR(env: EDITOR) = vim
HOST_KEY_CHECKING(/home/yannik/projects/xxx/ansible/ansible.cfg) = False
INTERPRETER_PYTHON(/home/yannik/projects/xxx/ansible/ansible.cfg) = auto_silent
PAGER(env: PAGER) = less
RETRY_FILES_ENABLED(/home/yannik/projects/xxx/ansible/ansible.cfg) = False
OS / ENVIRONMENT

Target OS: Windows Server 2022

pywinrm-0.4.3 pykerberos-1.2.4

jborean93 commented 2 months ago

Part of executing a module is compiling a helper module util that is written in C#. This util is compiled internally with csc.exe, which produces a temporary DLL. The CS1610 error shown here indicates that the temporary files from this operation could not be removed, but the underlying cause is not visible. While I can investigate turning this error into a warning, there is not much we can do about the underlying cause, as the delete happens in a process that is not under our control.
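To illustrate what "turning this error into a warning" could mean, here is a minimal Python sketch that splits compiler output into fatal errors and downgraded warnings. It is an illustration only, not how Ansible's PowerShell/C# wrapper handles diagnostics: the names `DOWNGRADED_CODES`, `DIAGNOSTIC_RE` and `split_diagnostics` are hypothetical, and the line format is assumed from the sample errors in this issue.

```python
import re

# Hypothetical set of compiler codes to downgrade; CS1610 is the one
# reported in this issue.
DOWNGRADED_CODES = {"CS1610"}

# Assumed line format, taken from the sample output above, e.g.
# "error CS1610: Warning as Error: Unable to delete temporary file ..."
DIAGNOSTIC_RE = re.compile(
    r"^\s*(?P<level>error|warning) (?P<code>CS\d{4}): (?P<text>.*)$"
)


def split_diagnostics(compiler_output):
    """Return (fatal, warnings) as lists of (code, text) tuples."""
    fatal, warnings = [], []
    for line in compiler_output.splitlines():
        match = DIAGNOSTIC_RE.match(line)
        if match is None:
            continue  # not a diagnostic line
        entry = (match["code"], match["text"])
        if match["level"] == "error" and match["code"] not in DOWNGRADED_CODES:
            fatal.append(entry)
        else:
            warnings.append(entry)
    return fatal, warnings
```

With a filter like this, a compilation would only be treated as failed when `fatal` is non-empty, while a cleanup-only failure such as CS1610 would be reported without aborting the task.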

Yannik commented 2 months ago

@jborean93 It would be greatly appreciated if this could be turned into a warning instead of an error.
This is really causing a headache since it results in regular (but unpredictable) job failures.
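A quick back-of-the-envelope calculation shows why a rate of about one failure per 5000 task executions (the estimate from the summary) hurts more as the host count grows. The sketch assumes failures are independent across task executions, which is a simplification, and the host and task counts are made-up examples.

```python
def p_at_least_one_failure(task_executions, per_task_failure_rate=1 / 5000):
    """Probability that at least one of the task executions fails,
    assuming each one fails independently at the given rate."""
    return 1 - (1 - per_task_failure_rate) ** task_executions


# Hypothetical run sizes: (hosts, tasks per host).
for hosts, tasks_per_host in [(25, 40), (100, 40), (500, 40)]:
    executions = hosts * tasks_per_host
    chance = p_at_least_one_failure(executions)
    print(f"{hosts} hosts x {tasks_per_host} tasks = {executions} executions: "
          f"{chance:.0%} chance of at least one failure")
```

Under these assumptions a run with 5000 task executions already has roughly a 63% chance of hitting the error at least once.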

Yannik commented 2 months ago

Just out of curiosity: it seems like this helper module compilation/deletion is done on every task. Is that really efficient?

jborean93 commented 2 months ago

> Just out of curiosity: it seems like this helper module compilation/deletion is done on every task. Is that really efficient?

There are a few things that come into play with file deletion and Ansible:

Controller Deletion

This happens when the controller needs to create a temporary directory for an action invocation. For example using win_copy/win_template to copy a file it will

This can also happen if pipelining is disabled, but for Windows with the winrm/psrp/ssh connection plugins you really have to go out of your way to disable pipelining, so this won't apply.

Module Deletion

This happens when the module using the Ansible.Basic module wrapper asks for a temporary directory and there is no controller-managed tempdir. When the module is finished it will delete the tempdir and any files that were created. This is all done in the same process, so it should be quite efficient.
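The module-side pattern described here (create a tempdir on request, delete it in the same process when the module finishes) can be sketched in Python as an analogy. This is not the Ansible.Basic implementation; `run_module` and `example_work` are hypothetical names used only for illustration.

```python
import os
import tempfile


def run_module(work):
    """Run work(tmpdir) with a private temp dir that is removed afterwards."""
    # TemporaryDirectory deletes the directory and its contents on exit,
    # in the same process, even if work() raises.
    with tempfile.TemporaryDirectory(prefix="module-") as tmpdir:
        return work(tmpdir)


def example_work(tmpdir):
    """Write a scratch file into the tempdir and return its path."""
    path = os.path.join(tmpdir, "scratch.txt")
    with open(path, "w") as handle:
        handle.write("intermediate data")
    return path


leftover = run_module(example_work)
print(os.path.exists(leftover))  # False: the file went away with its directory
```

Because creation and cleanup happen inside one process, no extra round trip to the host is needed just to remove the directory.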

This Issue

This particular problem is somewhat separate from the other two types of tempdirs. When modules run a module_util that is written in C#, the code needs to be compiled. Internally we call something similar to the Add-Type cmdlet, so the creation of this temporary file and the attempt to delete it are really outside of our control. I'll have to play around with this a bit more to see what knobs I have to control this particular failure.

> Just out of curiosity: it seems like this helper module compilation/deletion is done on every task. Is that really efficient?

We do this on every task because each task should be self-contained and not leave any leftover artifacts. Outside of the controller-side tempdir (which should be somewhat uncommon on Windows), managing the module-side tempdir is a fairly inexpensive (and also uncommon) task.

jborean93 commented 2 months ago

I need to spend some more time figuring out a good way to test it, but I hope https://github.com/ansible/ansible/pull/83080 will improve the situation for you. Unfortunately the error comes from csc.exe, which the .NET API calls and over which I have no control. Ignoring the error code and manually trying to delete the directory should help with this situation.
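The general idea of "ignore the cleanup error and delete the directory manually" can be sketched as follows. This is a Python analogy under stated assumptions, not the code from the pull request (which changes Ansible's PowerShell/C# wrapper); `compile_with_lenient_cleanup` and `compile_step` are hypothetical names.

```python
import os
import shutil
import tempfile
import warnings


def compile_with_lenient_cleanup(compile_step):
    """Run compile_step(tmpdir); never fail only because cleanup failed."""
    tmpdir = tempfile.mkdtemp(prefix="compile-")
    try:
        return compile_step(tmpdir)
    finally:
        # Best-effort delete: ignore_errors=True swallows "access denied"
        # and similar failures instead of raising.
        shutil.rmtree(tmpdir, ignore_errors=True)
        if os.path.exists(tmpdir):
            # Report the leftover directory instead of failing the task.
            warnings.warn(f"could not remove temporary directory {tmpdir}")
```

The design choice is that the compiled result is what matters to the task, so a leftover temporary directory is surfaced as a warning rather than turned into a task failure.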

Yannik commented 2 months ago

Thanks, that looks great!

Yannik commented 1 month ago

@jborean93 Any chance to get this merged soon? This keeps breaking large playbook runs for me. :(

jborean93 commented 1 month ago

Sorry for the delay, it has been merged and backports have been created.

Yannik commented 1 month ago

@jborean93 Thank you so much!