Open aheath1992 opened 2 years ago
@aheath1992 did you change any settings? Can you give us a bit more detail concerning the naming of the directories? Is there a common prefix?
Can you also show us the output of this command?
$ awx-manage shell_plus
settings.AWX_CLEANUP_PATHS
settings.AWX_CLEANUP_PATHS True
@aheath1992 can you please share the directory naming info? Is there a common prefix?
/tmp/
Same here with AWX 21.6.0 deployed with the operator. It crashed our instance by driving inode usage to 100% on the EE pod.
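For anyone who wants to watch for this before the pod falls over, here is a minimal Python sketch that reports inode usage for a path. It only uses `os.statvfs` from the standard library (Unix only); the function name `inode_usage_percent` and the default `/tmp` path are my own choices for illustration, not anything AWX ships.

```python
import os


def inode_usage_percent(path="/tmp"):
    """Return the percentage of inodes in use on the filesystem holding `path`.

    Returns None when the filesystem does not report a total inode count
    (some filesystems report 0 for f_files).
    """
    stats = os.statvfs(path)
    if stats.f_files == 0:
        return None
    used = stats.f_files - stats.f_ffree
    return 100.0 * used / stats.f_files


if __name__ == "__main__":
    usage = inode_usage_percent("/tmp")
    if usage is None:
        print("filesystem does not report inode totals")
    else:
        print(f"inode usage for /tmp: {usage:.1f}%")
```

Running it periodically (for example from cron) and alerting above some threshold would at least give a warning before the filesystem is exhausted.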
Hi,
We are on Ansible Automation Platform 2.3 (VM deployment) and we are still seeing this issue, though for some reason only on one particular execution node.
// /tmp is AWX_ISOLATION_BASE_PATH for us
settings.AWX_CLEANUP_PATHS is set to True and it mostly works. I suspect the problem is ansible-runner not cleaning up the private_data_dir, since I can see that the --delete option is set:
awx 961792 961625 0 11:02 ? 00:00:03 /usr/bin/python3.9 /usr/bin/ansible-runner worker --private-data-dir=/tmp/awx_1100353_agkzt73z --delete
The code that cleans up the private directory in ansible-runner looks straightforward (at first glance): https://github.com/ansible/ansible-runner/blob/devel/ansible_runner/__main__.py#L773
Either this code is failing in ansible-runner, or some exception is thrown before execution reaches line 773. In either case, we need to add logging statements to runner so we can catch what is going on and why the directories sometimes do not get cleaned up, or increase the log level on runner to see what is happening (if that is possible).
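To make the logging idea concrete, here is a rough sketch of what a logged cleanup step could look like. This is not the ansible-runner code; `remove_private_data_dir` and the logger name are hypothetical, and the sketch only assumes the cleanup boils down to deleting a directory tree.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger("cleanup_sketch")  # hypothetical logger name


def remove_private_data_dir(path):
    """Try to delete `path`, logging the outcome.

    Returns True if the directory is gone afterwards, False otherwise.
    """
    target = Path(path)
    if not target.exists():
        logger.warning("private data dir %s does not exist; nothing to delete", target)
        return True
    try:
        shutil.rmtree(target)
    except OSError:
        # logger.exception records the traceback, so the reason for a
        # leftover directory ends up in the logs instead of being lost.
        logger.exception("failed to delete private data dir %s", target)
        return False
    logger.info("deleted private data dir %s", target)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    demo_dir = tempfile.mkdtemp(prefix="awx_demo_")
    remove_private_data_dir(demo_dir)
```

With something like this in place, a directory left behind in /tmp would come with either a logged traceback (the delete failed) or no log line at all (the delete was never reached), which would tell the two cases apart.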
Please confirm the following
Bug Summary
Some jobs leave artifacts in the /tmp directory when they finish. This causes the root disk to fill up and the system to crash. I would like a job or automation that checks for and removes stale artifacts without crashing the service.
AWX version
Ansible Automation Platform Controller 4.2.0
Select the relevant components
Installation method
N/A
Modifications
no
Ansible version
2.13
Operating system
RHEL 8
Web browser
Firefox
Steps to reproduce
When certain jobs run, they leave folders in the /tmp directory. AWX sometimes does not clean up these folders, and they fill up the root disk, causing the system to crash.
Expected results
Job folders should be cleaned up when jobs finish, or some background automation should search the /tmp directory and remove files older than a certain age that match the AWX folder naming.
Actual results
Folder artifacts accumulate in the /tmp directory, and the /tmp cleanup daemon does not pick these folders up for cleanup, so they build up and fill the root disk.
Additional information
I previously tried adding the AWX folder naming pattern to the /tmp cleanup daemon, but when I did that, it crashed the platform. I am unsure what I got wrong, or whether there is a better process somewhere in the documentation that I missed.
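As a stopgap, the kind of background cleanup described above could be sketched roughly as follows. Everything here is an assumption for illustration: the `awx_` prefix is taken from directory names seen on affected systems (for example `/tmp/awx_1100353_agkzt73z`), the 24-hour threshold is arbitrary, and judging staleness by the directory's modification time is a heuristic that may not be safe for long-running jobs. The function names are hypothetical, and the script defaults to a dry run so nothing is deleted until the output has been reviewed.

```python
import shutil
import time
from pathlib import Path


def find_stale_dirs(base="/tmp", prefix="awx_", max_age_hours=24.0, now=None):
    """Return directories directly under `base` whose name starts with
    `prefix` and whose mtime is older than `max_age_hours`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    for entry in Path(base).iterdir():
        if not entry.name.startswith(prefix):
            continue
        # Skip symlinks and plain files; only real directories qualify.
        if entry.is_symlink() or not entry.is_dir():
            continue
        try:
            mtime = entry.stat().st_mtime
        except FileNotFoundError:
            continue  # removed by something else while we were scanning
        if mtime < cutoff:
            stale.append(entry)
    return sorted(stale)


def remove_stale_dirs(paths, dry_run=True):
    """Delete the given directories; return the ones actually removed.

    With dry_run=True (the default) nothing is deleted and the candidates
    are only printed.
    """
    removed = []
    for path in paths:
        if dry_run:
            print(f"would remove {path}")
            continue
        shutil.rmtree(path, ignore_errors=True)
        if not path.exists():
            removed.append(path)
    return removed


if __name__ == "__main__":
    remove_stale_dirs(find_stale_dirs())
```

The age threshold is there to avoid touching directories that belong to running jobs, which may be what went wrong when the pattern was added to the /tmp cleanup daemon; that is a guess, not something confirmed in this thread.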