ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

Jobs leaving artifacts in /tmp directory #12661

Open aheath1992 opened 2 years ago

aheath1992 commented 2 years ago

Bug Summary

Some jobs leave artifacts in the /tmp directory after they finish. These artifacts fill up the root disk and crash the system. I would like a job or automation that checks for and removes stale artifacts without crashing the service.

AWX version

Ansible Automation Platform Controller 4.2.0

Installation method

N/A

Modifications

no

Ansible version

2.13

Operating system

RHEL 8

Web browser

Firefox

Steps to reproduce

When certain jobs run, they leave folders in the /tmp directory. AWX does not always clean up these folders, and they fill up the root disk, causing the system to crash.

Expected results

Job folders are cleaned up when jobs finish, or some background automation searches the /tmp directory and removes files that are older than a certain age and match the AWX folder naming.

Actual results

Folder artifacts accumulate in the /tmp directory. The /tmp cleanup daemon does not pick up these folders for cleanup, so they build up and fill the root disk.

Additional information

I previously tried adding the AWX folder naming to the /tmp cleanup daemon, but doing so crashed the platform. I am unsure what I got wrong, or whether there is a better process somewhere in the documentation that I missed.
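The background check asked for under "Expected results" could start as a read-only report. The sketch below only lists candidate directories and deletes nothing, since deleting a directory that a running job still uses may be what crashed the platform. It assumes the `awx_<job id>_<random suffix>` naming reported later in this thread (for example `/tmp/awx_100542_t6ixwfln`). The function name `find_stale_awx_dirs` and the 24-hour default are hypothetical choices, not AWX settings.

```python
import re
import time
from pathlib import Path

# Assumed naming, taken from the example in this thread: awx_<job id>_<random suffix>
AWX_DIR_PATTERN = re.compile(r"^awx_\d+_[A-Za-z0-9_]+$")


def find_stale_awx_dirs(base="/tmp", max_age_hours=24.0, now=None):
    """Return AWX-style directories directly under `base` whose mtime is older
    than `max_age_hours`. Nothing is deleted; this is a dry-run report."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    base_path = Path(base)
    if not base_path.is_dir():
        return stale
    for entry in base_path.iterdir():
        if not AWX_DIR_PATTERN.match(entry.name):
            continue
        try:
            # Skip symlinks and regular files; only real directories count.
            if entry.is_symlink() or not entry.is_dir():
                continue
            if entry.stat().st_mtime < cutoff:
                stale.append(entry)
        except OSError:
            # The directory may disappear while we scan (a job just finished).
            continue
    return sorted(stale)


if __name__ == "__main__":
    for path in find_stale_awx_dirs():
        print(path)
```

The top-level directory mtime is only a rough age signal, so a long-running job could still own a directory that looks old. Any real cleanup should first confirm that no running job references the directory.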

akus062381 commented 2 years ago

@aheath1992 did you change any settings? Can you give us a bit more detail concerning the naming of the directories? Is there a common prefix?

Can you also show us the output of this command?

$ awx-manage shell_plus
settings.AWX_CLEANUP_PATHS

aheath1992 commented 2 years ago

settings.AWX_CLEANUP_PATHS True

akus062381 commented 2 years ago

@aheath1992 can you please share the directory naming info? Is there a common prefix?

aheath1992 commented 2 years ago

/tmp/ /tmp/awx_100542_t6ixwfln

grimlokason commented 1 year ago

Same here with AWX 21.6.0 deployed with the operator. It crashed our instance by filling inode usage to 100% on the EE pod.
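Inode exhaustion can occur while free disk space still looks healthy, so it helps to check inode usage separately. The following is a minimal sketch that uses `os.statvfs` from the Python standard library (Unix only). The function name `inode_usage_percent` is hypothetical, and `/tmp` is assumed to be the filesystem of interest.

```python
import os


def inode_usage_percent(path="/tmp"):
    """Return the percentage of inodes in use on the filesystem holding `path`,
    or None if the filesystem does not report an inode total."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems report zero total inodes; no percentage is meaningful.
        return None
    used = st.f_files - st.f_ffree
    return 100.0 * used / st.f_files


if __name__ == "__main__":
    usage = inode_usage_percent("/tmp")
    if usage is None:
        print("inode totals not reported for this filesystem")
    else:
        print(f"inode usage: {usage:.1f}%")
```

A monitoring job could alert when this value crosses a chosen threshold, well before the pod reaches 100%.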

jangel97 commented 1 year ago

Hi,

We are on Ansible Automation Platform 2.3 (VM deployment) and we are still seeing this issue, though for some reason only on one particular execution node.

// /tmp is AWX_ISOLATION_BASE_PATH for us

settings.AWX_CLEANUP_PATHS is set to True and it mostly works. I guess the issue is probably ansible-runner not cleaning up the private_data_dir, since I can see the --delete option is set:

awx       961792  961625  0 11:02 ?        00:00:03 /usr/bin/python3.9 /usr/bin/ansible-runner worker --private-data-dir=/tmp/awx_1100353_agkzt73z --delete
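Process listings like the one above can also show which directories are still in use, so that leftovers can be told apart from live jobs. The sketch below extracts `--private-data-dir` values from command lines that mention `ansible-runner`. The helper name `active_private_data_dirs` is hypothetical. Matching on the flag text is an assumption based on the line shown here, not a documented interface.

```python
import re

# Accept both "--private-data-dir=/path" and "--private-data-dir /path".
PRIVATE_DATA_DIR_RE = re.compile(r"--private-data-dir[= ](\S+)")


def active_private_data_dirs(ps_lines):
    """Return the set of private data directories referenced by
    ansible-runner command lines in `ps_lines`."""
    dirs = set()
    for line in ps_lines:
        if "ansible-runner" not in line:
            continue
        match = PRIVATE_DATA_DIR_RE.search(line)
        if match:
            dirs.add(match.group(1))
    return dirs


if __name__ == "__main__":
    sample = [
        "awx       961792  961625  0 11:02 ?        00:00:03 /usr/bin/python3.9 "
        "/usr/bin/ansible-runner worker "
        "--private-data-dir=/tmp/awx_1100353_agkzt73z --delete",
    ]
    print(active_private_data_dirs(sample))
```

On a live node the input could come from the lines printed by `ps -eo args`. Any `awx_*` directory under the isolation base path that is absent from the returned set is a candidate leftover.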

The code that cleans up the private data directory in ansible-runner looks straightforward (at first glance): https://github.com/ansible/ansible-runner/blob/devel/ansible_runner/__main__.py#L773

Either this code is failing in ansible-runner, or some exception is thrown before execution reaches line 773. In either case, we need to add logging statements to runner to catch what is going on and why the directories sometimes do not get cleaned up, or increase the log level on runner to see what is happening (if that is possible).
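To illustrate the kind of logging meant here, the following is a minimal sketch of a delete step that records why a removal failed instead of failing silently. It is not ansible-runner's actual code. The function name `remove_private_data_dir` and the logger name `cleanup_sketch` are hypothetical.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger("cleanup_sketch")


def remove_private_data_dir(private_data_dir):
    """Delete the directory tree and return True on success.
    On failure, log the exception with its traceback and return False."""
    path = Path(private_data_dir)
    if not path.exists():
        logger.warning("private data dir %s does not exist, nothing to delete", path)
        return True
    try:
        shutil.rmtree(path)
    except OSError:
        # logger.exception includes the traceback, which shows the failing file.
        logger.exception("failed to delete private data dir %s", path)
        return False
    logger.info("deleted private data dir %s", path)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    remove_private_data_dir("/tmp/awx_example_does_not_exist")
```

This only covers failures inside the delete call itself. If an exception is raised earlier, the delete step is never reached, so a log line at the point where cleanup is scheduled would also be needed to distinguish the two cases.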