ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

awx.awx.workflow_node_wait bug #15503

Open jonchen8 opened 2 months ago

jonchen8 commented 2 months ago

Bug Summary

With awx.awx collection 24.3.1 and above, a task using the awx.awx.workflow_node_wait module no longer waits for the workflow node; it completes right away and the playbook proceeds.

AWX version

24.3.1

Installation method

N/A

Modifications

no

Ansible version

2.14

Operating system

RHEL

Web browser

Chrome

Steps to reproduce

Start with Red Hat Ansible Automation Platform (AAP), running 4.5.2 on a RHEL 8 based server, with an execution environment on the project/job template that uses ansible-core 2.14.

Create a workflow template that contains an approval node, like so:

image

The issue occurs when the playbook uses awx.awx collection version 24.3.1 and up. With 24.2.0, the awx.awx.workflow_node_wait task behaves as expected.

Have a playbook associated with a job template that contains the following tasks: one to launch the approval workflow template and one to wait for the workflow node.

# note: the value of _approval_workflow_name is 'Cruft Cleanup Approval' but the names are not relevant to the issue
- name: 'Launch approval workflow'
  awx.awx.workflow_launch:
    controller_host: "{{ api_hostname }}"
    controller_oauthtoken: "{{ controller_token['token'] }}"
    workflow_template: "{{ _approval_workflow_name }}"
    wait: false
  register: _approval_workflow

- name: 'Wait for approval'
  awx.awx.workflow_node_wait:
    controller_host: "{{ api_hostname }}"
    controller_oauthtoken: "{{ controller_token['token'] }}"
    workflow_job_id: "{{ _approval_workflow['id'] }}"
    name: 'Cruft Cleanup Approval'
    timeout: 600  # 10 minutes; should be the same as the approval node timeout
  register: _workflow_approval_node_details

Running this playbook, the Launch approval workflow task does launch the workflow for approval, but the following task, Wait for approval, does not wait and finishes with an "ok" status.

The screenshot below shows the task completing without waiting the specified 10 minutes. The meta task is there only to end the playbook early; the tasks after it are not relevant to the awx.awx.workflow_node_wait module.

image

A workflow approval job was spawned and had not been approved yet, but the "Wait for approval" task did not wait and the playbook proceeded with its remaining tasks.

Expected results

Expected the task to wait for the workflow approval, as it does with awx.awx 24.2.0 and the same two tasks from the steps to reproduce section. The screenshot below shows the task waiting until it exceeded the configured timeout.

image

Actual results

With awx.awx version 24.3.1 and above, the task completes without waiting the specified 10 minutes. As in the steps to reproduce, the meta task is there only to end the playbook early; the tasks after it are not relevant to the awx.awx.workflow_node_wait module.

image

A workflow approval job was spawned and had not been approved yet, but the "Wait for approval" task did not wait and the playbook proceeded with its remaining tasks.

Here are additional details on the task itself:

image

Additional information

While trying to update our version of the awx.awx collection to 24.6.1, we found that the workflow_node_wait module no longer waited for our approval node to be approved before continuing with the rest of our code.

After some investigation, we believe it might be due to this chunk of code https://github.com/ansible/awx/blob/94e5795dfc37b95c576d61f3e3b4e936c021548c/awx_collection/plugins/module_utils/controller_api.py#L1050-L1053 introduced in version 24.3.1.

When the approval node is launched, its event_processing_finished field immediately changes to true. Since the while loop that contains the timeout code only runs while that field is false, we suspect the loop body never runs and execution skips straight to processing the result output, so the timeout is never checked and nothing is waited on.
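The suspected failure mode can be reproduced in isolation. The sketch below is not the collection's code: `original_keep_waiting`, `poll`, and the payload values are hypothetical stand-ins, and only the field names (`event_processing_finished`, `finished`, `status`) come from this report. It assumes the loop condition has the shape described above, that is, it waits only while the chosen field is falsy.

```python
def original_keep_waiting(payload):
    # Assumed shape of the condition described in this report:
    # keep waiting only while the chosen field is falsy.
    wait_on_field = 'event_processing_finished'
    if wait_on_field not in payload:
        wait_on_field = 'finished'
    return not payload[wait_on_field]


def poll(responses, keep_waiting):
    """Consume fake API responses until keep_waiting() is false.

    Returns (extra_polls, last_payload), where extra_polls counts the
    re-fetches made after the initial one.
    """
    responses = iter(responses)
    result = next(responses)  # initial fetch
    extra_polls = 0
    while keep_waiting(result):
        result = next(responses)  # stands in for sleep + re-fetch
        extra_polls += 1
    return extra_polls, result


# Hypothetical payloads: an approval node that is still pending, then approved.
pending = {'status': 'pending', 'finished': None, 'event_processing_finished': True}
approved = {'status': 'successful', 'finished': '<timestamp>', 'event_processing_finished': True}

extra_polls, last = poll([pending, pending, approved], original_keep_waiting)
print(extra_polls, last['status'])  # prints: 0 pending
```

With these payloads the loop exits on the very first response, while the node is still pending, which matches the "ok" result without any waiting seen in the playbook run.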

Here is what our approval node's json output looked like mere seconds after launching:

image

jonchen8 commented 2 months ago

Hi all, after some digging around, I might have a potential solution for this and would like to get some thoughts and opinions.

Here is the original code in question:

https://github.com/ansible/awx/blob/94e5795dfc37b95c576d61f3e3b4e936c021548c/awx_collection/plugins/module_utils/controller_api.py#L1050-L1053

I'm curious whether the following would be the ideal solution, since it takes into account whether a node is still in progress, hence the pending status check:

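wait_on_field = 'event_processing_finished'
if wait_on_field not in result['json']:
    wait_on_field = 'finished'
# Proposed change: also keep waiting while the node's status is still 'pending'
while not result['json'][wait_on_field] or result['json']['status'] == 'pending':
    ...  # existing loop body, unchanged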
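
To sanity-check the proposed condition without a controller, it can be pulled out into a small predicate and run against a few payloads. This is only a sketch: `should_keep_waiting` and the three example payloads are hypothetical, and only the field names and the `pending` status check come from the snippet above.

```python
def should_keep_waiting(payload):
    # Proposed condition from this comment, extracted into a predicate:
    # wait while the chosen field is falsy OR the node is still pending.
    wait_on_field = 'event_processing_finished'
    if wait_on_field not in payload:
        wait_on_field = 'finished'
    return (not payload[wait_on_field]) or payload['status'] == 'pending'


# Hypothetical payloads for illustration only.
cases = {
    'approval node, not yet approved': {
        'status': 'pending', 'finished': None, 'event_processing_finished': True,
    },
    'approval node, approved': {
        'status': 'successful', 'finished': '<timestamp>', 'event_processing_finished': True,
    },
    'node without event_processing_finished, still running': {
        'status': 'running', 'finished': None,
    },
}

for label, payload in cases.items():
    print(f"{label}: keep waiting = {should_keep_waiting(payload)}")
```

Under these assumed payloads, a pending approval node keeps the loop running even though `event_processing_finished` is already true, and the loop still ends once the node reaches a final status.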

Daniel-dev22 commented 2 months ago

I noticed the same behavior. This can cause a serious issue for anyone relying on this module to do what its name says: wait.