ansible / ansible


async_status with mode cleanup not waiting for job to complete before cleaning up #81664

Closed mrmcmuffinz closed 1 year ago

mrmcmuffinz commented 1 year ago

Summary

When an async task is checked with async_status using mode: cleanup, the cleanup happens before the async task has completed. This causes the error below:

fatal: [127.0.0.1]: FAILED! => {"ansible_job_id": "j456852675014.47763", "attempts": 2, "changed": false, "erased": "~/.ansible_async/j456852675014.47763", "finished": 1, "msg": "could not find job", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Issue Type

Bug Report

Component Name

ansible

Ansible Version

$ ansible --version
ansible-playbook [core 2.15.3]
  config file = None
  configured module search path = ['/Users/abrahamcabrera/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/abrahamcabrera/.pyenv/versions/test23333333/lib/python3.11/site-packages/ansible
  ansible collection location = /Users/abrahamcabrera/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/abrahamcabrera/.pyenv/versions/test23333333/bin/ansible-playbook
  python version = 3.11.4 (main, Jul 11 2023, 16:33:20) [Clang 14.0.3 (clang-1403.0.22.14.1)] (/Users/abrahamcabrera/.pyenv/versions/test23333333/bin/python)
  jinja version = 3.1.2
  libyaml = True

Configuration

not applicable as I don't have one to specify for my tiny example.

OS / Environment

macOS Ventura 13.5.1

Steps to Reproduce

test.yml:

---
- name: "Test"
  hosts: 127.0.0.1
  tasks:
    - name: Run script async
      shell: bash script.sh
      async: 1000
      poll: 0
      register: script

    - name: "Check on an async task every 10 seconds"
      async_status:
        jid: "{{ script.ansible_job_id }}"
        mode: cleanup
      register: job_result
      until: job_result.finished
      retries: 20
      delay: 10

    - debug:
        msg: "Done"

script.sh:

#!/bin/bash

echo "hello 1"
sleep 10

echo "hello 2"
sleep 10

echo "hello 3"
sleep 10

echo "hello 4"
sleep 10

echo "hello 5"
sleep 10

Expected Results

I expect the cleanup of the async file to happen after the job has completed, not before. Example output from a run that does not have mode set:

$ ansible-playbook test.yml
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [Test] ****************************************************************************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Run script async] ****************************************************************************************************************************************************************************
changed: [127.0.0.1]

TASK [Check on an async task every 10 seconds] *****************************************************************************************************************************************************
FAILED - RETRYING: [127.0.0.1]: Check on an async task every 10 seconds (20 retries left).
FAILED - RETRYING: [127.0.0.1]: Check on an async task every 10 seconds (19 retries left).
FAILED - RETRYING: [127.0.0.1]: Check on an async task every 10 seconds (18 retries left).
FAILED - RETRYING: [127.0.0.1]: Check on an async task every 10 seconds (17 retries left).
FAILED - RETRYING: [127.0.0.1]: Check on an async task every 10 seconds (16 retries left).
changed: [127.0.0.1]

TASK [debug] ***************************************************************************************************************************************************************************************
ok: [127.0.0.1] => {
    "msg": "Done"
}

PLAY RECAP *****************************************************************************************************************************************************************************************
127.0.0.1                  : ok=4    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Actual Results

ansible-playbook test.yml
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [Test] ****************************************************************************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Run script async] ****************************************************************************************************************************************************************************
changed: [127.0.0.1]

TASK [Check on an async task every 10 seconds] *****************************************************************************************************************************************************
FAILED - RETRYING: [127.0.0.1]: Check on an async task every 10 seconds (20 retries left).
fatal: [127.0.0.1]: FAILED! => {"ansible_job_id": "j456852675014.47763", "attempts": 2, "changed": false, "erased": "~/.ansible_async/j456852675014.47763", "finished": 1, "msg": "could not find job", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

PLAY RECAP *****************************************************************************************************************************************************************************************
127.0.0.1                  : ok=2    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0


Code of Conduct

- [X] I agree to follow the Ansible Code of Conduct


s-hertel commented 1 year ago

Hi!

Thanks very much for your submission to Ansible. It means a lot to us that you've taken time to contribute. This is working as intended, but we'd be open to someone adding a feature to improve the interface. You could make the mode parameter conditional to only pass cleanup when the job is finished.

However, we're absolutely always up for discussion. Because this project is very active, we're unlikely to see comments made on closed tickets and we lock them after some time. If you or anyone else has any further questions, please let us know by using any of the communication methods listed in the page below:

In the future, sometimes starting a discussion on the development list prior to implementing a feature can make getting things included a little easier, but it's not always necessary.

Thank you once again for this and your interest in Ansible!
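
For the reproduction playbook above, the suggestion to pass cleanup only once the job is finished could look roughly like the sketch below. It is an illustration, not an official fix: it assumes the registered variable is still named script, reuses the retries and delay values from the report, and the task names are made up. The two tasks would replace the single async_status task in test.yml.

```yaml
# Poll in the default status mode until the job reports that it is finished.
- name: Wait for the async script to finish
  ansible.builtin.async_status:
    jid: "{{ script.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 20
  delay: 10

# Runs only if the wait above succeeded; removes the async job cache file.
- name: Clean up the async job cache file
  ansible.builtin.async_status:
    jid: "{{ script.ansible_job_id }}"
    mode: cleanup
```

Keeping the wait and the cleanup in separate tasks means the cache file is never erased while the status checks still need it.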

bcoca commented 1 year ago

You could make it work by making the mode conditional on the job being finished and/or the maximum number of retries being reached.
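
One way to approximate "finished and/or retries exhausted" without templating the mode is a block with an always section. This is only a sketch under assumptions: the registered variable is named script as in the report, the task names are hypothetical, and it accepts that the cache file is removed even if the job is still running when the retries run out.

```yaml
- name: Wait for the async script, then always clean up
  block:
    # Poll in the default status mode; fails once the retries are exhausted.
    - name: Wait for the async script to finish
      ansible.builtin.async_status:
        jid: "{{ script.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 20
      delay: 10
  always:
    # Runs whether the wait succeeded or failed.
    - name: Remove the async job cache file
      ansible.builtin.async_status:
        jid: "{{ script.ansible_job_id }}"
        mode: cleanup
```

Because there is no rescue section, a wait that times out still marks the host as failed after the cleanup task has run.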

mrmcmuffinz commented 1 year ago

I would say that it is very poorly written then; just because it is expected doesn't make it right. If I'm going to use async in general, I should not have to worry about cleaning up any of Ansible's internal files.

> This is working as intended, but we'd be open to someone adding a feature to improve the interface.

Even if this were the case, do you have example code? Or even given my example above a modification/patch demonstrating this behavior? Seriously I'm asking because this isn't really documented well in the docs.

> You could make the mode parameter conditional to only pass cleanup when the job is finished.

A bit of feedback from my side would be to provide examples of how to achieve the proposed solution. As you clearly stated, I spent time providing a simple use case and example code to illustrate the bug I observed. Please show the same courtesy or point me to docs illustrating how to achieve the same. As for the communication channels, I went on Matrix to ask and didn't really get a response. I have also tried to send emails to the distribution list, and they get rejected.

s-hertel commented 1 year ago

> If I'm going to use async in general I should not have to worry about cleanup of any internal files from ansible.

That is a requirement for poll: 0. From https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_async.html:

> When running with poll: 0, Ansible will not automatically cleanup the async job cache file. You will need to manually clean this up with the async_status module with mode: cleanup.

The async_status documentation itself is a little vague about how mode: cleanup works. That could be improved, and the examples could show how to clean up, something like https://github.com/ansible/ansible/pull/81697.

> Even if this were the case, do you have example code? Or even given my example above a modification/patch demonstrating this behavior? Seriously I'm asking because this isn't really documented well in the docs.

For example, a wait option could be added here https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/action/async_status.py#L24-L29.

s-hertel commented 1 year ago

> A bit of feedback from my side would be to provide examples of how to achieve said proposed solution.

Here you go.

  - name: Wait for async job to end and then cleanup
    ansible.builtin.async_status:
      jid: '{{ sleeper.ansible_job_id }}'
      mode: '{{ item }}'
    register: job_result
    until: job_result.finished or (job_result.erased is defined)
    retries: 5
    delay: 10
    loop:
      - 'status'
      - 'cleanup'

retries/until doesn't allow access to the register variable, but it's easy enough with a loop.