ansible / awx-ee

An Ansible execution environment for AWX project
https://quay.io/ansible/awx-ee

AWX EE crashes without an error message #75

Open kazigk opened 3 years ago

kazigk commented 3 years ago

I have one specific playbook that uses strategy: free, and I run it against over 100 hosts. It produces about 1,700 lines of logs, and in most cases the job ends with an "error" status in the AWX web interface.

There's no error message in the job output in the AWX web interface, so I checked the logs of the pod itself using kubectl logs -n awx -f automation-job-81-5nmhw. This is the last line of those logs (formatted using Beautifier):

{
    "uuid": "4b8c7664-a8a6-4295-836b-9480b8766a4b",
    "counter": 2886,
    "stdout": "",
    "start_line": 1674,
    "end_line": 1674,
    "runner_ident": "81",
    "event": "runner_on_start",
    "job_id": 81,
    "pid": 20,
    "created": "2021-06-16T08:16:04.477365",
    "parent_uuid": "c676cd20-5b22-898c-8cc6-00000000007f",
    "event_data": {
        "playbook": "restart_apps.yml",
        "playbook_uuid": "19727f38-00ac-41f2-a46d-44d0ed81c721",
        "play": "apps",
        "play_uuid": "c676cd20-5b22-898c-8cc6-000000000078",
        "play_pattern": "apps",
        "task": "Start apps",
        "task_uuid": "c676cd20-5b22-898c-8cc6-00000000007f",
        "task_action": "shell",
        "task_args": "",
        "task_path": "/runner/project/restart_apps.yml:37",
        "host": "REDACTED",
        "uuid": "4b8c7664-a8a6-4295-836b-9480b8766a4b"
    }
}

Seems like it just crashes without any error message? The interesting part is that the playbook does all the changes to the servers, just the output is incomplete.

After executing the same playbook on a small part of my inventory, it finishes successfully with a play recap, and the pod logs end with these lines:

{"status": "successful", "runner_ident": "82"}
{"zipfile": 2998}
<BASE64 encoded zip file>
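
To compare a failing run against a successful one, it may help to check mechanically whether the pod log ever reaches a status line like the one above. The following is a minimal sketch, not part of AWX or ansible-runner: the function name summarize_runner_stream is hypothetical, and it assumes each record sits on its own line, as in the kubectl logs output, and that non-JSON lines (such as the base64 payload) can be skipped. The sample lines are trimmed copies of the records quoted in this report.

```python
import json


def summarize_runner_stream(lines):
    """Return (last_event, last_status) from JSON-lines job output.

    last_event is the last record that has an "event" key;
    last_status is the value of the last "status" key seen, or None.
    """
    last_event = None
    last_status = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            # Not JSON, e.g. the base64-encoded zip payload; skip it.
            continue
        if not isinstance(record, dict):
            continue
        if "status" in record:
            last_status = record["status"]
        elif "event" in record:
            last_event = record
    return last_event, last_status


# Trimmed copies of the records shown in this report.
truncated_run = [
    '{"counter": 2886, "event": "runner_on_start", "job_id": 81}',
]
complete_run = [
    '{"counter": 1, "event": "runner_on_start", "job_id": 82}',
    '{"status": "successful", "runner_ident": "82"}',
    '{"zipfile": 2998}',
]

for name, run in (("truncated", truncated_run), ("complete", complete_run)):
    event, status = summarize_runner_stream(run)
    print(name, "last event counter:", event["counter"], "status:", status)
```

To run it against a real job, the pod log could first be saved with kubectl logs -n awx automation-job-81-5nmhw > job.log and the open file passed to the function. A result with no status at all would suggest the stream was cut off rather than the playbook itself failing.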

I tested it on the following versions of AWX EE: 0.2.0, 0.3.0 and 0.4.0 with the same result. I also checked logs of awx-web, awx-task and awx-ee, but I didn't find anything useful.

AWX version: 19.2.0
AWX Operator version: 0.10.0

Is there anything else I can check?

shanemcd commented 3 years ago

This smells a lot like https://github.com/ansible/awx/issues/9961