ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.
Other
14.08k stars 3.42k forks source link

Exception during Project update triggered from Job launch #9027

Open hjkatz opened 3 years ago

hjkatz commented 3 years ago
ISSUE TYPE
SUMMARY

When launching a Job that triggers a Project update the Project update (and the original Job) both fail.

There is an exception in the awx-task logs:

2021-01-06 18:51:19,514 ERROR    awx.main.tasks job 353 (running) Exception occurred while running task
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1383, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1942, in pre_run_hook
    sync_task.run(local_project_sync.id)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 814, in _wrapped
    return f(self, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1555, in run
    raise AwxTaskError.TaskError(self.instance, rc)
Exception: project_update 354 (failed) encountered an error (rc=2), please see task stdout for details.

The full related output of awx-task:

2021-01-06 18:50:56,099 DEBUG    awx.main.scheduler Running Tower task manager.
2021-01-06 18:50:56,116 DEBUG    awx.main.scheduler Starting Scheduler
2021-01-06 18:50:56,384 DEBUG    awx.main.scheduler job 353 (waiting) consumed 6 capacity units from Shared ansible repo Docker image with prior total of 0
2021-01-06 18:50:56,385 DEBUG    awx.main.scheduler Finishing Scheduler
2021-01-06 18:50:56,780 DEBUG    awx.main.dispatch task 0dac4555-15ef-400e-a2b7-26717da758c3 starting awx.main.tasks.RunJob(*[353])
2021-01-06 18:50:56,814 DEBUG    awx.main.tasks Launching pod for job 353 (waiting).
2021-01-06 18:50:56,919 DEBUG    awx.main.scheduler Checking for pod awx-job-353. Attempt 1 of 100
2021-01-06 18:50:56,941 DEBUG    awx.main.scheduler Pod awx-job-353 is Pending.

2021-01-06 18:51:01,946 DEBUG    awx.main.scheduler Checking for pod awx-job-353. Attempt 2 of 100
RESULT 2
OKREADY
2021-01-06 18:51:01,967 DEBUG    awx.main.scheduler Pod awx-job-353 is online.
2021-01-06 18:51:02,667 DEBUG    awx.main.tasks Project not available locally, job 353 (running) will sync with remote
2021-01-06 18:51:02,981 WARNING  awx.main.tasks Galaxy role/collection syncing is enabled, but no credentials are configured for Squarespace-1.
2021-01-06 18:51:05,209 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:05,209 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:06,624 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:08,222 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:08,222 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:09,130 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:09,130 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:09,130 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:09,130 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(2)
2021-01-06 18:51:09,236 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:09,236 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(3)
2021-01-06 18:51:10,834 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:10,834 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:10,837 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:10,837 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:10,837 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:10,837 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:10,937 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:10,937 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:15,048 DEBUG    awx.main.dispatch task 2febc4df-8a19-4494-8336-3a3411ba4e11 starting awx.main.tasks.cluster_node_heartbeat(*[])
2021-01-06 18:51:15,049 DEBUG    awx.main.tasks Cluster node heartbeat task.
2021-01-06 18:51:15,100 DEBUG    awx.main.dispatch task 4852e087-1b92-40c7-8845-2a0088f8b18e starting awx.main.tasks.awx_k8s_reaper(*[])
2021-01-06 18:51:15,106 DEBUG    awx.main.dispatch task 4fee5162-862a-489c-8052-71d293e1ce87 starting awx.main.tasks.awx_periodic_scheduler(*[])
2021-01-06 18:51:15,139 DEBUG    awx.main.tasks Starting periodic scheduler
2021-01-06 18:51:15,142 DEBUG    awx.main.tasks Last scheduler run was: 2021-01-06 18:50:45.031597+00:00
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
Instance Group already registered tower
2021-01-06 18:51:15,161 DEBUG    awx.main.tasks Checking for orphaned k8s pods for Shared ansible repo Docker image-2.
2021-01-06 18:51:15,226 WARNING  awx.main.dispatch scaling up worker pid:12160
2021-01-06 18:51:15,428 DEBUG    awx.main.dispatch task 088ebc38-f24a-4020-9661-cb2d8227cc83 starting awx.main.scheduler.tasks.run_task_manager(*[])
2021-01-06 18:51:15,430 DEBUG    awx.main.scheduler Running Tower task manager.
2021-01-06 18:51:15,452 DEBUG    awx.main.scheduler Starting Scheduler
2021-01-06 18:51:15,628 DEBUG    awx.main.scheduler Finishing Scheduler
2021-01-06 18:51:17,485 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:17,485 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:17,686 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:17,686 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:17,686 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:17,686 DEBUG    awx.main.commands.run_callback_receiver ProjectUpdateEvent.objects.bulk_create(1)
2021-01-06 18:51:18,076 DEBUG    awx.main.tasks project_update 354 (running) finished running, producing 28 events.
2021-01-06 18:51:18,077 INFO     awx.main.commands.run_callback_receiver Event processing is finished for Job 354, sending notifications
2021-01-06 18:51:18,077 INFO     awx.main.commands.run_callback_receiver Event processing is finished for Job 354, sending notifications
2021-01-06 18:51:18,147 DEBUG    awx.main.dispatch task 1df3998b-3fce-47bf-9510-e59ba9c7c664 starting awx.main.tasks.handle_success_and_failure_notifications(*[354])
2021-01-06 18:51:18,146 WARNING  awx.main.dispatch scaling down worker pid:11514
2021-01-06 18:51:18,177 WARNING  awx.main.dispatch worker exiting gracefully pid:11514
Instance Group already registered tower
Instance Group already registered tower
2021-01-06 18:51:19,514 ERROR    awx.main.tasks job 353 (running) Exception occurred while running task
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1383, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1942, in pre_run_hook
    sync_task.run(local_project_sync.id)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 814, in _wrapped
    return f(self, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1555, in run
    raise AwxTaskError.TaskError(self.instance, rc)
Exception: project_update 354 (failed) encountered an error (rc=2), please see task stdout for details.
2021-01-06 18:51:19,516 DEBUG    awx.main.tasks job 353 (running) finished running, producing 0 events.
2021-01-06 18:51:19,629 DEBUG    awx.main.tasks Deleting pod awx-job-353
2021-01-06 18:51:19,690 DEBUG    awx.main.dispatch task c6b30c9a-740e-40ae-9ff3-9b934007f9e5 starting awx.main.tasks.update_inventory_computed_fields(*[26])
2021-01-06 18:51:19,694 WARNING  awx.main.dispatch job 353 (error) encountered an error (rc=None), please see task stdout for details.
2021-01-06 18:51:19,696 DEBUG    awx.main.dispatch task 0dac4555-15ef-400e-a2b7-26717da758c3 starting awx.main.tasks.handle_work_error(*['0dac4555-15ef-400e-a2b7-26717da758c3'])
2021-01-06 18:51:19,697 DEBUG    awx.main.tasks Executing error task id 0dac4555-15ef-400e-a2b7-26717da758c3, subtasks: [{'type': 'job', 'id': 353}]

In this case the original Job is 353 and the Project update Job is 354, however, Job 354 does not exist in the API or in the UI, See: image

Notes for this case:

ENVIRONMENT
ryanpetrello commented 3 years ago

@hjkatz can you show us the output of /api/v2/project_updates/354/? Thanks.

hjkatz commented 3 years ago
{
    "id": 354,
    "type": "project_update",
    "url": "/api/v2/project_updates/354/",
    "related": { ...  },
    "summary_fields": {
        "organization": { ...  },
        "project": {
            "id": 6,
            "name": "Shared Ansible Repo",
            "description": "",
            "status": "successful",
            "scm_type": "git"
        },
        "credential": {
            "id": 3,
            "name": "REDACTED: SCM SSH Key",
            "description": "",
            "kind": "scm",
            "cloud": false,
            "kubernetes": false,
            "credential_type_id": 2
        },
        "unified_job_template": {
            "id": 6,
            "name": "Shared Ansible Repo",
            "description": "",
            "unified_job_type": "project_update"
        },
        "user_capabilities": {
            "delete": true,
            "start": true
        }
    },
    "created": "2021-01-06T18:51:02.670042Z",
    "modified": "2021-01-06T18:51:02.670067Z",
    "name": "Shared Ansible Repo",
    "description": "",
    "local_path": "_6__shared_ansible_repo",
    "scm_type": "git",
    "scm_url": "REDACTED",
    "scm_branch": "hkatz/awx-example-and-docs",
    "scm_refspec": "",
    "scm_clean": true,
    "scm_delete_on_update": true,
    "credential": 3,
    "timeout": 0,
    "scm_revision": "",
    "unified_job_template": 6,
    "launch_type": "sync",
    "status": "failed",
    "failed": true,
    "started": "2021-01-06T18:51:02.669748Z",
    "finished": "2021-01-06T18:51:18.891607Z",
    "canceled_on": null,
    "elapsed": 16.222,
    "job_args": "[\"bwrap\", \"--die-with-parent\", \"--unshare-pid\", \"--dev-bind\", \"/\", \"/\", \"--proc\", \"/proc\", \"--bind\", \"/tmp/ansible_runner_pi_gj3fya4u/ansible_runner_pi_n1fvuzu5/tmp8uma0i6d\", \"/etc/ssh\", \"--bind\", \"/tmp/ansible_runner_pi_gj3fya4u/ansible_runner_pi_n1fvuzu5/tmpfeo2ii9y\", \"/etc/tower\", \"--bind\", \"/tmp/ansible_runner_pi_gj3fya4u/ansible_runner_pi_n1fvuzu5/tmpneeb45km\", \"/tmp\", \"--bind\", \"/tmp/ansible_runner_pi_gj3fya4u/ansible_runner_pi_n1fvuzu5/tmpdhmdvx97\", \"/var/lib/awx\", \"--bind\", \"/tmp/ansible_runner_pi_gj3fya4u/ansible_runner_pi_n1fvuzu5/tmpp_3ralon\", \"/var/lib/awx/projects\", \"--bind\", \"/tmp/ansible_runner_pi_gj3fya4u/ansible_runner_pi_n1fvuzu5/tmpf2s9hcwm\", \"/var/log\", \"--ro-bind\", \"/var/lib/awx/venv/ansible\", \"/var/lib/awx/venv/ansible\", \"--ro-bind\", \"/var/lib/awx/venv/awx\", \"/var/lib/awx/venv/awx\", \"--bind\", \"/tmp/awx_354_3v5x8e2s\", \"/tmp/awx_354_3v5x8e2s\", \"--bind\", \"/var/lib/awx/projects\", \"/var/lib/awx/projects\", \"--chdir\", \"/tmp/awx_354_3v5x8e2s/project\", \"ssh-agent\", \"sh\", \"-c\", \"ssh-add /tmp/awx_354_3v5x8e2s/artifacts/354/ssh_key_data && rm -f /tmp/awx_354_3v5x8e2s/artifacts/354/ssh_key_data && ansible-playbook -vvv -t update_git,install_roles,install_collections -i /tmp/awx_354_3v5x8e2s/inventory/hosts -e @/tmp/awx_354_3v5x8e2s/env/extravars project_update.yml\"]",
    "job_cwd": "/tmp/awx_354_3v5x8e2s/project",
    "job_env": {
        // edited to remove kubernetes pod envars (port and host information)
        "LC_ALL": "en_US.UTF-8",
        "SUPERVISOR_WEB_CONFIG_PATH": "/etc/supervisord.conf",
        "LANG": "en_US.UTF-8",
        "HOSTNAME": "awx-cbb96dbfd-n7f7c",
        "AWX_SKIP_MIGRATIONS": "1",
        "PWD": "/",
        "HOME": "/var/lib/awx",
        "MY_POD_UID": "4f723f3e-7389-491c-ba78-b40f0aeda5d0",
        "MY_POD_IP": "REDACTED",
        "SHLVL": "1",
        "LANGUAGE": "en_US.UTF-8",
        "PATH": "/var/lib/awx/venv/ansible/bin:/usr/pgsql-10/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "_": "/usr/local/bin/supervisord",
        "SUPERVISOR_ENABLED": "1",
        "SUPERVISOR_SERVER_URL": "unix:///var/run/supervisor/supervisor.sock",
        "SUPERVISOR_PROCESS_NAME": "dispatcher",
        "SUPERVISOR_GROUP_NAME": "tower-processes",
        "LC_CTYPE": "en_US.UTF-8",
        "DJANGO_SETTINGS_MODULE": "awx.settings.production",
        "DJANGO_LIVE_TEST_SERVER_ADDRESS": "localhost:9013-9199",
        "TZ": "UTC",
        "ANSIBLE_FACT_CACHE_TIMEOUT": "0",
        "ANSIBLE_FORCE_COLOR": "True",
        "ANSIBLE_HOST_KEY_CHECKING": "False",
        "ANSIBLE_INVENTORY_UNPARSED_FAILED": "True",
        "ANSIBLE_PARAMIKO_RECORD_HOST_KEYS": "False",
        "ANSIBLE_VENV_PATH": "/var/lib/awx/venv/ansible",
        "PROOT_TMP_DIR": "/tmp",
        "AWX_PRIVATE_DATA_DIR": "/tmp/awx_354_3v5x8e2s",
        "VIRTUAL_ENV": "/var/lib/awx/venv/ansible",
        "PYTHONPATH": "/var/lib/awx/venv/ansible/lib/python3.6/site-packages:",
        "ANSIBLE_RETRY_FILES_ENABLED": "False",
        "ANSIBLE_ASK_PASS": "False",
        "ANSIBLE_BECOME_ASK_PASS": "False",
        "DISPLAY": "",
        "TMP": "/tmp",
        "PROJECT_UPDATE_ID": "354",
        "ANSIBLE_CALLBACK_PLUGINS": "/var/lib/awx/venv/awx/lib/python3.6/site-packages/ansible_runner/callbacks",
        "ANSIBLE_STDOUT_CALLBACK": "awx_display",
        "AWX_ISOLATED_DATA_DIR": "/tmp/awx_354_3v5x8e2s/artifacts/354",
        "RUNNER_OMIT_EVENTS": "False",
        "RUNNER_ONLY_FAILED_EVENTS": "False"
    },
    "job_explanation": "",
    "execution_node": "awx-cbb96dbfd-n7f7c",
    "result_traceback": "",
    "event_processing_finished": true,
    "project": 6,
    "job_type": "run",
    "job_tags": "update_git,install_roles,install_collections",
    "host_status_counts": {
        "failures": 1
    },
    "playbook_counts": {
        "play_count": 1,
        "task_count": 1
    }
}
azrdev commented 3 years ago

I think I'm seeing the same behaviour on AWX 19.2.0 -- Though the error message has rc=None:


Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1289, in run
    self.pre_run_hook(self.instance, private_data_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1781, in pre_run_hook
    sync_task.run(local_project_sync.id)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 759, in _wrapped
    return f(self, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1433, in run
    raise AwxTaskError.TaskError(self.instance, rc)
Exception: project_update 468 (failed) encountered an error (rc=None), please see task stdout for details.
azrdev commented 3 years ago

Turns out this was my (user) error, I specified a branch override that didn't exist (in that repo).

johan-fk commented 2 years ago

I made the same misstake and specified a non existing branch. I feel that the error message could be improved.

jxc876 commented 1 year ago

Same error was due to a bad git branch name, the error only showed up when I added a second control plane node.

Turns out the first control plane already had the git branch cached, when the second tried to pull it didn't exist.

Once I corrected the branch name it got fixed.