ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0

Project Sync has started failing on default Control Plane Execution Environment after reapplying the AWX-operator and AWX CRD in the cluster. However, certain tasks, such as ping or win_ping, are still functioning correctly #1882

Open · mdfarmank opened this issue 5 months ago

mdfarmank commented 5 months ago


Bug Summary

I recently reapplied the AWX Operator and the AWX CRD in the K3s cluster (without any configuration changes). Since then, Project Sync fails on the Control Plane Execution Environment, while other tasks, such as ping or win_ping, still work correctly. This setup had been working fine for the last few months.

For example, whenever I run an AWX project sync or the Demo Project sync, it fails with the following error:

{
    "module_stdout": "",
    "module_stderr": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 137,
    "_ansible_no_log": false,
    "changed": false
}
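The key detail in this output is "rc": 137. By a common shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9, SIGKILL. A process cannot catch SIGKILL, which is consistent with the empty module_stdout and module_stderr. In containers, SIGKILL often comes from the kernel OOM killer when a memory limit is reached, but nothing in this issue confirms that cause. A minimal Python sketch of the decoding (describe_exit_code is a hypothetical helper; signal numbers assume Linux):

```python
import signal


def describe_exit_code(rc: int) -> str:
    """Interpret a shell-style exit status.

    By common shell convention, a status above 128 means the process
    was terminated by signal number (status - 128).
    """
    if rc > 128:
        try:
            sig = signal.Signals(rc - 128)
        except ValueError:
            return f"killed by unknown signal {rc - 128}"
        return f"killed by {sig.name}"
    return f"exited with status {rc}"


print(describe_exit_code(137))  # killed by SIGKILL
```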

The following is the complete output of the project sync job:

PLAY [Update source tree if necessary] *****************************************
TASK [Update project using git] ************************************************
task path: /tmp/awx_207403_ytruvr2q/project/project_update.yml:41
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: 1000
<127.0.0.1> EXEC /bin/sh -c 'echo ~1000 && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /runner/.ansible/tmp `"&& mkdir "` echo /runner/.ansible/tmp/ansible-tmp-1716955683.6936572-188-118617953611406 `" && echo ansible-tmp-1716955683.6936572-188-118617953611406="` echo /runner/.ansible/tmp/ansible-tmp-1716955683.6936572-188-118617953611406 `" ) && sleep 0'
Using module file /usr/local/lib/python3.9/site-packages/ansible/modules/git.py
<127.0.0.1> PUT /runner/.ansible/tmp/ansible-local-1849x1mepar/tmpyl6_jvm_ TO /runner/.ansible/tmp/ansible-tmp-1716955683.6936572-188-118617953611406/AnsiballZ_git.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /runner/.ansible/tmp/ansible-tmp-1716955683.6936572-188-118617953611406/ /runner/.ansible/tmp/ansible-tmp-1716955683.6936572-188-118617953611406/AnsiballZ_git.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python3 /runner/.ansible/tmp/ansible-tmp-1716955683.6936572-188-118617953611406/AnsiballZ_git.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /runner/.ansible/tmp/ansible-tmp-1716955683.6936572-188-118617953611406/ > /dev/null 2>&1 && sleep 0'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "module_stderr": "",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 137
}
PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

The following is the error returned when Project Sync runs with the Clean and Delete options enabled:

fatal: [localhost]: FAILED! => {
    "changed": false,
    "changed_when_result": "The conditional check 'reg.stdout_lines | length > 1' failed. The error was: error while evaluating conditional (reg.stdout_lines | length > 1): 'dict object' has no attribute 'stdout_lines'. 'dict object' has no attribute 'stdout_lines'",
    "module_stderr": "",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 137
}
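The Clean and Delete variant appears to be a secondary symptom of the same kill: because the module died before returning normal output, the registered result has no stdout_lines key, so the changed_when expression 'reg.stdout_lines | length > 1' cannot be evaluated. The Python sketch below models the registered variable as a plain dict (values copied from the output above) only to illustrate why the message appears. The function names are hypothetical, and this is not a suggested change to AWX's project_update.yml.

```python
# A registered result shaped like the failed task output above (values copied from it).
killed_result = {
    "changed": False,
    "module_stderr": "",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 137,
}


def changed_when_strict(reg: dict) -> bool:
    # Analogous to `reg.stdout_lines | length > 1`: fails when the key is missing,
    # much like Ansible's "'dict object' has no attribute 'stdout_lines'".
    return len(reg["stdout_lines"]) > 1


def changed_when_defensive(reg: dict) -> bool:
    # Analogous to `reg.stdout_lines | default([]) | length > 1` in Jinja.
    return len(reg.get("stdout_lines", [])) > 1


try:
    changed_when_strict(killed_result)
except KeyError as exc:
    print(f"conditional error: missing key {exc}")

print(changed_when_defensive(killed_result))  # False
```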

AWX Operator version

2.15.0

AWX version

24.2.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

k3s version v1.25.5+k3s1 (48e5d2af)

Modifications

no

Steps to reproduce

The installation uses the following kustomization.yaml file, with an external (unmanaged) PostgreSQL database:

kind: Kustomization
resources:
  - github.com/ansible/awx-operator/config/default?ref=2.15.0
  - configs/awx-secrets.yaml
  - modules/awx-deploy.yaml

images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.15.0

namespace: awx

awx-deploy.yaml manifest

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: abc-awx
  namespace: awx
spec:
  replicas: 1
  admin_user: admin
  admin_password_secret: abc-awx-admin-password
  secret_key_secret: abc-awx-secret-key
  task_privileged: true

  service_type: ClusterIP
  postgres_configuration_secret: abc-awx-postgres-configuration

  csrf_cookie_secure: 'True'
  session_cookie_secure: 'True'

  web_resource_requirements:
    requests:
      cpu: 200m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
  task_resource_requirements:
    requests:
      cpu: 200m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
  ee_resource_requirements:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 250m
      memory: 1Gi
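Of the three resource blocks above, ee_resource_requirements has the smallest memory limit (1Gi, against a 128Mi request). Assuming, without confirmation in this thread, that project updates on the Control Plane Execution Environment run inside the container governed by ee_resource_requirements, a git checkout that exceeds 1Gi would be SIGKILLed and surface as rc 137. The sketch below converts the memory quantities used here into bytes so the limits can be compared. memory_bytes is a hypothetical helper that handles only plain numbers with the binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes, not the full Kubernetes quantity grammar.

```python
import re

# Multipliers for common Kubernetes memory suffixes (binary and decimal).
_MULTIPLIERS = {
    "": 1,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
}

_QUANTITY = re.compile(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?")


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '2Gi' to bytes."""
    match = _QUANTITY.fullmatch(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(float(number) * _MULTIPLIERS[suffix or ""])


# (request, limit) memory values copied from the AWX spec above.
memory = {
    "web": ("1Gi", "2Gi"),
    "task": ("1Gi", "2Gi"),
    "ee": ("128Mi", "1Gi"),
}

for name, (request, limit) in memory.items():
    print(f"{name}: request {memory_bytes(request) // 2**20} MiB, "
          f"limit {memory_bytes(limit) // 2**20} MiB")

# The container group with the lowest memory limit.
tightest = min(memory, key=lambda n: memory_bytes(memory[n][1]))
print(f"smallest memory limit: {tightest}")  # ee
```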

Run kubectl apply -k .

Expected results

The Demo Project sync job completes successfully in AWX.

Actual results

The Demo Project sync job fails in AWX with the following results:

Job execution result: identical to the complete project sync output shown in the Bug Summary above.

Job execution result when Clean and Delete options are enabled in the Project: identical to the Clean and Delete error shown in the Bug Summary above.

Additional information

No response

Operator Logs

AWX task pod logs

2024-05-29 05:50:44,905 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.scheduler Finished dependency_manager Scheduler, timing data:
{'get_tasks_seconds': 0.03475628200249048, 'generate_dependencies_seconds': 0, '_schedule_seconds': 0.034768084005918354, '_schedule_calls': 0, 'recorded_timestamp': 0, 'pending_processed': 0}
2024-05-29 05:50:45,809 DEBUG    [-] awx.main.wsrelay Web host abc-awx-web-5996c54f9b-bk8kh (10.42.1.9) online heartbeat received.
2024-05-29 05:50:45,812 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch.periodic scheduler found k8s_reaper to run, 0.003035306930541992 seconds after target
2024-05-29 05:50:45,813 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch.periodic Scheduler next run is receptor_reaper in 1.9963040351867676 seconds
2024-05-29 05:50:45,814 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch task 28e72180-d9fb-435c-8398-c15a162980ea starting awx.main.tasks.system.awx_k8s_reaper(*[])
2024-05-29 05:50:45,855 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.tasks.system Checking for orphaned k8s pods for default-3.
2024-05-29 05:50:45,914 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.tasks.system Checking for orphaned k8s pods for intelligenibots-askml-4.
2024-05-29 05:50:47,820 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch.periodic scheduler found receptor_reaper to run, 0.011101245880126953 seconds after target
2024-05-29 05:50:47,821 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch.periodic Scheduler next run is send_subsystem_metrics in 0.987910270690918 seconds
2024-05-29 05:50:47,823 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch task 5c6ba994-14c1-41fe-b834-f04606c0756c starting awx.main.tasks.system.awx_receptor_workunit_reaper(*[])
2024-05-29 05:50:47,825 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.tasks.system Checking for unreleased receptor work units
2024-05-29 05:50:48,815 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch.periodic scheduler found send_subsystem_metrics to run, 0.0057506561279296875 seconds after target
2024-05-29 05:50:48,815 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch.periodic Scheduler next run is pool_cleanup in 5.993627071380615 seconds
2024-05-29 05:50:48,817 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.dispatch task 9f1930ad-3501-40fe-8a79-ee80670a6444 starting awx.main.analytics.analytics_tasks.send_subsystem_metrics(*[])
2024-05-29 05:50:50,296 INFO     [f45900d372a74811b3839991111a3484] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 207512
2024-05-29 05:50:50,303 DEBUG    [f45900d372a74811b3839991111a3484] awx.main.tasks.jobs project_update 207512 (running) finished running, producing 41 events.
2024-05-29 05:50:50,306 INFO     [f45900d372a74811b3839991111a3484] awx.analytics.job_lifecycle projectupdate-207512 post run {"type": "projectupdate", "task_id": 207512, "state": "post_run", "work_unit_id": "4aYaAoAG", "task_name": "Microbots"}
2024-05-29 05:50:50,533 INFO     [f45900d372a74811b3839991111a3484] awx.analytics.job_lifecycle projectupdate-207512 finalize run {"type": "projectupdate", "task_id": 207512, "state": "finalize_run", "work_unit_id": "4aYaAoAG", "task_name": "Microbots"}
2024-05-29 05:50:50,541 WARNING  [f45900d372a74811b3839991111a3484] awx.main.dispatch project_update 207512 (failed) encountered an error (rc=None), please see task stdout for details.
2024-05-29 05:50:51,306 INFO     [-] awx.analytics.job_lifecycle projectupdate-207512 stats wrapup finished {"type": "projectupdate", "task_id": 207512, "state": "stats_wrapup_finished", "work_unit_id": "4aYaAoAG", "task_name": "Microbots"}
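When correlating a failing job with the task pod log, the awx.analytics.job_lifecycle lines are convenient because each ends in a JSON payload carrying the task_id and state. A minimal sketch, assuming every such line ends with a single JSON object as in the excerpt above, that extracts the lifecycle states for one job ID (lifecycle_states is a hypothetical helper; the sample lines are copied from the log):

```python
import json
import re

# Matches job_lifecycle lines and captures the trailing JSON payload.
_LIFECYCLE = re.compile(r"awx\.analytics\.job_lifecycle \S+ .*?(\{.*\})\s*$")


def lifecycle_states(lines, task_id):
    """Return the lifecycle 'state' values logged for one AWX job, in order."""
    states = []
    for line in lines:
        match = _LIFECYCLE.search(line)
        if match is None:
            continue  # not a job_lifecycle line
        event = json.loads(match.group(1))
        if event.get("task_id") == task_id:
            states.append(event["state"])
    return states


log = [
    '2024-05-29 05:50:50,306 INFO     [f45900d372a74811b3839991111a3484] awx.analytics.job_lifecycle projectupdate-207512 post run {"type": "projectupdate", "task_id": 207512, "state": "post_run", "work_unit_id": "4aYaAoAG", "task_name": "Microbots"}',
    '2024-05-29 05:50:50,533 INFO     [f45900d372a74811b3839991111a3484] awx.analytics.job_lifecycle projectupdate-207512 finalize run {"type": "projectupdate", "task_id": 207512, "state": "finalize_run", "work_unit_id": "4aYaAoAG", "task_name": "Microbots"}',
    '2024-05-29 05:50:50,541 WARNING  [f45900d372a74811b3839991111a3484] awx.main.dispatch project_update 207512 (failed) encountered an error (rc=None), please see task stdout for details.',
    '2024-05-29 05:50:51,306 INFO     [-] awx.analytics.job_lifecycle projectupdate-207512 stats wrapup finished {"type": "projectupdate", "task_id": 207512, "state": "stats_wrapup_finished", "work_unit_id": "4aYaAoAG", "task_name": "Microbots"}',
]

print(lifecycle_states(log, 207512))
# ['post_run', 'finalize_run', 'stats_wrapup_finished']
```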

awx-operator pod logs


 TASK [Remove ownerReferences reference] ********************************
ok: [localhost] => (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}

-------------------------------------------------------------------------------
{"level":"info","ts":"2024-05-29T03:06:23Z","logger":"logging_event_handler","msg":"[playbook task start]","name":"abc-awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"2090315677743390150","EventData.Name":"installer : Start installation if auto_upgrade is false and deployment is missing"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Start installation if auto_upgrade is false and deployment is missing] ***
task path: /opt/ansible/roles/installer/tasks/main.yml:31

-------------------------------------------------------------------------------
{"level":"info","ts":"2024-05-29T03:06:23Z","logger":"runner","msg":"Ansible-runner exited successfully","job":"2090315677743390150","name":"abc-awx","namespace":"awx"}

----- Ansible Task Status Event StdOut (awx.ansible.com/v1beta1, Kind=AWX, abc-awx/awx) -----

PLAY RECAP *********************************************************************
localhost                  : ok=89   changed=0    unreachable=0    failed=0    skipped=83   rescued=0    ignored=1
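The two PLAY RECAP lines in this issue point in different directions: the operator's reconcile run ends with failed=0, while the project sync ends with failed=1. That is consistent with the operator having reapplied the deployment without errors and the failure being confined to the job itself, although it does not by itself rule out a changed runtime setting. A small sketch that parses recap host lines into counters (parse_recap is a hypothetical helper; the lines are copied from the outputs above):

```python
import re


def parse_recap(line: str) -> dict:
    """Split an Ansible PLAY RECAP host line into the host name and integer counters."""
    host, _, counters = line.partition(":")
    stats = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", counters)}
    return {"host": host.strip(), **stats}


operator_recap = parse_recap(
    "localhost                  : ok=89   changed=0    unreachable=0    "
    "failed=0    skipped=83   rescued=0    ignored=1"
)
sync_recap = parse_recap(
    "localhost                  : ok=0    changed=0    unreachable=0    "
    "failed=1    skipped=0    rescued=0    ignored=0"
)

for label, recap in (("operator reconcile", operator_recap), ("project sync", sync_recap)):
    print(f"{label}: failed={recap['failed']} ok={recap['ok']}")
```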

mdfarmank commented 5 months ago

Hi team, could I please get a response on this?

mdfarmank commented 3 months ago

Any response, team? The issue still persists, and any help would be highly appreciated.