ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

Ansible AWX web, operator, and task pods don't work correctly #15301

Open Arjunasvr opened 1 week ago

Arjunasvr commented 1 week ago


Bug Summary

I tried upgrading to version 2.19.0, but the task and web pods don't exist anymore, so I can no longer access the web UI. In minikube I cannot see the pods running; they just vanished. When I try to downgrade to 2.12.0, the task container doesn't work either. Can someone please help me get AWX up and running again?

AWX version

operator 2.19.0


Installation method

minikube

Modifications

no

Ansible version

No response

Operating system

Ubuntu 22.04 LTS

Web browser

Firefox, Chrome, Safari, Edge

Steps to reproduce

Upgrade to AWX Operator 2.19.0 and wait.

Expected results

The AWX UI is shown, and containers such as task and web are running.

Actual results

The task and web containers are not running, and their pods do not appear in the namespace.

Additional information

No response

cnfrancis commented 1 week ago

To confirm, you have the operator running within the same namespace, right?

Arjunasvr commented 1 week ago


Yes it is.

mandeepmails commented 1 week ago

@Arjunasvr Could you share the events for the awx-task-XXXXXXXX pod?

kubectl -n awx describe pod awx-task-XXXXXXXX

I suspect your PVC is pointing to a non-shareable volume and is getting deleted.
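To check the PVC theory, and because namespace-level events stay visible for a limited time (the API server's event TTL) even after a pod is gone, a sketch like the following lists the claims and the events in the namespace. The awx namespace default and the function names are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper settings: the namespace defaults to "awx" as in this issue.
NAMESPACE="${NAMESPACE:-awx}"

# List the PVCs and their STATUS column (Bound, Pending, Lost) to see
# whether the volume behind the deployment still exists.
show_pvcs() {
  kubectl -n "$NAMESPACE" get pvc
}

# Namespace-wide events, oldest first; useful when the pod itself no longer exists.
show_events() {
  kubectl -n "$NAMESPACE" get events --sort-by=.metadata.creationTimestamp
}
```

Calling show_pvcs and then show_events right after an upgrade attempt gives a view of the namespace even when kubectl describe pod has nothing to describe.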

gleupold commented 1 week ago

Hey @Arjunasvr, we encountered an issue that might help you. In our scenario, the configs (CRDs) weren't updated and 'web_manage_replicas' was undefined. You can find this error in the operator logs during the upgrade:

TASK [Apply deployment resources] ********************************
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined. 'web_manage_replicas' is undefined\n\nThe error appears to be in '/opt/ansible/roles/installer/tasks/resources_configuration.yml': line 248, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Apply deployment resources\n ^ here\n"}

After we executed the following command, the migration started right away:

kubectl apply --server-side -k "github.com/ansible/awx-operator/config/crd?ref=2.19.0"

See also: https://github.com/ansible/awx-operator/commit/8ead140541622f67bd2d44a3c76bb05739cdebb6#diff-8230d07440a5d33c9608211b63791ef41f935652ca8b8ec3d9f3c68b5ed8cc98
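The two steps described in this comment (look for the undefined-variable error in the operator logs, then re-apply the CRDs for the matching release) can be wrapped in a small shell sketch. It assumes the default kustomize install, where the operator runs as deployment awx-operator-controller-manager with container awx-manager in the awx namespace; the function names and the NAMESPACE/OPERATOR_VERSION variables are hypothetical and not part of awx-operator.

```shell
#!/usr/bin/env bash
# Hypothetical settings; defaults match the setup described in this issue.
NAMESPACE="${NAMESPACE:-awx}"
OPERATOR_VERSION="${OPERATOR_VERSION:-2.19.0}"

# Build the kustomize URL for the CRDs shipped with a given operator release.
crd_kustomize_url() {
  printf 'github.com/ansible/awx-operator/config/crd?ref=%s\n' "$1"
}

# Search the operator logs for the undefined-variable failure.
# Assumes the default deployment and container names of the kustomize install.
find_undefined_var_error() {
  kubectl -n "$NAMESPACE" logs deployment/awx-operator-controller-manager -c awx-manager \
    | grep -n "web_manage_replicas"
}

# Re-apply the CRDs for the operator version with server-side apply;
# per the comment above, outdated CRDs left web_manage_replicas undefined.
apply_crds() {
  kubectl apply --server-side -k "$(crd_kustomize_url "$OPERATOR_VERSION")"
}
```

For example, run find_undefined_var_error first; if it prints matching lines, run apply_crds and watch the operator logs again for the migration to start.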

Arjunasvr commented 6 days ago

Replying to @mandeepmails:

Sorry, I can't do this because there is no awx-task pod.

Arjunasvr commented 6 days ago

Replying to @gleupold:

Hi, I tried this and it didn't work. I checked the operator pod logs and saw this error:

5788921982606687203","name":"awx-server","namespace":"awx","error":"exit status 2","stacktrace":"github.com/operator-framework/ansible-operator-plugins/internal/ansible/runner.(*runner).Run.func1\n\tansible-operator-plugins/internal/ansible/runner/runner.go:269"}

I also saw this:

TASK [installer : Stream backup from pg_dump to the new postgresql container] ***
task path: /opt/ansible/roles/installer/tasks/upgrade_postgres.yml:99

{"level":"info","ts":"2024-07-02T06:55:23Z","logger":"logging_event_handler","msg":"[playbook task start]","name":"awx-server","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"231178893729865755","EventData.Name":"installer : Stream backup from pg_dump to the new postgresql container"}
{"level":"info","ts":"2024-07-02T06:55:23Z","logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/api/v1/namespaces/awx/pods/awx-server-postgres-15-0","Verb":"get","APIPrefix":"api","APIGroup":"","APIVersion":"v1","Namespace":"awx","Resource":"pods","Subresource":"","Name":"awx-server-postgres-15-0","Parts":["pods","awx-server-postgres-15-0"]}}

--------------------------- Ansible Task StdOut -------------------------------

TASK [Stream backup from pg_dump to the new postgresql container] ****
fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}

Does anyone have a new idea?

mandeepmails commented 6 days ago

Replying to @Arjunasvr:

Was it on Minikube, or on a limited hardware setup?

From what I usually see, even on normal (not minimal) hardware running Kubernetes, it takes 40 to 60 minutes for the awx-task-XXXXXXXXX pods to appear. Feel free to try on other hardware. Good luck!

Arjunasvr commented 6 days ago

Replying to @mandeepmails:

Yes, it was on Minikube. Normally the awx-task-xxx pod spins up within 5 to 10 minutes. I even left the upgrade running for more than 2 days, and the task and web pods still didn't show up when I ran:

kubectl get pods -n awx

fosterseth commented 5 days ago

@Arjunasvr Can you set no_log: False in your AWX spec? That way the operator shows more details about what is failing.
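A minimal sketch of applying that suggestion from the command line with a JSON merge patch, instead of editing the AWX manifest by hand. It assumes the AWX resource is named awx-server in the awx namespace (the name shown in the operator log lines earlier in this thread) and that no_log is a top-level field of the AWX spec, as the suggestion implies; the helper names and the operator deployment/container names are assumptions.

```shell
#!/usr/bin/env bash
# Assumed names: AWX custom resource "awx-server" in namespace "awx".
NAMESPACE="${NAMESPACE:-awx}"
AWX_NAME="${AWX_NAME:-awx-server}"

# Build a JSON merge patch that sets spec.no_log to the given value.
no_log_patch() {
  printf '{"spec":{"no_log":%s}}\n' "$1"
}

# Patch the AWX custom resource; the operator reconciles the changed spec.
set_no_log() {
  kubectl -n "$NAMESPACE" patch awx "$AWX_NAME" --type merge -p "$(no_log_patch "$1")"
}

# Follow the operator logs (default kustomize deployment/container names assumed).
follow_operator_logs() {
  kubectl -n "$NAMESPACE" logs -f deployment/awx-operator-controller-manager -c awx-manager
}
```

For example, run set_no_log false and then follow_operator_logs while the reconcile reaches the "Stream backup from pg_dump" task; if that task honors the setting, its real output should appear instead of the "censored" message.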

Arjunasvr commented 4 days ago


Hi @fosterseth, I did, but there's no change in the pod log; I'm still getting the same errors.