ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0
1.26k stars · 633 forks

After upgrade "Kind=PodList err-Index with name field:status.phase does not exist" #1022

Closed nscblauensteiner closed 2 years ago

nscblauensteiner commented 2 years ago
ISSUE TYPE

Bug report

SUMMARY

Attempting to upgrade from AWX 21.3.0 to 21.4.0

ENVIRONMENT
STEPS TO REPRODUCE

Upgrade operator 0.24.0 to 0.26.0 with "make deploy"

EXPECTED RESULTS

Upgrade with "make deploy" to be successful

ACTUAL RESULTS

Error message from the controller manager (`kubectl logs -f deployments/awx-operator-controller-manager -c awx-manager`):

TASK [installer : Get new postgres pod information] **************************** task path: /opt/ansible/roles/installer/tasks/upgrade_postgres.yml:45 {"level":"info","ts":1660293743.5016606,"logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"4762220260129429432","EventData.Name":"installer : Get new postgres pod information"} {"level":"info","ts":1660293744.2908022,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}

akus062381 commented 2 years ago

@rooftopcellist is this possibly related to the Ansible Operator SDK upgrade?

aimcod commented 2 years ago

@rooftopcellist is this possibly related to the Ansible Operator SDK upgrade?

I am deploying AWX on a fresh Kubernetes cluster and I just got the same error:

ghost commented 2 years ago

I get a similar error when trying to deploy AWX on a fresh OCI Kubernetes environment:


--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Get the postgres pod information] ****************************
task path: /opt/ansible/roles/installer/tasks/database_configuration.yml:196

-------------------------------------------------------------------------------
{"level":"info","ts":1661331036.1270256,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661331036.2232614,"logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"418623429143691413","EventData.Name":"installer : Wait for Database to initialize if managed DB"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Wait for Database to initialize if managed DB] ***************
task path: /opt/ansible/roles/installer/tasks/database_configuration.yml:206

-------------------------------------------------------------------------------
{"level":"info","ts":1661331036.8066926,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661331042.4789498,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661331048.105519,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661331053.7363572,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661331059.3696618,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661331065.0136898,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661331070.642434,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
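Since the operator's proxy log lines are JSON, the retry cadence of the repeated cache misses is easy to check; a minimal sketch (the sample lines are copied from the output above, and reading from a real log file is left out):

```python
import json

# Sample lines in the same shape as the operator's proxy log output above.
lines = [
    '{"level":"info","ts":1661331036.8066926,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}',
    '{"level":"info","ts":1661331042.4789498,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}',
    '{"level":"info","ts":1661331048.105519,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}',
]

events = [json.loads(line) for line in lines]
# Keep only the proxy cache-miss events and measure the gap between retries.
misses = [e for e in events if e["logger"] == "proxy" and e["msg"].startswith("cache miss")]
gaps = [round(b["ts"] - a["ts"], 1) for a, b in zip(misses, misses[1:])]
print(len(misses), gaps)  # → 3 [5.7, 5.6]
```

The roughly 5-6 second spacing suggests a retry loop polling for the pod, not a crash.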
rooftopcellist commented 2 years ago

@Hermanni93 I am not seeing this on fresh installs from the devel branch (just pulled latest).

My db initialize task passed after 2 cache misses, which is normal because it takes some time for the pod to become available.

###### Logs from awx-operator container, vanilla AWX CR

```
TASK [installer : Wait for Database to initialize if managed DB] ***************
task path: /opt/ansible/roles/installer/tasks/database_configuration.yml:206
-------------------------------------------------------------------------------
{"level":"info","ts":1661443521.696671,"logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"ca-awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"5420764487062725234","EventData.Name":"installer : Wait for Database to initialize if managed DB"}
{"level":"info","ts":1661443522.4634442,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}
{"level":"info","ts":1661443528.3991892,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [installer : Look up details for this deployment] *************************
task path: /opt/ansible/roles/installer/tasks/database_configuration.yml:223
-------------------------------------------------------------------------------
{"level":"info","ts":1661443528.5961928,"logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"ca-awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"5420764487062725234","EventData.Name":"installer : Look up details for this deployment"}
{"level":"info","ts":1661443529.652844,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/awx.ansible.com/v1beta1/namespaces/ca-awx/awxs/awx","Verb":"get","APIPrefix":"apis","APIGroup":"awx.ansible.com","APIVersion":"v1beta1","Namespace":"ca-awx","Resource":"awxs","Subresource":"","Name":"awx","Parts":["awxs","awx"]}}
```

@aimcod @akus062381 The Operator SDK work only landed 3 days ago and this issue was created 13 days ago, so the timeline doesn't fit. Also, fresh installs should never enter `upgrade_postgres.yml`.

I wonder if there is an old PVC hanging around in the namespace you have deployed in with the same name as the one being requested by the new postgres pod. If that were the case, I would expect the PVC to be stuck in the pending state, and the postgres pod wouldn't be available, thus causing the cache miss you are seeing.

Could you check your PVCs? `kubectl get pvc -n <deployment-namespace>`
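The stuck-PVC hypothesis is quick to check against `kubectl get pvc -n <deployment-namespace> -o json`; a small sketch of filtering that output for Pending claims (the sample data below is made up to mirror the real JSON shape and claim-naming pattern, not taken from a live cluster):

```python
import json

# Made-up sample in the shape of `kubectl get pvc -o json` output.
pvc_list = json.loads("""
{
  "items": [
    {"metadata": {"name": "postgres-13-awx-postgres-13-0"}, "status": {"phase": "Pending"}},
    {"metadata": {"name": "old-postgres-awx-postgres-0"},  "status": {"phase": "Bound"}}
  ]
}
""")

# Any claim stuck in Pending keeps its pod unschedulable, which matches
# the cache-miss symptom described above.
pending = [item["metadata"]["name"] for item in pvc_list["items"]
           if item["status"]["phase"] == "Pending"]
print(pending)  # → ['postgres-13-awx-postgres-13-0']
```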
rooftopcellist commented 2 years ago

I just did the following to try to reproduce:

  1. checked out 0.24.0, deployed the awx-operator (`make deploy`), and created an AWX CR
  2. populated some dummy resources in the AWX UI
  3. checked out 0.26.0 and deployed the awx-operator (`make deploy`)
  4. watched the logs to see any errors or whether it was hanging on any tasks for too long:
     $ oc logs deployments/awx-operator-controller-manager -c awx-manager -f
  5. observed that the reconciliation loop converged/stopped running

I similarly upgraded to devel from there without issue following the same process.

@nscblauensteiner after reading the issue again, I see that you saw this issue going from 0.23.0 --> 0.24.0, I expect that was a transient error that has since been fixed. Could you try upgrading to the new 0.28.0 release and comment here if you still have issues?

nscblauensteiner commented 2 years ago

@nscblauensteiner after reading the issue again, I see that you saw this issue going from 0.23.0 --> 0.24.0, I expect that was a transient error that has since been fixed. Could you try upgrading to the new 0.28.0 release and comment here if you still have issues?

@rooftopcellist - Sorry for the late reply. Going from 0.24.0 to 0.28.0 the following error occurs:

`TASK [installer : Create Database if no database is specified] ***** task path: /opt/ansible/roles/installer/tasks/upgrade_postgres.yml:33


{"level":"info","ts":1661953099.2240002,"logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"2700876882654434590","EventData.Name":"installer : Create Database if no database is specified"} {"level":"info","ts":1661953100.0569692,"logger":"proxy","msg":"Cache miss: apps/v1, Kind=StatefulSet, awx/awx-postgres-13"} {"level":"info","ts":1661953100.0622494,"logger":"proxy","msg":"Cache miss: apps/v1, Kind=StatefulSet, awx/awx-postgres-13"} {"level":"info","ts":1661953100.0664117,"logger":"proxy","msg":"Injecting owner reference"} {"level":"info","ts":1661953100.066802,"logger":"proxy","msg":"Watching child resource","kind":"apps/v1, Kind=StatefulSet","enqueue_kind":"awx.ansible.com/v1beta1, Kind=AWX"} {"level":"info","ts":1661953100.0668259,"msg":"Starting EventSource","controller":"awx-controller","source":"kind source: *unstructured.Unstructured"} {"level":"info","ts":1661953100.0775096,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953105.0847545,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953110.0923655,"logger":"proxy","msg":"Read object from 
cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953115.1000066,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953120.107237,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953125.1145475,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953130.1218889,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953135.1291647,"logger":"proxy","msg":"Read object from 
cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953140.1358783,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953145.1520195,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953150.1593595,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953155.1630697,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953160.169816,"logger":"proxy","msg":"Read object from 
cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953165.1771474,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953170.184331,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953175.1910224,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953180.198264,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953185.2055795,"logger":"proxy","msg":"Read object from 
cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953190.2112143,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953195.218038,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953200.2255187,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953205.2327266,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953210.2398424,"logger":"proxy","msg":"Read object from 
cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}} {"level":"info","ts":1661953215.2457268,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/awx/statefulsets/awx-postgres-13","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"awx","Resource":"statefulsets","Subresource":"","Name":"awx-postgres-13","Parts":["statefulsets","awx-postgres-13"]}}

--------------------------- Ansible Task StdOut -------------------------------

TASK [Create Database if no database is specified] ** fatal: [localhost]: FAILED! => {"changed": true, "duration": 120, "method": "apply", "msg": "StatefulSet awx-postgres-13:** Resource apply timed out", "result": {"apiVersion": "apps/v1", "kind": "StatefulSet", "metadata": {"annotations": {"kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"apps/v1\",\"kind\":\"StatefulSet\",\"metadata\":{\"labels\":{\"app.kubernetes.io/component\":\"database\",\"app.kubernetes.io/instance\":\"postgres-13-awx\",\"app.kubernetes.io/managed-by\":\"awx-operator\",\"app.kubernetes.io/name\":\"postgres-13\",\"app.kubernetes.io/operator-version\":\"0.28.0\",\"app.kubernetes.io/part-of\":\"awx\"},\"name\":\"awx-postgres-13\",\"namespace\":\"awx\"},\"spec\":{\"replicas\":1,\"selector\":{\"matchLabels\":{\"app.kubernetes.io/component\":\"database\",\"app.kubernetes.io/instance\":\"postgres-13-awx\",\"app.kubernetes.io/managed-by\":\"awx-operator\",\"app.kubernetes.io/name\":\"postgres-13\"}},\"serviceName\":\"awx\",\"template\":{\"metadata\":{\"labels\":{\"app.kubernetes.io/component\":\"database\",\"app.kubernetes.io/instance\":\"postgres-13-awx\",\"app.kubernetes.io/managed-by\":\"awx-operator\",\"app.kubernetes.io/name\":\"postgres-13\",\"app.kubernetes.io/part-of\":\"awx\"}},\"spec\":{\"containers\":[{\"env\":[{\"name\":\"POSTGRESQL_DATABASE\",\"valueFrom\":{\"secretKeyRef\":{\"key\":\"database\",\"name\":\"awx-postgres-configuration\"}}},{\"name\":\"POSTGRESQL_USER\",\"valueFrom\":{\"secretKeyRef\":{\"key\":\"username\",\"name\":\"awx-postgres-configuration\"}}},{\"name\":\"POSTGRESQL_PASSWORD\",\"valueFrom\":{\"secretKeyRef\":{\"key\":\"password\",\"name\":\"awx-postgres-configuration\"}}},{\"name\":\"POSTGRES_DB\",\"valueFrom\":{\"secretKeyRef\":{\"key\":\"database\",\"name\":\"awx-postgres-configuration\"}}},{\"name\":\"POSTGRES_USER\",\"valueFrom\":{\"secretKeyRef\":{\"key\":\"username\",\"name\":\"awx-postgres-configuration\"}}},{\"name\":\"POSTGRES_PASSWORD\",
\"valueFrom\":{\"secretKeyRef\":{\"key\":\"password\",\"name\":\"awx-postgres-configuration\"}}},{\"name\":\"PGDATA\",\"value\":\"/var/lib/postgresql/data/pgdata\"},{\"name\":\"POSTGRES_INITDB_ARGS\",\"value\":\"--auth-host=scram-sha-256\"},{\"name\":\"POSTGRES_HOST_AUTH_METHOD\",\"value\":\"scram-sha-256\"}],\"image\":\"postgres:13\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"postgres\",\"ports\":[{\"containerPort\":5432,\"name\":\"postgres-13\"}],\"resources\":{\"requests\":{\"cpu\":\"10m\",\"memory\":\"64Mi\"}},\"volumeMounts\":[{\"mountPath\":\"/var/lib/postgresql/data\",\"name\":\"postgres-13\",\"subPath\":\"data\"}]}],\"priorityClassName\":\"\"}},\"updateStrategy\":{\"type\":\"RollingUpdate\"},\"volumeClaimTemplates\":[{\"metadata\":{\"name\":\"postgres-13\"},\"spec\":{\"accessModes\":[\"ReadWriteOnce\"],\"resources\":{\"requests\":{\"storage\":\"8Gi\"}}}}]}}"}, "creationTimestamp": "2022-08-31T13:38:20Z", "generation": 1, "labels": {"app.kubernetes.io/component": "database", "app.kubernetes.io/instance": "postgres-13-awx", "app.kubernetes.io/managed-by": "awx-operator", "app.kubernetes.io/name": "postgres-13", "app.kubernetes.io/operator-version": "0.28.0", "app.kubernetes.io/part-of": "awx"}, "managedFields": [{"apiVersion": "apps/v1", "fieldsType": "FieldsV1", "fieldsV1": {"f:metadata": {"f:annotations": {".": {}, "f:kubectl.kubernetes.io/last-applied-configuration": {}}, "f:labels": {".": {}, "f:app.kubernetes.io/component": {}, "f:app.kubernetes.io/instance": {}, "f:app.kubernetes.io/managed-by": {}, "f:app.kubernetes.io/name": {}, "f:app.kubernetes.io/operator-version": {}, "f:app.kubernetes.io/part-of": {}}, "f:ownerReferences": {".": {}, "k:{\"uid\":\"832c34b8-174b-47cd-99cd-70228dec23e0\"}": {}}}, "f:spec": {"f:podManagementPolicy": {}, "f:replicas": {}, "f:revisionHistoryLimit": {}, "f:selector": {}, "f:serviceName": {}, "f:template": {"f:metadata": {"f:labels": {".": {}, "f:app.kubernetes.io/component": {}, "f:app.kubernetes.io/instance": {}, 
"f:app.kubernetes.io/managed-by": {}, "f:app.kubernetes.io/name": {}, "f:app.kubernetes.io/part-of": {}}}, "f:spec": {"f:containers": {"k:{\"name\":\"postgres\"}": {".": {}, "f:env": {".": {}, "k:{\"name\":\"PGDATA\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"POSTGRESQL_DATABASE\"}": {".": {}, "f:name": {}, "f:valueFrom": {".": {}, "f:secretKeyRef": {}}}, "k:{\"name\":\"POSTGRESQL_PASSWORD\"}": {".": {}, "f:name": {}, "f:valueFrom": {".": {}, "f:secretKeyRef": {}}}, "k:{\"name\":\"POSTGRESQL_USER\"}": {".": {}, "f:name": {}, "f:valueFrom": {".": {}, "f:secretKeyRef": {}}}, "k:{\"name\":\"POSTGRES_DB\"}": {".": {}, "f:name": {}, "f:valueFrom": {".": {}, "f:secretKeyRef": {}}}, "k:{\"name\":\"POSTGRES_HOST_AUTH_METHOD\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"POSTGRES_INITDB_ARGS\"}": {".": {}, "f:name": {}, "f:value": {}}, "k:{\"name\":\"POSTGRES_PASSWORD\"}": {".": {}, "f:name": {}, "f:valueFrom": {".": {}, "f:secretKeyRef": {}}}, "k:{\"name\":\"POSTGRES_USER\"}": {".": {}, "f:name": {}, "f:valueFrom": {".": {}, "f:secretKeyRef": {}}}}, "f:image": {}, "f:imagePullPolicy": {}, "f:name": {}, "f:ports": {".": {}, "k:{\"containerPort\":5432,\"protocol\":\"TCP\"}": {".": {}, "f:containerPort": {}, "f:name": {}, "f:protocol": {}}}, "f:resources": {".": {}, "f:requests": {".": {}, "f:cpu": {}, "f:memory": {}}}, "f:terminationMessagePath": {}, "f:terminationMessagePolicy": {}, "f:volumeMounts": {".": {}, "k:{\"mountPath\":\"/var/lib/postgresql/data\"}": {".": {}, "f:mountPath": {}, "f:name": {}, "f:subPath": {}}}}}, "f:dnsPolicy": {}, "f:restartPolicy": {}, "f:schedulerName": {}, "f:securityContext": {}, "f:terminationGracePeriodSeconds": {}}}, "f:updateStrategy": {"f:type": {}}, "f:volumeClaimTemplates": {}}}, "manager": "OpenAPI-Generator", "operation": "Update", "time": "2022-08-31T13:38:20Z"}, {"apiVersion": "apps/v1", "fieldsType": "FieldsV1", "fieldsV1": {"f:status": {"f:collisionCount": {}, "f:currentReplicas": {}, 
"f:currentRevision": {}, "f:observedGeneration": {}, "f:replicas": {}, "f:updateRevision": {}, "f:updatedReplicas": {}}}, "manager": "kube-controller-manager", "operation": "Update", "subresource": "status", "time": "2022-08-31T13:38:20Z"}], "name": "awx-postgres-13", "namespace": "awx", "ownerReferences": [{"apiVersion": "awx.ansible.com/v1beta1", "kind": "AWX", "name": "awx", "uid": "832c34b8-174b-47cd-99cd-70228dec23e0"}], "resourceVersion": "27418469", "uid": "bb16e868-0314-43cd-a63a-7858bc463796"}, "spec": {"podManagementPolicy": "OrderedReady", "replicas": 1, "revisionHistoryLimit": 10, "selector": {"matchLabels": {"app.kubernetes.io/component": "database", "app.kubernetes.io/instance": "postgres-13-awx", "app.kubernetes.io/managed-by": "awx-operator", "app.kubernetes.io/name": "postgres-13"}}, "serviceName": "awx", "template": {"metadata": {"creationTimestamp": null, "labels": {"app.kubernetes.io/component": "database", "app.kubernetes.io/instance": "postgres-13-awx", "app.kubernetes.io/managed-by": "awx-operator", "app.kubernetes.io/name": "postgres-13", "app.kubernetes.io/part-of": "awx"}}, "spec": {"containers": [{"env": [{"name": "POSTGRESQL_DATABASE", "valueFrom": {"secretKeyRef": {"key": "database", "name": "awx-postgres-configuration"}}}, {"name": "POSTGRESQL_USER", "valueFrom": {"secretKeyRef": {"key": "username", "name": "awx-postgres-configuration"}}}, {"name": "POSTGRESQL_PASSWORD", "valueFrom": {"secretKeyRef": {"key": "password", "name": "awx-postgres-configuration"}}}, {"name": "POSTGRES_DB", "valueFrom": {"secretKeyRef": {"key": "database", "name": "awx-postgres-configuration"}}}, {"name": "POSTGRES_USER", "valueFrom": {"secretKeyRef": {"key": "username", "name": "awx-postgres-configuration"}}}, {"name": "POSTGRES_PASSWORD", "valueFrom": {"secretKeyRef": {"key": "password", "name": "awx-postgres-configuration"}}}, {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"}, {"name": "POSTGRES_INITDB_ARGS", "value": 
"--auth-host=scram-sha-256"}, {"name": "POSTGRES_HOST_AUTH_METHOD", "value": "scram-sha-256"}], "image": "postgres:13",{"level":"error","ts":1661953220.3480885,"logger":"logging_event_handler","msg":"","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"runner_on_failed","job":"2700876882654434590","EventData.Task":"Create Database if no database is specified","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/roles/installer/tasks/upgrade_postgres.yml:33","error":"[playbook task failed]"} "imagePullPolicy": "IfNotPresent", "name": "postgres", "ports": [{"containerPort": 5432, "name": "postgres-13", "protocol": "TCP"}], "resources": {"requests": {"cpu": "10m", "memory": "64Mi"}}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [{"mountPath": "/var/lib/postgresql/data", "name": "postgres-13", "subPath": "data"}]}], "dnsPolicy": "ClusterFirst", "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "terminationGracePeriodSeconds": 30}}, "updateStrategy": {"type": "RollingUpdate"}, "volumeClaimTemplates": [{"apiVersion": "v1", "kind": "PersistentVolumeClaim", "metadata": {"creationTimestamp": null, "name": "postgres-13"}, "spec": {"accessModes": ["ReadWriteOnce"], "resources": {"requests": {"storage": "8Gi"}}, "volumeMode": "Filesystem"}, "status": {"phase": "Pending"}}]}, "status": {"availableReplicas": 0, "collisionCount": 0, "currentReplicas": 1, "currentRevision": "awx-postgres-13-54b9b564f4", "observedGeneration": 1, "replicas": 1, "updateRevision": "awx-postgres-13-54b9b564f4", "updatedReplicas": 1}}}`

aimcod commented 2 years ago

@Hermanni93 I am not seeing this on fresh installs from the devel branch (just pulled latest).

My db initialize task passed after 2 cache misses, which is normal because it takes some time for the pod to become available.

@aimcod @akus062381 The Operator SDK work only landed 3 days ago and this issue was created 13 days ago, so the timeline doesn't fit. Also, fresh installs should never enter upgrade_postgres.yml.

I wonder if there is an old PVC hanging around in the namespace you have deployed in with the same name as the one being requested by the new postgres pod. If that were the case, I would expect the PVC to be stuck in the pending state, and the postgres pod wouldn't be available, thus causing the cache-miss you are seeing.

Could you check your pvc's? kubectl get pvc -n <deployment-namespace>

Hi @rooftopcellist

This is a fresh deployment of 0.28.0 on a fresh K8s cluster (1.25) that is deployed on fresh VMs.

My issue right now is exactly that of #706, excluding the last comment, where @indraneeldey1 mentioned it is sort of running for him.

Here are the details

POD STATUS:

[root@dev-awx-01 k8awx]# kubectl get pods

NAME                                              READY   STATUS    RESTARTS   AGE
awx-operator-controller-manager-9589d9859-jhp4q   2/2     Running   0          8m43s
*****-postgres-13-0                               0/1     Pending   0          8m16s
[root@dev-awx-01 k8awx]# kubectl describe pod ****-postgres-13-0
Name:             ****-postgres-13-0
Namespace:        awx
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app.kubernetes.io/component=database
                  app.kubernetes.io/instance=postgres-13-****
                  app.kubernetes.io/managed-by=awx-operator
                  app.kubernetes.io/name=postgres-13
                  app.kubernetes.io/part-of=****
                  controller-revision-hash=****-postgres-13-8677ccdd5d
                  statefulset.kubernetes.io/pod-name=****-postgres-13-0
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/****-postgres-13
Containers:
  postgres:
    Image:      postgres:13
    Port:       5432/TCP
    Host Port:  0/TCP
    Requests:
      cpu:     10m
      memory:  64Mi
    Environment:
      POSTGRESQL_DATABASE:        <set to the key 'database' in secret '****-postgres-configuration'>  Optional: false
      POSTGRESQL_USER:            <set to the key 'username' in secret '****-postgres-configuration'>  Optional: false
      POSTGRESQL_PASSWORD:        <set to the key 'password' in secret '****-postgres-configuration'>  Optional: false
      POSTGRES_DB:                <set to the key 'database' in secret '****-postgres-configuration'>  Optional: false
      POSTGRES_USER:              <set to the key 'username' in secret '****-postgres-configuration'>  Optional: false
      POSTGRES_PASSWORD:          <set to the key 'password' in secret '****-postgres-configuration'>  Optional: false
      PGDATA:                     /var/lib/postgresql/data/pgdata
      POSTGRES_INITDB_ARGS:       --auth-host=scram-sha-256
      POSTGRES_HOST_AUTH_METHOD:  scram-sha-256
    Mounts:
      /var/lib/postgresql/data from postgres-13 (rw,path="data")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zbw28 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  postgres-13:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  postgres-13-****-postgres-13-0
    ReadOnly:   false
  kube-api-access-zbw28:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  9m10s  default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 node(s) didn't find available persistent volumes to bind. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  4m1s   default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 node(s) didn't find available persistent volumes to bind. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.

PV

[root@dev-awx-01 k8awx]# kubectl get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
static-data-pv   11Gi       RWX            Retain           Available           local-storage            11m
[root@dev-awx-01 k8awx]# kubectl describe pv static-data-pv
Name:              static-data-pv
Labels:            <none>
Annotations:       <none>
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      local-storage
Status:            Available
Claim:
Reclaim Policy:    Retain
Access Modes:      RWX
VolumeMode:        Filesystem
Capacity:          11Gi
Node Affinity:
  Required Terms:
    Term 0:        kubernetes.io/hostname in [dev-awx-]
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /data/awx
    HostPathType:
Events:            <none>

PVC

[root@dev-awx-01 k8awx]# kubectl get pvc
NAME                              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS    AGE
postgres-13-*****-postgres-13-0   Pending                                      local-storage   12m
static-data-pvc                   Pending                                      local-storage   12m
[root@dev-awx-01 k8awx]# kubectl describe pvc postgres-13-****-postgres-13-0
Name:          postgres-13-****-postgres-13-0
Namespace:     awx
StorageClass:  local-storage
Status:        Pending
Volume:
Labels:        app.kubernetes.io/component=database
               app.kubernetes.io/instance=postgres-13-****
               app.kubernetes.io/managed-by=awx-operator
               app.kubernetes.io/name=postgres-13
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       ****-postgres-13-0
Events:
  Type    Reason                Age                   From                         Message
  ----    ------                ----                  ----                         -------
  Normal  WaitForFirstConsumer  12m                   persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  WaitForPodScheduled   2m26s (x41 over 12m)  persistentvolume-controller  waiting for pod ****-postgres-13-0 to be scheduled

I would appreciate any feedback on this topic.

ioluc commented 2 years ago

After applying chmod to the persistent volume paths, the pods come up on a fresh install. PostgreSQL persistent volume, e.g. sudo chmod 755 /psql/postgres-13; AWX projects persistent volume, e.g. sudo chown 1000:0 /awx/projects.

Don't try to use your old pv initialized by PostgreSQL 12 without upgrade, this will fail.
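
The fix above boils down to preparing the hostPath directories with the right owner and mode before the pods start. A minimal sketch of the idea, using placeholder paths under /tmp (the real paths are your PV hostPath directories on the node, and the chown needs root):

```shell
# Sketch of the permissions fix above, using placeholder paths under /tmp.
# On a real node the paths are your PV hostPath directories (e.g.
# /psql/postgres-13 and /awx/projects) and the chown needs root.
PG_DIR=/tmp/pvfix-demo/postgres-13
PROJ_DIR=/tmp/pvfix-demo/projects

mkdir -p "$PG_DIR" "$PROJ_DIR"
chmod 755 "$PG_DIR"              # postgres data dir: chmod 755 <pv-path>
chmod 755 "$PROJ_DIR"
# chown 1000:0 "$PROJ_DIR"       # AWX runs as UID 1000; needs root, so commented

ls -ld "$PG_DIR" "$PROJ_DIR"
```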

kurokobo commented 2 years ago

@aimcod AWX Operator creates a PVC for PostgreSQL, but it does not create a PV for that PVC. It seems the PVC has been created by AWX Operator, but there are no usable PVs for your PVC on your K8s cluster (static-data-pv is there, but it has the RWX access mode). Try creating a new PV manually with the local-storage class and RWO access mode.
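
For reference, a minimal sketch of such a manually created PV (a hypothetical example, not from the operator; the name, size, and path are assumptions to adjust to your cluster):

```yaml
# Hypothetical PV for the operator-created PostgreSQL PVC:
# local-storage class with RWO access mode, as suggested above.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: awx-postgres-pv
spec:
  storageClassName: local-storage
  capacity:
    storage: 8Gi               # must cover the size the PVC requests
  accessModes:
    - ReadWriteOnce            # RWO, not RWX
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/awx-postgres   # create this directory on the node first
```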

nscblauensteiner commented 2 years ago

@odgon @rooftopcellist

I have now figured it out. The ansible-playbook creates a wrong claim during deployment (error: "cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"), which has to be deleted during the deploy and replaced with one of your own. This PVC must of course match in name, storageClassName, and storage size; at least in my case, the app labels must also be carried over.

If necessary, I can DM you the yaml files and instructions.

rooftopcellist commented 2 years ago

I ran into this after bulk deleting a bunch of resources in my namespace (including "kustomize build . | kubectl delete -f -" and "kubectl delete pvc --all"), then quickly afterwards deploying the awx-operator using the Kustomization instructions in the Basic Install section of the README.md.

However, after deleting the PVC's and waiting a couple minutes, then trying to deploy the awx-operator and create an AWX instance again, it worked.

I suspect that this is some caching issue with k8s' etcd. If you see this again, I suggest trying a different namespace, or terminating and recreating your namespace if that is an option.

Thanks, AWX Team

bhrbgk commented 2 years ago

@nscblauensteiner Hi! Just ran into the same issue. Mind sharing your fix? Thanks in advance!

nscblauensteiner commented 2 years ago

@nscblauensteiner Hi! Just ran into the same issue. Mind sharing your fix? Thanks in advance!

Hi, sure.

My PV and PVC contain corporate data. If you leave me your email, I will get back to you via PM.

bhrbgk commented 2 years ago

@nscblauensteiner thank you a lot. contact me at b@bnmcn.io. have a good day!

nscblauensteiner commented 2 years ago

@nscblauensteiner thank you a lot. contact me at b@bnmcn.io. have a good day!

Got an error message back: The DNS has reported that the domain of the recipient does not exist.

bhrbgk commented 2 years ago

@nscblauensteiner Sorry, I should've had a second coffee before typing. It's b@bncmn.io ;)

vg-mc commented 2 years ago

I have the same issue. After a while the installation just stalls at

{"level":"info","ts":1667213815.0013413,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}

Env is k8s with OpenEBS storage (the postgres container seems happy with it). Trying with v0.28.0, as the latest has an issue with init container permissions.

Edit: might not be the same issue; mine might be that I had IPv6 disabled in my k8s cluster.

acas25 commented 2 years ago

@nscblauensteiner I had the same issue, can you share what you did with the yaml files.

nscblauensteiner commented 2 years ago

@nscblauensteiner I had the same issue, can you share what you did with the yaml files.

Hi, same rule for you :) - leave me your email address here.

MatthieuLeMee commented 2 years ago

Why is this issue closed but it doesn't seem to have any solution available ?

vg-mc commented 2 years ago

Why is this issue closed but it doesn't seem to have any solution available ?

Try disabling IPv6 on your nodes or go back to v0.24.0 if that works. It did for me.

Edit: to verify this, check whether the awx-web container hangs with nginx trying to get an IPv6 address. You can do this with: kubectl logs deployment/awx -c awx-web -n yournamespace

bchutro commented 2 years ago

@nscblauensteiner I'd be interested in how you fixed this as well. My email is bchutro@pamperedchef.com

MatthieuLeMee commented 2 years ago

I had this error when trying to deploy AWX 21.5.0 with a recent 1.0.0 awx-operator. I think it's related to the awx-ee:latest image having problems with the old awx:21.5.0 image. Upgrading to 21.8.0 fixed it.

You can check the init container logs; they sometimes give useful information: kubectl logs awx-767b7d7c7b-72prx init

trippinnik commented 2 years ago

Why are we emailing someone to get the fix? Could someone just post what they did to fix this?

iuvooneill commented 1 year ago

I'm seeing this issue as well, with 1.1.3. Completely new EKS cluster, everything fresh - but it seems no PV is getting created. Seems like awx-operator itself needs a fix?

mkeology commented 1 year ago

I'm under the same spell; it doesn't work with a completely new minikube setup with awx-operator 1.1.3:

{"level":"info","ts":1673269573.8766282,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"} {"level":"info","ts":1673269579.516033,"logger":"proxy","msg":"cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist"}

sveerabathini commented 1 year ago

Hi team,

I am trying to use AWS EKS Fargate to deploy AWX, with an external RDS PostgreSQL Serverless instance.

I am stuck with the awx-postgres pod in Pending status with the error below.

Pod not supported on Fargate: volumes not supported: postgres-13 not supported because: PVC postgres-13-awx-postgres-13-0 not bound

I have a few questions here: do we need a Persistent Volume for sure? If yes, how can I make use of EFS in AWS?

If someone has already done this, can you help me set up the environment?
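
Note that with an external RDS database, the operator can be pointed at the existing database instead of deploying its own PostgreSQL StatefulSet, which sidesteps the PV question entirely. A sketch based on the operator's external-database support (verify the exact secret keys against the README for your operator version; all values below are placeholders):

```yaml
# Hypothetical external-database configuration for the AWX operator.
# The secret keys follow the operator's documented external-postgres format.
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: my-rds-endpoint.example.com   # placeholder: your RDS endpoint
  port: "5432"
  database: awx
  username: awx
  password: changeme                  # placeholder credential
  sslmode: prefer
  type: unmanaged                     # tells the operator not to manage the DB
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  postgres_configuration_secret: awx-postgres-configuration
```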

ghost commented 1 year ago

@nscblauensteiner Hi! Just ran into the same issue. Mind sharing your fix? Thanks in advance!

Hi, sure.

My PV and PVC contain corporate data. If you leave me your email, I will get back to you via PM.

Hi, would you be able to send me the PV and PVC config? Thanks! @nscblauensteiner

nscblauensteiner commented 1 year ago

@nscblauensteiner Hi! Just ran into the same issue. Mind sharing your fix? Thanks in advance!

Hi, sure. My PV and PVC contain corporate data. If you leave me your email, I will get back to you via PM.

Hi, would you be able to send me the PV and PVC config? Thanks! @nscblauensteiner

Hi, leave me your email address here.

ghost commented 1 year ago

@nscblauensteiner Hi! Just ran into the same issue. Mind sharing your fix? Thanks in advance!

Hi, sure. My PV and PVC contain corporate data. If you leave me your email, I will get back to you via PM.

Hi, would you be able to send me the PV and PVC config? Thanks! @nscblauensteiner

Hi, leave me your email address here.

Hi, my email is wenbodu3@gmail.com! Thanks a lot.

tlouvart commented 1 year ago

Hi @nscblauensteiner, could you please send me the PV and PVC config too? At th.louvart29@gmail.com. Thanks a lot.

ghost commented 1 year ago

@nscblauensteiner Hi! Just ran into the same issue. Mind sharing your fix? Thanks in advance!

Hi, sure. My PV and PVC contain corporate data. If you leave me your email, I will get back to you via PM.

Hi, would you be able to send me the PV and PVC config? Thanks! @nscblauensteiner

Hi, leave me your email address here.

Hi @nscblauensteiner, thanks for your email, it helped a lot. However, do you think it is possible to get it working with EKS Fargate? In EKS Fargate we cannot mkdir on the node; is there any workaround?

doommot01 commented 1 year ago

Hello @nscblauensteiner ... Please, can you send the PV and PVC config? josorio@outlook.com

mednbiba commented 1 year ago

Hey @nscblauensteiner, can you send me the configs to mohamednbiba@gmail.com?

fouram commented 1 year ago

Hey @nscblauensteiner, could you just actually post the fix here instead of gathering email addresses, so that we don't have to keep bothering you?

ericeguzman commented 1 year ago

Hi @nscblauensteiner Are you able to send me the fix? ericguzman49@gmail.com. Thanks

littleyoda83 commented 1 year ago

What is the fix here? I am running 1.2.0

apiening commented 1 year ago

Dear @nscblauensteiner can you please share your fix? I have the same issue.

nscblauensteiner commented 1 year ago

Hi everyone - I did it this way:

1. Adjust awx_pv_13.yaml to your setup (basically you need to edit "storage" and "path").
   - The path must be created first with "mkdir …".
2. Adjust awx_pvc_13.yaml to your setup ("storage" needs to match the one in the persistent volume).
3. Apply the new persistent volume (PostgreSQL 13): kubectl apply -f awx_pv_13.yaml
4. Fetch the new GitHub tags and check out: git fetch --all --tags && git checkout 0.26.0
   - I would recommend an intermediate step via operator 0.26.0; then you can update to the latest one.
5. make deploy
6. Watch "kubectl logs -f deployments/awx-operator-controller-manager -c awx-manager"; when the error occurs (Kind=PodList err-Index …), then:
7. kubectl delete pvc postgres-13-awx-postgres-13-0
8. kubectl apply -f awx_pvc_13.yaml
9. You can now check the log again; after some time the upgrade continues and the error disappears.

Br, Lukas pv_pvc.zip
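
The attached pv_pvc.zip is not reproduced in this thread. A minimal sketch of what awx_pv_13.yaml and awx_pvc_13.yaml might look like (names, labels, sizes, and paths are assumptions and must match your deployment; in particular the PVC name must be postgres-13-<instance-name>-postgres-13-0):

```yaml
# Hypothetical awx_pv_13.yaml: a hostPath PV for the PostgreSQL 13 data.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: awx-postgres-13-pv
spec:
  storageClassName: local-storage
  capacity:
    storage: 8Gi                 # must cover the size the PVC requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/postgres-13      # create first: mkdir -p /data/postgres-13
---
# Hypothetical awx_pvc_13.yaml: the name must match what the StatefulSet
# expects, and the app labels mirror the ones the operator sets on the PVC
# it creates (see the kubectl describe output earlier in this thread).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-13-awx-postgres-13-0
  namespace: awx
  labels:
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: awx-operator
    app.kubernetes.io/name: postgres-13
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
```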

vg-mc commented 1 year ago

The issue for me was that IPv6 support wasn't enabled on my Kubernetes cluster, which was fixed in version 1.1.1 with the addition of a configurable entry that disables it. Worth a try!

apiening commented 1 year ago

Hi @vg-mc, I don't have IPv6 enabled on my K3s cluster either. So you suggest using awx-operator 1.1.1 and disabling the IPv6 listener with the newly introduced flag, right? Can you please tell me how I can set this flag? Do I have to edit a config file and do a make deploy then? If so: which one?

apiening commented 1 year ago

Hi @vg-mc, I gave it a try: checked out 1.1.1 and set ipv6_disabled: true in roles/installer/defaults/main.yml, but this did not change anything for me; the operator was not able to finish the deployment.

apiening commented 1 year ago

Hi @nscblauensteiner,

thank you very much, I tried to follow your steps.

I got to the point where the log outputs several lines with Kind=PodList err-Index with name field:status.phase does not exist. Then I did kubectl delete pvc postgres-13-awx-demo-postgres-13-0 -n awx and it was confirmed with persistentvolumeclaim "postgres-13-awx-demo-postgres-13-0" deleted; however, the command hangs and does not finish. I can exit with Ctrl+C, but when I do kubectl apply -f ../pv-fix/awx_pvc_13.yaml -n awx I get the message Warning: Detected changes to resource postgres-13-awx-demo-postgres-13-0 which is currently being deleted.

Any idea why this is or how I can get around this?

apiening commented 1 year ago

I found out how the PVC could be removed:

kubectl patch pvc postgres-13-awx-demo-postgres-13-0 -p '{"metadata":{"finalizers":null}}' -n awx

Unfortunately, after removing the PVC and creating the new one, the deployment restarts but then gets stuck at the same point.

2and3makes23 commented 4 months ago

I came across this issue after seeing the very same error messages from awx-operator/awx-manager: cache miss: /v1, Kind=PodList err-Index with name field:status.phase does not exist

For me the solution was to fix my requests/limits, which were too high for the present limit range.

So I'm guessing there are lots of possible triggers for this error message. Just in case it helps someone. ✌️ :)

valkiriaaquatica commented 3 months ago

Hey, I tried to simulate the problem with some of the versions and dates above, and I found that the problem happens when it is run on minikube. If the operator is deployed on K3s, kind, or a full cluster, the problem does not occur.

At least the error "Kind=PodList err-Index with name field:status.phase does not exist" was related to the postgres pod; describing it showed: Error: stat /data/postgres-15 .

To solve it (just on minikube) do this: get the UID the postgres container runs as; since it was not running, we had to get the ID from describing it:

kubectl describe pod awx-postgres-15-0 -n awx | grep "/var/lib/pgsql/data"
      chown 26:0 /var/lib/pgsql/data
      chmod 700 /var/lib/pgsql/data

In my case it was 26.

minikube ssh

Inside minikube

sudo mkdir -p /data/postgres-15
sudo chown -R 26:0 /data/postgres-15
sudo chmod -R 700 /data/postgres-15

Then delete the postgres pod to force the "restart": kubectl delete pod awx-postgres-15-0 -n awx

And then check that the error does not appear on the controller: kubectl -n awx logs -f deployments/awx-operator-controller-manager

Edit: Testing on a two-node K3s cluster, I also had to apply the below.