ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0

awxbackups fail with => The task includes an option with an undefined variable #1902

Open salanisor opened 3 months ago

salanisor commented 3 months ago

Bug Summary

Backups were working fine, then suddenly started failing with the following error, similar to issue #1577.

The failures started on v2.17.0; I upgraded to v2.18.0 to see if the issue would go away, but it did not.

awx-operator.v2.18.0                AWX                              2.18.0    awx-operator.v2.17.0

AWX Operator version

2.18.0

AWX version

v1beta1

Kubernetes platform

openshift

Kubernetes/Platform version

4.13.17

Modifications

no

Steps to reproduce

apply yaml

---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup-test-sandbox-2
  namespace: awx
spec:
  deployment_name: awx-infra
  backup_pvc: awx-sandbox-backup-claim
  no_log: false

Expected results

Successful backups. When testing locally, I can see that the lookup returns the right field, although I should probably have tested with the same Ansible version.

cat awx.json | jq '.this_awx.resources[0].status.postgresConfigurationSecret'
"awx-sandbox-postgres-configuration"
ansible-playbook --version
ansible-playbook [core 2.17.0]
  config file = None
  configured module search path = ['/Users/x/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /opt/homebrew/Cellar/ansible/10.0.1/libexec/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/x/.ansible/collections:/usr/share/ansible/collections
  executable location = /opt/homebrew/bin/ansible-playbook
  python version = 3.12.3 (main, Apr  9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] (/opt/homebrew/Cellar/ansible/10.0.1/libexec/bin/python)
  jinja version = 3.1.4
  libyaml = True

Actual results

fails with error

TASK [backup : Get PostgreSQL configuration] ***********************************
task path: /opt/ansible/roles/backup/tasks/postgres.yml:3
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0. list object has no element 0\n\nThe error appears to be in '/opt/ansible/roles/backup/tasks/postgres.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Get PostgreSQL configuration\n  ^ here\n"}

PLAY RECAP *********************************************************************
localhost                  : ok=16   changed=2    unreachable=0    failed=1    skipped=8    rescued=0    ignored=0

Remaining fields of the operator log entry:

"job":"8280544932261614561","name":"awxbackup-test-sandbox-2","namespace":"awx","error":"exit status 2","stacktrace":"github.com/operator-framework/ansible-operator-plugins/internal/ansible/runner.(*runner).Run.func1\n\tansible-operator-plugins/internal/ansible/runner/runner.go:269"
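
The message "list object has no element 0" is what the template engine reports when `[0]` is applied to an empty list. One possible reading is therefore that the lookup behind `this_awx` returned no resources at all, not that the `status` field is missing. The following is a minimal Python sketch of that failure mode, assuming the lookup result has the shape used in the jq query above (`this_awx.resources[0].status.postgresConfigurationSecret`). The function name and the sample dictionaries are hypothetical and are not taken from the operator's code.

```python
def get_postgres_secret_name(this_awx):
    """Return the PostgreSQL configuration secret name, or None when the
    lookup result contains no AWX resource."""
    resources = this_awx.get("resources") or []
    if not resources:
        # An unguarded resources[0] is the step that fails on an empty list.
        return None
    status = resources[0].get("status") or {}
    return status.get("postgresConfigurationSecret")


# Hypothetical lookup results, shaped like the jq path quoted above.
found = {
    "resources": [
        {"status": {"postgresConfigurationSecret": "awx-sandbox-postgres-configuration"}}
    ]
}
empty = {"resources": []}

print(get_postgres_secret_name(found))
print(get_postgres_secret_name(empty))

# The unguarded form raises as soon as the list is empty.
try:
    empty["resources"][0]
except IndexError as exc:
    print(f"unguarded lookup fails: {exc}")
```

If the list is empty in the failing run, the next thing to check would be whether an AWX resource with the name given in `deployment_name` exists in the backup's namespace. In the manifests quoted in this report, the AWXBackup sets `deployment_name: awx-infra` while the AWX resource is named `awx-sandbox`; whether that mismatch is the cause here is not confirmed.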

Additional information

The AWX definition is below. This setup was working fine, and I had run a few recovery tests in this namespace.

We had installed OCP Service Mesh and thought it might be the culprit, but it was not: after removing it, the issue persists.

oc get awxbackups
NAME                   AGE
awxbackup-2024         43d
awxbackup-2024-06-03   14d

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  annotations:
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    argocd.argoproj.io/sync-wave: "200"
  name: awx-sandbox
  namespace: awx
spec:
  postgres_keepalives_count: 5
  postgres_keepalives_idle: 5
  ee_resource_requirements:
    requests:
      cpu: 50m
      memory: 64M
  create_preload_data: true
  garbage_collect_secrets: false
  loadbalancer_port: 80
  no_log: true
  task_resource_requirements:
    requests:
      cpu: 50m
      memory: 128M
  image_pull_policy: IfNotPresent
  loadbalancer_ip: ''
  projects_storage_size: 8Gi
  auto_upgrade: true
  task_privileged: false
  postgres_keepalives: true
  postgres_keepalives_interval: 5
  ipv6_disabled: false
  projects_storage_access_mode: ReadWriteMany
  set_self_labels: true
  web_resource_requirements:
    requests:
      cpu: 50m
      memory: 128M
  projects_persistence: false
  replicas: 1
  admin_user: admin
  loadbalancer_protocol: http

  # Set the default admin password secret 
  admin_password_secret: awx-admin-secret

  # Give route information for this instance 
  service_type: ClusterIP
  ingress_type: route
  route_host: awx-sandbox.apps.help.example.com
  route_tls_termination_mechanism: Edge
  hostname: awx-sandbox.apps.help.example.com

  # Configure this instance to trust OpenLDAP for access checks and for general lookups 
  ldap_cacert_secret: awx-ca-secret
  bundle_cacert_secret: awx-ca-secret

  # Configure this instance to successfully perform Kerberos lookups in domain
  # First, create the ConfigMap as part of the kustomization.yaml 
  # Second, turn the ConfigMap into a volume 
  # Third, mount the volume to the Pod Members 
  extra_volumes: |
    - name: awx-corp-krb5
      configMap:
        defaultMode: 420
        items:
        - key: krb5.conf
          path: krb5.conf
        name: awx-corp-krb5-conf-configmap
  web_extra_volume_mounts: |
    - name: awx-corp-krb5
      mountPath: /etc/krb5.conf
      subPath: krb5.conf
  task_extra_volume_mounts: |
    - name: awx-corp-krb5
      mountPath: /etc/krb5.conf
      subPath: krb5.conf
  ee_extra_volume_mounts: |
    - name: awx-corp-krb5
      mountPath: /etc/krb5.conf
      subPath: krb5.conf

Operator Logs

log.tar.gz

D1StrX commented 3 months ago

Yup, same as in here: https://github.com/ansible/awx-operator/issues/1518

djyasin commented 3 months ago

Hello @salanisor, is this the same issue described in #1518?

fritz0011 commented 2 months ago

Same issue here: Rancher + k8s 1.28.9 + awx-operator 2.19.0/1

--------------------------- Ansible Task StdOut -------------------------------

TASK [Get PostgreSQL configuration] **** fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0. list object has no element 0\n\nThe error appears to be in '/opt/ansible/roles/backup/tasks/postgres.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Get PostgreSQL configuration\n ^ here\n"}

fritz0011 commented 2 months ago

root@mgmtrke2m01:~# kubectl get awx -o yaml -n awx-prod
apiVersion: v1
items:

root@mgmtrke2m01:~# kubectl get secret awx-db-descret -n awx-prod
NAME             TYPE     DATA   AGE
awx-db-descret   Opaque   6      14d

Could this be related to the fact that the secret is using base64-encoded strings instead of cleartext?
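
On the base64 question: values under a Kubernetes Secret's `data` field are always returned base64-encoded by the API (cleartext is only accepted on write, through `stringData`), so encoded values on their own would not be unusual. Below is a minimal Python sketch for inspecting such values, assuming a `data` mapping like the one printed by `kubectl get secret <name> -o json`. The function name, keys, and values are hypothetical samples, not data from this cluster.

```python
import base64


def decode_secret_data(data):
    """Decode the base64 values of a Secret's `data` mapping into text."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in data.items()
    }


# Hypothetical sample values, encoded the way the API would return them.
sample = {
    "host": base64.b64encode(b"awx-postgres").decode("ascii"),
    "port": base64.b64encode(b"5432").decode("ascii"),
}

decoded = decode_secret_data(sample)
print(decoded)
```

Comparing the decoded values against what was put into the secret is one way to rule out a double-encoding mistake.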

djyasin commented 2 months ago

There is a workaround described here https://github.com/ansible/awx-operator/issues/1518.

We are still investigating this issue and may have more information soon!

fritz0011 commented 2 months ago

@djyasin, I did a bit of troubleshooting against https://github.com/ansible/awx-operator/blob/devel/roles/backup/tasks/postgres.yml. This expression may be what triggers the error: name: "{{ this_awx['resources'][0]['status']['postgresConfigurationSecret'] }}"

that traced to this =>

fritz0011 commented 2 months ago

As of today (25.07):

TASK [Create new AWXBackup resource and wait for complete] ***** changed: [localhost] => {"changed": true, "duration": 50, "method": "create", "result": {"apiVersion": "awx.ansible.com/v1beta1", "kind": "AWXBackup", "metadata": {"creationTimestamp": "2024-07-25T08:30:58Z", "finalizers": ["awx.ansible.com/finalizer"], "generation": 1, "labels": {"app.kubernetes.io/component": "awx", "app.kubernetes.io/managed-by": "awx-operator", "app.kubernetes.io/operator-version": "2.19.1", "app.kubernetes.io/part-of": "awxbackup-2024-07-25-08-30-57"}, "managedFields": [{"apiVersion": "awx.ansible.com/v1beta1", "fieldsType": "FieldsV1", "fieldsV1": {"f:metadata": {"f:finalizers": {".": {}, "v:\"awx.ansible.com/finalizer\"": {}}}}, "manager": "ansible-operator", "operation": "Update", "time": "2024-07-25T08:30:58Z"}, {"apiVersion": "awx.ansible.com/v1beta1", "fieldsType": "FieldsV1", "fieldsV1": {"f:metadata": {"f:labels": {".": {}, "f:app.kubernetes.io/component": {}, "f:app.kubernetes.io/managed-by": {}, "f:app.kubernetes.io/operator-version": {}, "f:app.kubernetes.io/part-of": {}}}, "f:spec": {".": {}, "f:backup_pvc": {}, "f:clean_backup_on_delete": {}, "f:deployment_name": {}, "f:image_pull_policy": {}, "f:no_log": {}, "f:postgres_image": {}, "f:postgres_image_version": {}, "f:set_self_labels": {}}}, "manager": "OpenAPI-Generator", "operation": "Update", "time": "2024-07-25T08:31:00Z"}, {"apiVersion": "awx.ansible.com/v1beta1", "fieldsType": "FieldsV1", "fieldsV1": {"f:status": {"f:backupClaim": {}, "f:backupDirectory": {}}}, "manager": "OpenAPI-Generator", "operation": "Update", "subresource": "status", "time": "2024-07-25T08:31:42Z"}, {"apiVersion": "awx.ansible.com/v1beta1", "fieldsType": "FieldsV1", "fieldsV1": {"f:status": {".": {}, "f:conditions": {}}}, "manager": "ansible-operator", "operation": "Update", "subresource": "status", "time": "2024-07-25T08:31:46Z"}], "name": "awxbackup-2024-07-25-08-30-57", "namespace": "awx-prod", "resourceVersion": "44044680", "uid": 
"c98de577-a319-43c3-b734-43d6bf6d1f8f"}, "spec": {"backup_pvc": "backupawx", "clean_backup_on_delete": true, "deployment_name": "awx", "image_pull_policy": "IfNotPresent", "no_log": false, "postgres_image": "postgres", "postgres_image_version": "14", "set_self_labels": true}, "status": {"backupClaim": "backupawx", "backupDirectory": "/backups/tower-openshift-backup-2024-07-25-083121", "conditions": [{"lastTransitionTime": "2024-07-25T08:31:42Z", "reason": "", "status": "False", "type": "Failure"}, {"lastTransitionTime": "2024-07-25T08:30:58Z", "reason": "Successful", "status": "True", "type": "Running"}, {"lastTransitionTime": "2024-07-25T08:31:46Z", "reason": "Successful", "status": "True", "type": "Successful"}]}}}

backup creation: Successful

bigtree21cn commented 2 months ago

Facing the same issue. @fritz0011, could you share how you fixed this?

fritz0011 commented 2 months ago

@bigtree21cn

awx-operator 2.19.1

I'm using this approach: back up AWX from within AWX as a job template (https://github.com/kurokobo/awx-on-k3s). Important: https://github.com/kurokobo/awx-on-k3s/tree/main/containergroup#create-container-group

AWX is deployed to this namespace: awx-prod. Job template extra vars:

awxbackup_namespace: awx-prod
awxbackup_keep_days: 10
awxbackup_spec:
  deployment_name: awx
  clean_backup_on_delete: true
  backup_pvc: backupawx
  postgres_image: postgres
  postgres_image_version: '14'
  no_log: false

salanisor commented 1 month ago

> Hello @salanisor, is this the same issue described in #1518?

My bad for the late reply. I can't rule out that it is the same issue. However, even after trying the workaround provided by @fritz0011 and the solution in #1518, I still get the same error on OpenShift 4.13.17 with AWX version 2.19.1.