Open D1StrX opened 1 year ago
I'm encountering the same error on an EKS cluster.
AWX Operator Version: 2.5.2, AWX Version: 23.0.0, Kubernetes Version: 1.26
{
"msg": "The task includes an option with an undefined variable. The error was: 'ansible_operator_meta' is undefined. 'ansible_operator_meta' is undefined\n\nThe error appears to be in '/runner/project/playbooks/roles/backup/tasks/creation.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Patching labels to {{ kind }} kind\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n",
"_ansible_no_log": false
}
@AlanCoding This points to a bigger issue. The latest change for the partition table appears to have broken partition table creation, I believe.
Yeah, it could be the bug that https://github.com/ansible/awx/pull/14572 is trying to fix.
The introduction of the bug https://github.com/ansible/awx/commit/f5922f76fa852fde2336fcd69c6db630ff8e72b7 made it into the last release.
I have the same issue.
AWX Operator Version: 2.8.0, AWX Version: 23.5.0, Kubernetes v1.27.7
```
fatal: [localhost]: FAILED!
msg: The task includes an option with an undefined variable. The error was: list object has no element 0. list object has no element 0.
The error appears to be in '/opt/ansible/roles/backup/tasks/postgres.yml': line 3, column 3, but may be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Get PostgreSQL configuration
  ^ here
```
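For context, "list object has no element 0" usually means a Kubernetes lookup returned an empty list before the playbook indexed it with `[0]`. A hypothetical defensive pattern (the task and secret names here are illustrative, not the actual role code) would fail with a clear message instead:

```yaml
- name: Get PostgreSQL configuration secret (illustrative example)
  kubernetes.core.k8s_info:
    kind: Secret
    namespace: awx
    name: awx-postgres-configuration
  register: pg_config

- name: Fail clearly instead of raising 'list object has no element 0'
  ansible.builtin.fail:
    msg: "PostgreSQL configuration secret not found in the namespace"
  when: pg_config.resources | length == 0
```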
I've just run into this issue myself, and I'm not clear why the issues referenced in @AlanCoding's post are relevant to it. The playbook/role is complaining about an undefined variable, which seems to me to be independent of the underlying problems with Postgres partitions. As far as I can see, the ansible_operator_meta variable should be supplied by the operator_sdk.util collection, but this doesn't appear to be the case (at least on my EKS cluster).
Error I'm seeing:
{
"msg": "The task includes an option with an undefined variable. The error was: 'ansible_operator_meta' is undefined. 'ansible_operator_meta' is undefined\n\nThe error appears to be in '/runner/requirements_roles/srg_awx_backup/tasks/creation.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Patching labels to {{ kind }} kind\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n",
"_ansible_no_log": false
}
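As a purely illustrative sketch (not the actual role code), a guard task could fail fast with a clearer message when the operator SDK has not injected the variable the error complains about:

```yaml
- name: Fail early if the operator metadata was not injected
  ansible.builtin.fail:
    msg: >-
      ansible_operator_meta is undefined; this role appears to be running
      outside the operator SDK context that normally supplies it.
  when: ansible_operator_meta is not defined
```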
I am happy to share that the backup with AWXBackup works for me. I'm not sure whether a patch has been released or something else changed, but the backup part works. I haven't tried a restore yet.
> I've just run into this issue myself, and I'm not clear why the issues referenced in @AlanCoding's post are relevant to it? [...] the ansible_operator_meta variable should be supplied from the operator_sdk.util collection - but this doesn't appear to be the case (at least on my EKS cluster).
Perhaps a good time to try again? @godeater
And I have to mention that backups don't work anymore, just like https://github.com/ansible/awx-operator/issues/879#issuecomment-2166487493. I see the same behavior as @vivekshete9 describes: endless re-spinning up. I use the most basic example as described in the docs:
```yaml
---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: <name>
  namespace: <namespace>
spec:
  deployment_name: <deploymentname>
  backup_storage_class: "<storageclass>"
  backup_storage_requirements: "1Gi"
  backup_pvc_namespace: "<namespace>"
  image_pull_policy: "IfNotPresent"
  clean_backup_on_delete: false
  no_log: true
```
With a container connected to the backup PVC, after a backup is "created", I see a lot of tower-openshift-backup-xxxxx folders, each with a tower.db inside. No clue whether this is valid data. Even with no_log: false I see zero output, not even in the awxbackups object:
```
kubectl describe awxbackups.awx.ansible.com
...
Status:
  Conditions:
    Last Transition Time:  2024-06-19T20:22:43Z
    Reason:                Failed
    Status:                True
    Type:                  Failure
    Last Transition Time:  2024-06-19T20:22:43Z
    Reason:
    Status:                False
    Type:                  Successful
    Last Transition Time:  2024-06-19T20:23:35Z
    Reason:                Running
    Status:                True
    Type:                  Running
Events:  <none>
```
> And I have to mention that backups don't work anymore... just like #879 (comment) I see the same behavior as @vivekshete9 describes, endless re-spinning up. I use the most basic example as described in the docs.
Have you had a look at https://github.com/ansible/awx-operator/issues/1908 ?
It seems the backup object itself doesn't log anything - but I found that as soon as you apply the manifest for the backup to your cluster, the logs from the awx-operator do show what's going on - and in my case at least (as above), it's because pg_dump is segfaulting.
I see; I'm also seeing the same errors in my operator as described in #1908.
I fixed the pg_dump seg faults by changing the postgres image used to the official one, rather than the sclorg one (see #1908 again). Does that help your case?
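For anyone looking for the concrete fields: the image override described here corresponds to the AWXBackup spec fragment below (field names taken from the CronJob example later in this thread; verify them against your operator version):

```yaml
spec:
  _postgres_image: docker.io/postgres
  _postgres_image_version: 15-alpine
```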
Using the custom image indeed results in a successful backup, but I see this as a workaround rather than a fix. Also, the documentation lacks AWXBackup and its possible options: https://ansible.readthedocs.io/projects/awx/en/latest/search.html?q=backup&check_keywords=yes&area=default
I agree it's a workaround, and that the docs could be better (I had to dig into the source to find that you could override the image with those options).
Unfortunately it doesn't seem like anyone from the project is paying attention to this issue (and fair enough, it's open source, not paid work, they can choose to do what they like), so we are where we are. ¯\_(ツ)_/¯
Ran into this today with Operator 2.16.1. Same "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'backupClaim'. 'dict object' has no attribute 'backupClaim'"
So it seems like backup and restore is just fully broken for AWX? Has anyone found a good fallback plan for resilience? Make a backup of your DB and hope that you can piece it all together in the event of your data being lost?
EDIT: I'm seeing "workaround" mentioned in the comment above this one, and also in #1902 but this one also refers to #1908 and that one was closed as a dupe of #1895 . Could someone who understands the issue a little better post a concise summary of the workaround steps in one place?
I have come to the conclusion that the AWXBackup CRD and backend code is not written to leverage Kubernetes-native capabilities, at least from an admin perspective. IMHO AWXBackup should create a Kubernetes CronJob that can be scheduled to your preference. Instead, you currently have to create the resource object each time, and creating this object from AWX/Tower requires Kubernetes credentials, which is neither easy to set up with AWX/Tower nor the best approach.
It also lacks the option to write the backup directly to another storage backend, like S3 (e.g. AWS or MinIO). Currently I have created these resources to automate the backup process, all from within a K8s cluster. If you want to use this, adjust and verify these values according to your preferences:
All resources:
- Cronjob "create awxbackup": set the Postgres image (see https://github.com/ansible/awx-operator/issues/1908 / https://github.com/ansible/awx-operator/issues/1895, @bryan-srg).
- Cronjob "S3 upload": set the backup path (/backupdata) and the S3 endpoint (https://subdomain.domain.tld:api_port). NOTE: This uploads all directories in the backup path and keeps the most recent one until the next run, at which point the new most recent directory will be kept.
- Cronjob "cleanup awxbackup": set date_limit ('7 days ago'), i.e. how long you want to keep the AWXBackup object. Nothing to do with the backup directory.
- Check backups: a pod mounting the backup PVC at /backupdata.
- Secret: the S3 credentials.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: awx-backup-role
  namespace: awx # Replace with your namespace
rules:
  - apiGroups:
      - awx.ansible.com
    resources:
      - awxbackups
    verbs:
      - get
      - create
      - list
      - watch
      - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: awx-backup-rolebinding
  namespace: awx # Replace with your namespace
subjects:
  - kind: ServiceAccount
    name: awx-backup-sa
    namespace: awx # Replace with your namespace
roleRef:
  kind: Role
  name: awx-backup-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: awx-backup-sa
  namespace: awx # Replace with your namespace
```
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: create-awxbackup
  namespace: awx # Replace with your namespace
spec:
  schedule: "45 2 * * *" # Runs daily at 2:45 AM (UTC)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: awx-backup-sa
          containers:
            - name: create-awx-backup
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  cat <<EOF | kubectl apply -f -
                  apiVersion: awx.ansible.com/v1beta1
                  kind: AWXBackup
                  metadata:
                    name: awxbackup-$(date +'%Y-%m-%d-%H-%M-%S')
                    namespace: awx
                  spec:
                    deployment_name: awx
                    backup_storage_class: "<storageclass>"
                    _postgres_image: docker.io/postgres
                    _postgres_image_version: 15-alpine
                    backup_storage_requirements: "1Gi"
                    backup_pvc_namespace: "awx"
                    image_pull_policy: "IfNotPresent"
                    clean_backup_on_delete: false # Leave false; only deletes the PVC when the AWXBackup resource is deleted
                    no_log: true
                  EOF
          restartPolicy: OnFailure
```
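The heredoc above uses command substitution to give each AWXBackup a unique, time-sortable name; the naming pattern itself can be sanity-checked in plain shell:

```shell
# Each CronJob run produces a distinct, lexicographically sortable name,
# so repeated backups never collide on metadata.name.
name="awxbackup-$(date +'%Y-%m-%d-%H-%M-%S')"
echo "$name"
```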
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: s3-upload-awxbackup
  namespace: awx # Replace with your namespace
spec:
  schedule: "0 3 * * *" # Runs daily at 3:00 AM (UTC)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup-container
              image: amazon/aws-cli
              envFrom:
                - secretRef:
                    name: s3-credentials-awx-backup
              command:
                - /bin/bash
                - -c
                - |
                  aws s3 cp /backupdata s3://<bucket>/ --recursive --endpoint-url https://subdomain.domain.tld:api_port
                  # Define the directory
                  DIR="/backupdata"
                  # Find the latest directory and store its name
                  LATEST_DIR=$(ls -td ${DIR}/*/ | head -n 1)
                  # Remove trailing slash from directory name
                  LATEST_DIR=${LATEST_DIR%/}
                  echo "Latest dir: $LATEST_DIR"
                  # Delete all directories except the latest one
                  find ${DIR} -maxdepth 1 -type d ! -path "${LATEST_DIR}" ! -path "${DIR}" -exec rm -rf {} +
              volumeMounts:
                - name: data-volume
                  mountPath: /backupdata
          restartPolicy: OnFailure
          volumes:
            - name: data-volume
              persistentVolumeClaim:
                claimName: awx-backup-claim
```
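The retention step in the upload job can be exercised safely against a temp directory to confirm it keeps only the newest backup directory (the directory names and dates below are made up for the demo):

```shell
# Sketch of the retention logic: create three dated dirs with distinct mtimes,
# then delete everything except the most recently modified one.
DIR=$(mktemp -d)
for d in 2024-06-17 2024-06-18 2024-06-19; do
  mkdir "$DIR/tower-openshift-backup-$d"
done
# Give each directory a distinct mtime so 'ls -td' has a stable order
touch -t 202406170000 "$DIR/tower-openshift-backup-2024-06-17"
touch -t 202406180000 "$DIR/tower-openshift-backup-2024-06-18"
touch -t 202406190000 "$DIR/tower-openshift-backup-2024-06-19"
LATEST_DIR=$(ls -td "$DIR"/*/ | head -n 1)
LATEST_DIR=${LATEST_DIR%/}
find "$DIR" -maxdepth 1 -type d ! -path "$LATEST_DIR" ! -path "$DIR" -exec rm -rf {} +
ls "$DIR"  # only the newest directory remains
```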
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-awxbackup
  namespace: awx # Replace with your namespace
spec:
  schedule: "30 3 * * *" # Runs daily at 3:30 AM (UTC)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: awx-backup-sa
          containers:
            - name: cleanup-backups
              image: bitnami/kubectl:latest # A lightweight image with kubectl installed
              command:
                - /bin/bash
                - -c
                - |
                  namespace="awx" # Replace with your namespace
                  date_limit=$(date -d '7 days ago' --utc +'%Y-%m-%dT%H:%M:%SZ')
                  echo "Date limit: $date_limit"
                  # List all AWXBackup resources and their creation timestamps
                  backups=$(kubectl get awxbackups -n "$namespace" -o jsonpath='{.items[*].metadata.name}')
                  echo "All backups: $backups"
                  # Loop through backups and delete those older than the date limit
                  for backup in $backups; do
                    # Get the creation timestamp of the backup
                    creation_time=$(kubectl get awxbackup "$backup" -n "$namespace" -o jsonpath='{.metadata.creationTimestamp}')
                    # Compare the creation timestamp with the date limit
                    if [[ "$creation_time" < "$date_limit" ]]; then
                      kubectl delete awxbackup "$backup" -n "$namespace"
                      echo "Deleted backup: $backup"
                    fi
                  done
          restartPolicy: OnFailure
```
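The plain string comparison in the cleanup script is safe because Kubernetes creationTimestamp values are ISO-8601 UTC strings, which sort lexicographically in chronological order; a quick check:

```shell
# ISO-8601 UTC timestamps compare correctly as strings, so the cleanup loop
# can use bash's lexicographic '<' instead of parsing dates.
creation_time="2024-06-12T09:00:00Z"
date_limit="2024-06-19T00:00:00Z"
if [[ "$creation_time" < "$date_limit" ]]; then
  echo "older than limit: would be deleted"
fi
```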
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials-awx-backup
  namespace: awx # Replace with your namespace
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base64 access_key>
  AWS_SECRET_ACCESS_KEY: <base64 secret_access_key>
  AWS_DEFAULT_REGION: <base64 region>
  AWS_ENDPOINT_URL: <base64 https://subdomain.domain.tld:api_port>
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-pod
  namespace: awx # Replace with your namespace
spec:
  volumes:
    - name: data-volume
      persistentVolumeClaim:
        claimName: awx-backup-claim
  containers:
    - name: busybox-container
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data-volume
          mountPath: /backupdata
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
        limits:
          cpu: 100m
          memory: 100Mi
```
```shell
kubectl exec -it busybox-pod -n awx -- sh
cd /backupdata
ls
```
> I have come to the conclusion that the AWXBackup CRD and backend code is not written to leverage Kubernetes native capabilities.
Wow, thanks for putting all that together. I was coming to a similar conclusion myself: I've ended up writing a bunch of bash/curl commands to export most of what I need via the API, and I documented some of the other details in an internal company wiki. I agree with your key point that a proper AWX backup needs to be easy to move to a different location outside the cluster. That, plus struggles to get the backup and restore roles working fully, was one of my key reasons for putting in the work to script the export/import.
Another option is to describe every AWX component in Ansible playbooks. I've done that too, so everything is documented and stateful: a native Ansible approach compared to bash scripts. https://docs.ansible.com/ansible/latest/collections/awx/awx/index.html
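A minimal sketch of that approach, assuming the awx.awx collection is installed and controller credentials are supplied via environment variables (e.g. CONTROLLER_HOST and CONTROLLER_OAUTH_TOKEN); check the export module's supported options against your collection version:

```yaml
---
- name: Export AWX configuration as code
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Export all AWX objects via the API
      awx.awx.export:
        all: true
      register: awx_export

    - name: Write the export to disk for versioning or off-cluster storage
      ansible.builtin.copy:
        content: "{{ awx_export | to_nice_json }}"
        dest: ./awx-export.json
```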
Please confirm the following
Bug Summary
On both k8s and k3s, with embedded and external PostgreSQL DBs, the AWX backup fails with the exact same error:
AWX Operator version
2.4.0 - 2.6.0
AWX version
AWX 22.5.0 - 23.2.0
Kubernetes platform
kubernetes
Kubernetes/Platform version
1.27.4 k8s/k3s
Modifications
no
Steps to reproduce
Expected results
Successful backup
Actual results
The error was: list object has no element 0.
When performing only the two k8sclusterinfo tasks locally in Ansible (one setting the fact this__awx and the other pg_config), it works fine.
When running this playbook inside the AWX operator container, it says ansible_operator_meta is undefined.
Additional information
Operator Logs