Closed: bryan-srg closed this issue 4 months ago.
I wanted to see what would happen if I ran the same container image that the backup role uses from a local Docker instance, connecting to the same database. Here's what I found:
# docker container run -e PGPASSWORD 2189 pg_dump --clean --create -h 10.110.51.152 -U awx -d awx -p 3306 -F custom > tower_1.db
pg_dump: error: connection to server at "10.110.51.152", port 3306 failed: FATAL: no PostgreSQL user name specified in startup packet
connection to server at "10.110.51.152", port 3306 failed: FATAL: no PostgreSQL user name specified in startup packet
double free or corruption (out)
So I think there might be something wrong with the sclorg/postgresql image.
Trying the same thing with the official postgres image succeeds: pg_dump runs fine and starts dumping the database contents.
I've updated my manifest to pull the official postgres image instead of the sclorg one:
---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awx-backup-2024-06-26
  namespace: awx
spec:
  deployment_name: awx
  no_log: false
  _postgres_image: docker.io/postgres
  _postgres_image_version: 15-alpine
This got the AWX backup a bit further: the 'tower.db' and 'awx_object' files are now written to the PV. Unfortunately, I'm now hitting yet another "The task includes an option with an undefined variable" error, this time in dump_secret.yml.
Relevant bit of log follows:
TASK [backup : Dump secret names from awx spec and data into file] *************
task path: /opt/ansible/roles/backup/tasks/secrets.yml:11
included: /opt/ansible/roles/backup/tasks/dump_secret.yml for localhost => (item=route_tls_secret)
included: /opt/ansible/roles/backup/tasks/dump_secret.yml for localhost => (item=ingress_tls_secret)
included: /opt/ansible/roles/backup/tasks/dump_secret.yml for localhost => (item=ldap_cacert_secret)
included: /opt/ansible/roles/backup/tasks/dump_secret.yml for localhost => (item=bundle_cacert_secret)
included: /opt/ansible/roles/backup/tasks/dump_secret.yml for localhost => (item=ee_pull_credentials_secret)
TASK [backup : Get Secret Name] ************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:3
ok: [localhost] => {"ansible_facts": {"_name": ""}, "changed": false}
TASK [backup : Get secret] *****************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:9
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Set secret key] *************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:18
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Create and Add secret names and data to dictionary] *************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:24
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Get Secret Name] ************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:3
ok: [localhost] => {"ansible_facts": {"_name": ""}, "changed": false}
TASK [backup : Get secret] *****************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:9
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Set secret key] *************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:18
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Create and Add secret names and data to dictionary] *************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:24
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Get Secret Name] ************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:3
ok: [localhost] => {"ansible_facts": {"_name": ""}, "changed": false}
TASK [backup : Get secret] *****************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:9
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Set secret key] *************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:18
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Create and Add secret names and data to dictionary] *************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:24
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Get Secret Name] ************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:3
ok: [localhost] => {"ansible_facts": {"_name": ""}, "changed": false}
TASK [backup : Get secret] *****************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:9
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Set secret key] *************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:18
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Create and Add secret names and data to dictionary] *************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:24
skipping: [localhost] => {"changed": false, "false_condition": "_name != ''", "skip_reason": "Conditional result was False"}
TASK [backup : Get Secret Name] ************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:3
ok: [localhost] => {"ansible_facts": {"_name": "ee-pull-credentials"}, "changed": false}
TASK [backup : Get secret] *****************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:9
ok: [localhost] => {"api_found": true, "changed": false, "resources": []}
TASK [backup : Set secret key] *************************************************
task path: /opt/ansible/roles/backup/tasks/dump_secret.yml:18
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0. list object has no element 0

The error appears to be in '/opt/ansible/roles/backup/tasks/dump_secret.yml': line 18, column 9, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: Set secret key
^ here
"}
PLAY RECAP *********************************************************************
localhost : ok=68 changed=7 unreachable=0 failed=1 skipped=39 rescued=0 ignored=0
Okay, I finally figured it out. The spec file for the AWX Operator deployment I originally applied contained this line:
ee_pull_credentials_secret: ee-pull-credentials
But it seems that secret was never actually created. The 'Get secret' task therefore returned an empty resources list, and 'Set secret key' failed with "list object has no element 0" because there was nothing to read and dump during the backup. I've "fixed" this by creating a dummy credential and storing it in that secret in the AWX deployment's namespace, which lets the backup complete successfully. Most of AWX's functionality doesn't seem to be affected by the missing secret, though, so in my opinion the backup role could also be made more defensive: if it can't find a referenced secret, it should skip backing it up and continue with the rest of the backup.
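A placeholder Secret along the lines of this workaround might look like the sketch below. It is not the exact Secret used here: the key names (url, username, password) reflect my understanding of the format the AWX Operator expects for ee_pull_credentials_secret, and the registry host and credential values are hypothetical dummies. Only the name ee-pull-credentials and the awx namespace come from the deployment described above.

```yaml
# Hypothetical placeholder Secret so that the name referenced by
# ee_pull_credentials_secret resolves during the backup.
# Key names are assumed from the AWX Operator's pull-credentials format;
# the values are dummies and will not authenticate against any registry.
apiVersion: v1
kind: Secret
metadata:
  name: ee-pull-credentials   # must match ee_pull_credentials_secret in the AWX spec
  namespace: awx              # namespace of the AWX deployment
type: Opaque
stringData:
  url: registry.example.com   # hypothetical registry host
  username: dummy
  password: dummy
```

Since a dummy credential is still picked up wherever AWX uses this secret, removing ee_pull_credentials_secret from the AWX spec when it isn't needed may be the cleaner long-term option.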
Is that a valid opinion?
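To illustrate the proposal, a more defensive version of the secret-dump tasks could check whether the lookup returned anything before indexing into it. This is a hypothetical sketch, not the role's actual dump_secret.yml: the task names mirror the log above, while the registered variable names (_secret, _data, _type, secret_dict) and the namespace lookup via ansible_operator_meta are assumptions. _name is assumed to be set by the earlier 'Get Secret Name' task, as in the log.

```yaml
---
# Hypothetical sketch of defensive secret-dump tasks (not the real role code).
# Assumes _name was set earlier to the Secret name, or '' if unset in the spec.

- name: Get secret
  kubernetes.core.k8s_info:
    kind: Secret
    namespace: "{{ ansible_operator_meta.namespace }}"  # assumed operator-provided var
    name: "{{ _name }}"
  register: _secret
  when: _name != ''

# Log and move on when the spec names a Secret that does not exist
- name: Warn about missing secret
  ansible.builtin.debug:
    msg: "Secret '{{ _name }}' referenced by {{ item }} was not found; skipping it"
  when:
    - _name != ''
    - _secret.resources | default([]) | length == 0

# Only index resources[0] when the lookup actually returned a Secret
- name: Set secret key
  ansible.builtin.set_fact:
    _data: "{{ _secret.resources[0].data }}"
    _type: "{{ _secret.resources[0].type }}"
  when:
    - _name != ''
    - _secret.resources | default([]) | length > 0

- name: Create and Add secret names and data to dictionary
  ansible.builtin.set_fact:
    secret_dict: "{{ secret_dict | default({}) | combine({item: {'name': _name, 'data': _data, 'type': _type}}) }}"
  when:
    - _name != ''
    - _secret.resources | default([]) | length > 0
```

With this guard, a spec that names a non-existent secret produces a debug message in the operator log and the loop moves on to the next item, instead of failing with "list object has no element 0". The `default([])` also covers the case where 'Get secret' was skipped and the registered result has no resources key.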
Bug Summary
The AWX backup role fails to create a complete backup on the Persistent Volume.
AWX Operator version
2.17.0
AWX version
24.4.0
Kubernetes platform
other (please specify in additional information)
Kubernetes/Platform version
AWS EKS 1.28
Modifications
yes
Steps to reproduce
Apply the following manifest to our AWX deployment:
Then stream the logs of the awx-operator-controller-manager and watch for the failure.
Checking the contents of the PV (an AWS EBS volume), I find this at the root of the filesystem:
and each of the tower-openshift-backup-* directories contains a 0-byte tower.db file:
Expected results
The backup should complete successfully, leaving a non-empty tower.db on the Persistent Volume.
Actual results
pg_dump segfaults when trying to write the database dump to the Persistent Volume.
Additional information
Postgres is external, hosted on AWS RDS Aurora Postgres (v15 compatible).
Operator Logs