The workaround for this is to take a backup of the awx-1 deployment and its secrets, delete the deployment, and then restore from the backup. However, it would be nice to not have to do this.
Another option would be to migrate the backup PVC to another namespace and restore there, but that is a hassle.
Bug Summary
If a Restore is done in the same namespace as the original deployment, the same postgres-configuration secret is used for both. The problem is that the restore modifies the postgres-configuration secret to specify the new host. For example, for deployments awx-1 and awx-2, the resolvable hosts based on the Service resources created will be awx-1-postgres-13 and awx-2-postgres-13, respectively. Currently, when a restore is done, the secret is modified so that the original host: awx-1-postgres-13 value is replaced with host: awx-2-postgres-13. This results in the awx-operator's reconciliation loop failing repeatedly when reconciling the initial awx-1 deployment, specifically when the postgres-configuration secret's host value is used. An error to this effect can be found in the operator logs.
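For illustration, this is roughly the state of the secret after the restore has run; the name, namespace, and non-host values here are assumptions based on the operator's default naming, and only the host key matters for the bug:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: awx-1-postgres-configuration  # still the original deployment's secret
  namespace: <namespace>
stringData:
  host: awx-2-postgres-13             # overwritten by the restore; awx-1 expects awx-1-postgres-13
  port: "5432"
  database: awx
  username: awx
  password: <password>
  type: managed
```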
The current workaround is to delete the original deployment and rely on the backup. This is sufficient for most use cases; however, this is still a bug that should be fixed, in my opinion.
AWX Operator version
2.5.1
AWX version
22.7.0
Kubernetes platform
OpenShift
Kubernetes/Platform version
v4.11
Modifications
no
Steps to reproduce
Backup and Restore testing
I created a backup, which ran successfully:
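The exact CR from my run is not reproduced here, but a minimal AWXBackup along these lines is enough to reproduce the issue (the resource names are illustrative):

```yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup-1        # hypothetical name
  namespace: <namespace>
spec:
  deployment_name: awx-1   # the original deployment
```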
Then, after the reconciliation loop for the backup finished, I created a restore in the same namespace:
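Again illustrative rather than the literal CR used; the key point is that it restores into the same namespace under a new deployment name:

```yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: awxrestore-1        # hypothetical name
  namespace: <namespace>    # same namespace as awx-1
spec:
  backup_name: awxbackup-1  # the AWXBackup above
  deployment_name: awx-2    # name for the restored deployment
```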
The reconciliation loop then shows errors when reconciling the initial awx-1 deployment.
Expected results
The awx-1 and awx-2 deployments should be able to coexist in the same namespace, and the restore role should not modify the original deployment's postgres-configuration secret.
Actual results
The operator's reconciliation loop for the original awx-1 deployment shows errors.
Additional information
To fix this, we may be able to generate a new postgres secret name by appending a unique hash when a secret with the same name already exists in the namespace.
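A rough sketch of that idea in the style of the operator's Ansible roles; the task layout, the variable names (deployment_name, postgres_secret_name), and the hashing scheme are all assumptions, not the actual role code:

```yaml
- name: Check whether the postgres-configuration secret already exists
  kubernetes.core.k8s_info:
    api_version: v1
    kind: Secret
    namespace: "{{ ansible_operator_meta.namespace }}"
    name: "{{ deployment_name }}-postgres-configuration"
  register: existing_secret

- name: Append a short hash to the restored secret's name on collision
  ansible.builtin.set_fact:
    postgres_secret_name: "{{ deployment_name }}-postgres-configuration-{{ (deployment_name | hash('sha1'))[:7] }}"
  when: existing_secret.resources | length > 0
```

Deriving the suffix from the deployment name rather than a random value would keep the name stable across reconciliation runs.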
Operator Logs
No response