ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖

Ansible-awx backup/restore documentation #489

Open VasseurLaurent opened 3 years ago

VasseurLaurent commented 3 years ago
ISSUE TYPE
SUMMARY

Hello,

I am currently testing ansible-awx and it is really great. However, I am struggling a bit to understand how the ansible-awx backup/restore process works with the operator.

I looked around on the internet, but apart from the awx-operator specification file itself, I didn't find good documentation on how it works. Let's imagine this scenario:

I have a k3s single-node cluster named A and I run ansible-awx on it. I would like to be able to back up this instance and store the backup on the machine (I will then upload it to a remote system). Then I deploy a k3s single-node cluster named B and I want to restore my instance A onto it.

The database of instance A is not managed by ansible-awx, and instance B will have access to the same database.

Did I miss any documentation about this? Is my scenario possible?

Thanks a lot in advance for your answers, which will help me understand this better.

ENVIRONMENT
ADDITIONAL INFORMATION

On my own, I have successfully created a backup with the AWXBackup Kubernetes object; however, I haven't managed to restore it with the AWXRestore object. This is my configuration:

PV and PVC to store the backup:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-awx-backup
  labels:
    type: local

spec:
  storageClassName: local
  capacity:
    storage: 3Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/ansible-awx/backup"

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backupawx
  namespace: default

spec:
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Backup:

apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: backup-awx
  namespace: default
spec:
  deployment_name: awx
  backup_pvc: backupawx

Restore:

apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: restore-awx
  namespace: default
spec:
  deployment_name: awx
  backup_pvc: backupawx
  backup_pvc_namespace: default
  backup_name: "backup-awx"

PS: If you can tell me precisely which logs I need to check in the awx-operator to follow the backup/restore workflow, that would be great.
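(For anyone else looking: the backup/restore tasks run inside the operator pod, so tailing its logs is usually enough. A minimal sketch; the deployment and container names below are the defaults of recent operator versions and may differ on yours, so check kubectl get deployments first.)

# follow the operator's Ansible task output during a backup/restore
kubectl logs -f deployment/awx-operator-controller-manager -c awx-manager -n default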

tomtreffke commented 3 years ago

Hi @VasseurLaurent, since I'm also currently checking out the operator: I noticed there is a significant difference between the docs here in the repo and on Ansible.com.

Please note, I haven't successfully restored an instance myself yet, but I'm also working on the problem ;) Did you solve it yet? Best regards.

VasseurLaurent commented 3 years ago

Hello @tomtreffke, I still haven't succeeded either. I hope someone will write clearer documentation about this so we can give it a try.

tomtreffke commented 3 years ago

I noticed that the Restore CRD defines some more fields where it's not clear to me whether they are mandatory. I treated them as if they were and came up with the following restore definition:

apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: restore-awx
  namespace: default
spec:
  deployment_name: awx-dcp
  backup_pvc: backupawx
  backup_pvc_namespace: default
  backup_name: "21-08-26"
  backup_source: "PVC"
  backup_directory: "/backups/tower-openshift-backup-2021-08-26-13:30:00"

Our backup names iterate, since we create the AWXBackup with a Job; hence the name is a date (yy-MM-dd). Nevertheless, still no success. -_-

nathan-march commented 3 years ago

Just in case anyone stumbles across this thread and actually needs to do a restore: I was able to load in a backup manually by restoring the tower.db into the postgres container and then manually inserting all the Kubernetes secrets from secrets.yml. Restarting the containers got it stuck waiting for a migration (I also had to do a version upgrade at the same time), at which point hopping into the awx-web container and running "awx-manage migrate" fixed things up.
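(That last migration step as a command sketch; the deployment/container names are assumptions based on a default install and are not from the original comment:)

kubectl exec -it deploy/awx -c awx-web -- awx-manage migrate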

tomtreffke commented 3 years ago

It seems the AWX restore does not work as an in-place restore of the existing instance.

As stated in the Restore Role: Note that the deployment_name above is the name of the AWX deployment you intend to create and restore to.

Note: 'deployment' here is not a Deployment resource in K8s; rather, it refers to another AWX instance:

(screenshots omitted)

The new deployments will be created according to what you specified in the Operator; in my case, simply ClusterIP services without a LoadBalancer or Ingress. That means I have to take care of exposing the new instance on my own as a post-restore task... and you also need to get rid of the old instance ;-)
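(For a quick post-restore sanity check before setting up proper exposure, a sketch assuming the operator's default <deployment_name>-service naming and default service port; verify both with kubectl get svc:)

kubectl port-forward svc/awx-dcp-service 8080:80 -n default
# then browse to http://localhost:8080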

Commifreak commented 3 years ago

I'll continue with my findings:

I had AWX 19.3.0 (Operator 0.13.0). Since this k8s cluster had some issues (some "patch" warnings), I wanted to delete everything and start from scratch.

I copied the backup-role PVC template and put it in a separate file (I just removed the labels):

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: awx-backup-claim
  namespace: default
  ownerReferences: null
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: '5Gi'

and applied it. Now minikube's docker storage has a new volume.

Then I created the backup file:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup
  namespace: default
spec:
  deployment_name: awx
  backup_pvc: 'awx-backup-claim'

and applied it. That created the whole backup (in my own PVC; note the backup_pvc spec!) with one folder containing three files. (Copy those to a safe location!)

Then I deleted everything with minikube delete and docker system prune -a. Then I started off with a new minikube instance and cloned the repo, then git checkout 0.14.0. (Note that the Makefile has to be patched if using minikube!)

make deploy...

Then I reapplied the PVC file, restored the previously saved folder with its three files, and created the restore file:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: restore1
  namespace: default
spec:
  deployment_name: awx
  backup_pvc_namespace: 'default'
  backup_dir: '/backups/tower-openshift-backup-2021-10-13-04:55:33'
  backup_pvc: 'awx-backup-claim'

and applied it. That recreated my postgres pod and AWX pods, installed the newest AWX, and restored my data.

That's all!

One side note: on the first start I got some connection errors, but I don't know where they come from:

2021-10-13 06:02:10,685 INFO     [-] awx.main.wsbroadcast Active instance with hostname awx-67868d4bd4-qbd95 is registered.
2021-10-13 06:02:10,691 WARNING  [-] awx.main.wsbroadcast Adding {'awx-7948bf7847-rgrdf'} to websocket broadcast list
2021-10-13 06:02:10,694 DEBUG    [-] awx.main.wsbroadcast Connection from awx-67868d4bd4-qbd95 to 172.17.0.6 attempt number 0.
2021-10-13 06:02:10,700 WARNING  [-] awx.main.wsbroadcast Connection from awx-67868d4bd4-qbd95 to 172.17.0.6 failed: 'Cannot connect to host 172.17.0.6:8052 ssl:False [Connect call failed ('172.17.0.6', 8052)]'.
2021-10-13 06:02:10,701 DEBUG    [-] awx.main.wsbroadcast Connection from awx-67868d4bd4-qbd95 to 172.17.0.6 attempt number 1.
2021-10-13 06:02:11,702 INFO success: awx-rsyslogd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-10-13 06:02:11,702 INFO success: awx-rsyslogd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[... the same connection-failure WARNING and retry DEBUG pair repeats through attempt number 11 ...]
RESULT 2
OKREADY
2021-10-13 06:03:05,746 WARNING  [-] awx.main.wsbroadcast Connection from awx-67868d4bd4-qbd95 to 172.17.0.6 failed: 'Cannot connect to host 172.17.0.6:8052 ssl:False [Connect call failed ('172.17.0.6', 8052)]'.
2021-10-13 06:03:05,747 DEBUG    [-] awx.main.wsbroadcast Connection from awx-67868d4bd4-qbd95 to 172.17.0.6 attempt number 12.
2021-10-13 06:03:10,746 WARNING  [-] awx.main.wsbroadcast Removing {'awx-7948bf7847-rgrdf'} from websocket broadcast list
2021-10-13 06:03:10,748 WARNING  [-] awx.main.wsbroadcast Connection from awx-67868d4bd4-qbd95 to 172.17.0.6 cancelled
RESULT 2
OKREADY
RESULT 2
OKREADY

But everything is working.

cmatsis commented 2 years ago

+1, subscribing to this issue. What are the backup and restore objects for? Is it possible to do a backup/restore from the tool itself? I have not found any documentation; I only found these objects by investigating the CRDs.

azhinu commented 2 years ago

I tried to migrate to another cluster with the backup/restore roles and found the following (a command sketch follows this list):

  1. AWX Operator backups are just a pg backup in custom format, the AWX deployment spec (awx_object), and secrets (secrets.yml). So if we move to another cluster, we don't actually need the specs and secrets, except awx-secret-key, which is used to decrypt credentials. So we can just manually run pg_restore -U awx tower.db, restore the old awx-secret-key secret, and then run the migration task.
  2. When I tried to use the restore role, it wasn't able to restore the data successfully, probably due to errors from pg_restore. I only succeeded when I completely dropped the awx database and restored the backup manually. So, my pipeline was:
    1. dropdb awx
    2. pg_restore -eC -U awx -d postgres tower.db. Using the postgres db because the awx db was deleted earlier.
    3. kubectl delete pod ${awx deployment pod}
    4. Grab a coffee and wait. This is how I moved from AWX 19.4 to AWX 20 on another k8s cluster.
  3. We can't just take a backup from another instance and restore it. We need to use the awx_object and secrets.yml from the current instance at least. But you'll probably get an error while restoring the database; if so, drop the awx db and do a clean restore.
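A rough sketch of that pipeline driven from outside the cluster; the pod name (awx-postgres-0) and namespace are assumptions, adjust to your install:

kubectl cp tower.db default/awx-postgres-0:/tmp/tower.db
kubectl exec -it awx-postgres-0 -- dropdb -U awx awx
kubectl exec -it awx-postgres-0 -- pg_restore -eC -U awx -d postgres /tmp/tower.db
# delete the AWX pod so the deployment recreates it and runs the startup migration
kubectl delete pod ${awx deployment pod}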
p4rancesc0 commented 2 years ago

@azhinu, your solution looks great. Do you mind trying to explain with some more details some concepts:

- How did you manage awx_object and secrets.yml exactly?
- How do you run the migration task?
- I imagine you copied tower.db into the postgres pod and then executed the db drop and the pg_restore... What do you mean by "Using postgres db because awx db was early deleted"?
- Why do you delete the awx deployment pod?

Sorry, but some points are quite obscure to me. I've been going crazy for a couple of days with tower-cli (deprecated), awxkit, and the operator backup and restore, and I still can't find a way to migrate AWX from the old cluster running 19.3 to a new cluster running 21.0.

Help would be really appreciated.

@devel, some hints like "how to get the log files" of the backup and restore roles would really help.

BR

azhinu commented 2 years ago

@p4rancesc0

How did you manage awx_object and secrets.yml exactly?

I used a hostPath volume and managed these files like any other files on the host machine. To let the backup object discover the backup files, I just ran the backup task; after that, I can either replace a created backup or add a new directory to the backup PV. With the second approach you need to add the backup_dir parameter to the restore task.

How do you run the migration task?

I just killed the AWX pod; the migration task always runs on startup.

What do you mean by "Using postgres db because awx db was early deleted"?

I mean that we should use the db named postgres, which is PostgreSQL's system database. It is used only for the initial connection; the database named awx is then created.

Why do you delete the awx deployment pod?

To let the AWX deployment recreate the AWX pod and run the migration task. We could probably scale the deployment to 0 and then back to 1, but killing the pod is the easier way.
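(Either option as commands; a sketch assuming the AWX deployment is simply named awx:)

kubectl scale deployment awx --replicas=0 && kubectl scale deployment awx --replicas=1
# or, equivalently, restart the rollout (k8s >= 1.15)
kubectl rollout restart deployment awx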

Sorry, but some points are quite obscure to me.

It's okay. Sometimes it's hard for me to explain some points, too.

still can't find a way to migrate AWX from the old cluster running 19.3 to a new cluster running 21.0.

You may face issues when moving to another version. If so, try running 19.3 on the new cluster, migrating, and then upgrading.

p4rancesc0 commented 2 years ago

Hi thanks for the explanation.

I managed to restore the db using AWXBackup, but then I messed up everything with the secrets [I tried to migrate all of them...].

If I understand correctly, you stated that the ONLY secret to copy to the new cluster installation is awx-secret-key.

I have the dump in secrets.yml. What is the best way to use it in the new cluster instance? kubectl edit secret newawx-secret-key -n awx and then edit the secret_key field?

Maybe you have a better way to deal with awx-secret-key.

BR Francesco

azhinu commented 2 years ago

@p4rancesc0 Secrets will be migrated like the rest of the AWX data, but to decrypt them AWX needs awx-secret-key. It should be restored together with the other backup parts; check the secret_key entry in the k8s secret. If it's wrong, just edit the secret with kubectl. The secret_key can also be set via the AWX Operator, but I had no success with that. If I need to change the AWX secret key, I edit the secret_key secret; it's the easiest way to do it.
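(In kubectl terms, a minimal sketch for checking and fixing the key; the awx-secret-key name follows the operator's <deployment_name>-secret-key default, and the value is a placeholder:)

# show the current key
kubectl get secret awx-secret-key -n awx -o jsonpath='{.data.secret_key}' | base64 -d
# overwrite it with the value saved in secrets.yml
kubectl patch secret awx-secret-key -n awx --type merge -p '{"stringData":{"secret_key":"<value from secrets.yml>"}}'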

HGS9761 commented 2 years ago

> (quoting @azhinu's migration steps from the comment above)

To my surprise (sorry), this worked. I made a backup of a "production" AWX and restored it on a test VM.

When I tried to drop the AWX database, an error was reported:

postgres=# drop database awx;
ERROR: database "awx" is being accessed by other users
DETAIL: There are 8 other sessions using the database.

I deleted the pod that runs the redis, awx-web, awx-ee, and awx-task containers. This resolved it and I could do the pg_restore.

The next step was editing the awx-secret-key. After this I deleted the awx-operator-controller-manager pod, the 4-container pod, and the postgres pod shortly one after another.

My next attempt will be using an unmanaged postgres database.

Details: operator version 0.21.0, k3s version v1.24.3+k3s1 (990ba0e8)
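(For the unmanaged-database attempt, a minimal sketch of pointing the operator at an external Postgres. The field names follow the operator's documented external-database support, but verify them against your operator version; the host and credentials are placeholders:)

apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: external-db.example.com
  port: "5432"
  database: awx
  username: awx
  password: changeme
  sslmode: prefer
  type: unmanaged
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  postgres_configuration_secret: awx-postgres-configuration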

HGS9761 commented 2 years ago

What I did not know: the data inside the database is "tightly" coupled to the Kubernetes environment. The Instance Group definition contains a reference to the namespace. My "source" instance had all containers in the default namespace and I wanted to move it to the awx namespace. See the attached screenshots. Looking in the AWX log files in /var/log/containers, I noticed errors indicating that some activity was attempted in the "default" namespace.

{ "kind": "Status", "apiVersion": "v1", "metadata": {}, "status": "Failure", "message": "pods is forbidden: User \"system:serviceaccount:awx:awx\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"", "reason": "Forbidden", "details": { "kind": "pods" }, "code": 403 }

So I ended up editing the Instance Group; the error disappeared after that.

(screenshots: instance groups and instance group details)

testcluedev commented 2 years ago

Fix Broken AWX

Here is my story: I had to spend 4 days and 4 nights repairing the AWX setup, and it took the combined knowledge of Kubernetes, persistent volumes, Azure disks, Postgres, and the information provided by kind people in this thread to fix the issue. I have less than a year of professional experience with k8s, but this problem was 10% k8s, 30% AWX, and 60% Postgres.

My Existing Setup (which I tried to upgrade, messed up and restored):

Problem:

I tried to upgrade AWX to the latest version, 21.7.0, which requires upgrading the awx-operator from my existing 0.20.1 to version 0.30.0. I followed this document --> https://github.com/ansible/awx-operator#upgrading. I was following the upgrade steps and also implemented what was mentioned for version 0.14 (deleting the deployment, service account, role, rolebinding, etc.). I don't think that was the issue, but when I restored using AWXBackup, postgresql started to throw weird errors, such as the id column can't be null in table main_instance; if I fixed that, another error popped up with table main_organization, and so on. Basically, the postgres DB was messed up. Ultimately I had to restore the DB manually using the tower.db from the backup.

The fix is mentioned further below in this post, but first I want to tell how I got into the horrible mess of the awx-prod instance not working for 4 days.

Steps I took for Upgrade:

  1. Create a backup (I thank backups for letting me keep my job). Before starting any kind of upgrade or major change, always take a backup! (Sounds cliché, but worth repeating like a mantra.) This will create a PVC (and PV) containing 3 files: awx_object, secrets.yml and tower.db. These files will be in the /backups/tower-xx-yy-datetime folder and you can use kubectl cp commands to copy them to your local storage. More on this further below.

filename: backup-awx.yaml

---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup-2022-10-19
  namespace: awx
spec:
  deployment_name: awx-prod

  2. Delete all (almost all) existing AWX resources

for example:

kubectl get deployments -n awx   # list the deployments in the awx namespace
awx-prod
awx-controller-manager-xx
# then delete those deployments, start with awx-controller
kubectl delete deployment awx-controller-manager-xx
kubectl delete deployment awx-prod
# get list of statefulset (postgres is deployed as statefulset)
kubectl get statefulsets
awx-prod-postgres
kubectl delete statefulsets/awx-prod-postgres

I also deleted the existing PVC from Azure where postgres was stored, but I made a VHD backup before that, so that if needed I could attach it as a disk to another VM and extract the DB. This was an additional step; I think you can reuse the same PVC. I was desperate enough to try everything.

  1. Create & Run 'Kustomize script' ( provide version of awx-operator ) more info here --> https://github.com/ansible/awx-operator#basic-install (although it is mentioned in Basic Install section of awx-operator guide, but wait at least for 5 mins after applying this kustomize script. AWX frontend and postgres pods usually are created automatically, if 5 mins are past and you don't see any postgres pod, then you can create a awx-prod.yaml script (plz find it in Basic Install page) and uncomment "- awx-prod.yaml line")

filename kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Find the latest tag here: https://github.com/ansible/awx-operator/releases
  - github.com/ansible/awx-operator/config/default?ref=0.19.0
  # - awx-prod.yaml  

# Set the image tags to match the git version from above
images:
  - name: quay.io/ansible/awx-operator
    newTag: 0.19.0

# Specify a custom namespace in which to install AWX
namespace: awx

kustomize build . | kubectl apply -f -

  4. Create a restore based on the backup taken in step 1 (do this 5 mins after running the kustomize script)

filename: restore-awx.yaml

---
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: restore
  namespace: awx
spec:
  deployment_name: awx-prod
  backup_name: awxbackup-2022-mm-dd

This will launch a restore of the prod DB; over the next 5-10 mins there will be multiple creations, terminations, and restarts of pods in awx. Give it 10 mins to stabilize. This is where my problem started: after this step I was expecting to see a restored version of my AWX, but postgres was showing errors related to duplicate entries, null values, etc. So please check whether your AWX is already restored at this point; if yes, skip the further steps. If you were as unlucky as me, continue to the life-saving material below.

  5. Download the PostgreSQL DB, secrets.yml, and awx_object from the restore pod. These are THE 3 things you need to restore a complete AWX within the same cluster, or to migrate to another.

List the contents inside the restore pod: kubectl exec -it restore-db-management -- /bin/bash -c "ls -l". Then copy the 3 files from the restore pod to your local disk:

kubectl cp awx/restore-db-management:/backups/tower-openshift-backup-xx-yy/secrets.yml secrets.yml
# replace xx-yy above and below with whatever value you have in restore pod under /backups
kubectl cp awx/restore-db-management:/backups/tower-openshift-backup-xx-yy/awx_object awx_object
kubectl cp awx/restore-db-management:/backups/tower-openshift-backup-xx-yy/tower.db tower.db
# DB download failed 2 times, but worked 3rd time, it was around 208MB in size.

After copying all these files, DELETE the restore resource or else it might interrupt the process: kubectl delete awxrestore restore

  6. Scale down the replicas to 0
kubectl edit deployment awx-operator-controller-manager

On Windows, the above command opens the deployment definition in Notepad; find replicas, set the value to 0, then save and close it.
At this point you should only have the postgres pod running, with everything else terminated. Don't worry about deleting any other stuff such as configmaps, secrets, etc.; leave them as they are.

kubectl edit deployment awx-prod

These names could be different for you; check kubectl get deployments. (A quicker alternative is sketched below.)
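(The quicker alternative: a sketch that avoids hand-editing the manifests, assuming the same deployment names as above:)

kubectl scale deployment awx-operator-controller-manager -n awx --replicas=0
kubectl scale deployment awx-prod -n awx --replicas=0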

  7. Copy the DB into the postgres pod and restore it using pg_restore

kubectl cp .\tower.db awx/awx-prod-postgres-0:/tmp
The above command copies the local tower.db into the /tmp dir of the postgres pod. Then enter the postgres pod (follow the commands below):

kubectl exec -it awx-prod-postgres-0 -- /bin/bash
psql --user awx
\connect postgres
DROP DATABASE awx;  -- if this throws an error that 7 or 8 connections are using the DB, run the commands below
SELECT * FROM pg_stat_activity WHERE datname = 'awx';
-- the above query lists the 7 or 8 PIDs using the DB; kill the first one and the others will follow, or kill them one by one
SELECT pg_terminate_backend(put_pid_here) FROM pg_stat_activity WHERE datname = 'awx';
-- the statement below stops new connections to the awx DB; remember to revert it after the fix, otherwise nothing will work
ALTER DATABASE awx ALLOW_CONNECTIONS false;
-- now try DROP DATABASE awx; again, it should work this time
-- quit psql with \q, then run pg_restore using tower.db (mine was copied to /tmp)
root@awx-prod-postgres-0:/tmp# pg_restore -eC -U awx -d postgres tower.db

The above restoration will take time: 10, maybe 20 mins, depending on the size of your DB, so go for a walk, a coffee, a burger, or watch an episode of Star Trek Voyager (whatever best kills the time). In fact, I don't know how long it takes, as I fell asleep while waiting... but when I woke up in the morning the restoration was done.

  8. Scale the deployments back up, switch DB connections back on

Go to step 6 and put the replica value back to 1 (or whatever it was earlier). Then, inside the postgres pod (you know how already): ALTER DATABASE awx ALLOW_CONNECTIONS true;

  9. And... you did it!

I tried to include as many details as possible; of course every person has a different environment, skill level, and thinking process, but I still hope this write-up helps someone.

Manojjasawat commented 1 year ago

How can I move user settings and template settings from one AWX to another? My old AWX runs on Docker and the new AWX is deployed on a base machine.

HGS9761 commented 1 year ago

Do you mean all the data inside AWX like user accounts etc?

vivekshete9 commented 5 months ago

I am facing the same issue. I tried creating a backup by spinning up a pod with AWXBackup, but it keeps crashing and terminating. So in my case it's not even getting past the first step. Any pointers on this would help. TIA

HGS9761 commented 5 months ago

I assume that you want to back up the data. It resides in the PostgreSQL pod. You can connect to the pod and use the standard PostgreSQL utilities.

IMHO it is best to have the PostgreSQL database outside AWX anyway. But perhaps I am just being a control freak 😁
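(For example, a minimal sketch with the standard utilities; the pod name awx-postgres-0 and namespace awx are assumptions, check kubectl get pods for yours:)

# dump the awx database in pg_dump's custom format, then copy it out of the pod
kubectl exec -it awx-postgres-0 -n awx -- pg_dump -U awx -F custom -f /tmp/tower.db awx
kubectl cp awx/awx-postgres-0:/tmp/tower.db ./tower.db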

vivekshete9 commented 4 months ago

It's still the same issue for me, even on the latest versions of the operator (v2.18.0) and AWX (v24.5.0). Someone really needs to fix the backup & restore documentation and the way it works!