geerlingguy / tower-operator

DEPRECATED: This project was moved and renamed to: https://github.com/ansible/awx-operator
82 stars 34 forks source link

How to restore Postgres database into new installation #35

Closed nicolaibaralmueller closed 4 years ago

nicolaibaralmueller commented 4 years ago

Hi again Jeff,

Sorry to spam with issues at the moment. Solving obstacles day by day.

I've now managed to get awx running and with much faster response times then our local docker environment setup.

However, how do we migrate the existing database (postgres10) to the kubernetes environement.

The namespace is configured with persistentVolumeClaim and can't find the mount folder on the host. I was hoping to just copy the postgres data folder content like we do right now when upgrading awx versions.

Any easy way to accomplish this scenario?

Many thanks!

nicolaibaralmueller commented 4 years ago

Update:

Found the hostpath in dashboard: /var/snap/microk8s/common/default-storage/awx-postgres-awx-postgres-0-pvc-9236f794-8799-49f3-aa04-c293a1603654

I will try to replace the files with our postgres data and update this thread.

nicolaibaralmueller commented 4 years ago

Update:

Postgres password needs to be set with tower_postgres_pass variable. Trying tomorrow.

nicolaibaralmueller commented 4 years ago

Update:

  1. Set tower_postgres_pass
  2. Install awx
  3. microk8s.stop
  4. cp -rf /backup/data/* /var//var/snap/microk8s/common/default-storage/awx-postgres-awx-postgres-0-pvc-ff37cf7f-8a3c-4d17-8098-b04a5d101849/data
  5. Set pg_hba.conf and postmaster.pid
  6. microk8s.start

Unfortunately the restore did not work.

Web log:

Tue Mar 17 12:59:57 2020 - *** HARAKIRI ON WORKER 2 (pid: 129, try: 1) ***
Tue Mar 17 12:59:57 2020 - HARAKIRI !!! worker 2 status !!!
Tue Mar 17 12:59:57 2020 - HARAKIRI [core 0] 10.1.90.1 - POST /api/login/ since 1584449876
Tue Mar 17 12:59:57 2020 - HARAKIRI !!! end of worker 2 status !!!
DAMN ! worker 2 (pid: 129) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 2 (new pid: 214)

Postgres log:

PostgreSQL Database directory appears to contain a database; Skipping initialization

2020-03-17 12:55:50.369 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-03-17 12:55:50.369 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2020-03-17 12:55:50.806 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-03-17 12:55:51.968 UTC [27] LOG:  database system was interrupted; last known up at 2020-03-15 23:59:55 UTC
2020-03-17 12:55:53.545 UTC [27] LOG:  database system was not properly shut down; automatic recovery in progress
2020-03-17 12:55:53.743 UTC [27] LOG:  redo starts at E/61F7B980
2020-03-17 12:55:53.754 UTC [27] LOG:  invalid record length at E/620181C8: wanted 24, got 0
2020-03-17 12:55:53.754 UTC [27] LOG:  redo done at E/62018190
2020-03-17 12:55:53.754 UTC [27] LOG:  last completed transaction was at log time 2020-03-16 00:00:51.940464+00
2020-03-17 12:55:56.007 UTC [1] LOG:  database system is ready to accept connections
2020-03-17 12:55:59.767 UTC [34] LOG:  incomplete startup packet
2020-03-17 12:55:59.773 UTC [35] LOG:  incomplete startup packet
2020-03-17 12:58:42.191 UTC [48] LOG:  unexpected EOF on client connection with an open transaction
2020-03-17 12:59:57.262 UTC [63] LOG:  unexpected EOF on client connection with an open transaction

That makes this setup unusable for us. We need to be able to restore. I hope this will become supported in the future.

geerlingguy commented 4 years ago

Note that copying the data dir directory can (and often does, ime) result in weird behavior with databases, especially if you're moving between minor versions of Postgres.

Instead, I recommend doing a dump-and-restore (e.g. https://www.linode.com/docs/databases/postgresql/how-to-back-up-your-postgresql-database/#one-time-sql-dump).

Can you see if that works better?

Restoring databases (or even backing up using the operator itself) is not yet 'officially' supported in this operator, however—I haven't tried it myself.

nicolaibaralmueller commented 4 years ago

Note that copying the data dir directory can (and often does, ime) result in weird behavior with databases, especially if you're moving between minor versions of Postgres.

Instead, I recommend doing a dump-and-restore (e.g. https://www.linode.com/docs/databases/postgresql/how-to-back-up-your-postgresql-database/#one-time-sql-dump).

Can you see if that works better?

Restoring databases (or even backing up using the operator itself) is not yet 'officially' supported in this operator, however—I haven't tried it myself.

Thanks @geerlingguy . I tried many things but none was succesfull. Tried with different pg_dump commands. Tried with pg_restore and psql.

kubectl exec -i --namespace=awx awx-postgres-0 -- pg_restore -U awx --clean -Fc /backup/awx_db.dump

Restore fails with all methods, but the above command was able to show the content but not usable.

Also it is not possible to stop individual pods so can't dump the db before restoring.

AWX upgrade with your operator is however awesome. Takes 5 minutes to upgrade from 9.1.0 to 9.3.0.

I am a bit worried that if anything goes wrong it would be difficult to restore. Our database is almost 1GB worth of data.

Do you have any plans in regards to additional features for the operator?

geerlingguy commented 4 years ago

@nicolaibaralmueller - Right now this operator is more of a 'useful experiment', especially as a demo of the Operator SDK + Ansible, but I hope that it can grow into more of a more supported way to run Tower (and by extension AWX) in K8s/OCP.

The issue queue sums up most of the other features that I'm pursuing in the short term for this operator, though I'm trying to see if and when I can get more help maybe from Ansible/Red Hat in implementing the changes.

stale[bot] commented 4 years ago

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!

Please read this blog post to see the reasons why I mark issues as stale.

geerlingguy commented 4 years ago

Now that the operator is being maintained in the Ansible namespace, I am not going to maintain this project anymore (as I consider it a historic artifact which was used to as a base for the ansible-namespace version, and I'm redirecting all issues and PRs to the Ansible-maintained version: https://github.com/ansible/awx-operator