ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0

AWX migration to external postgres fails #1407

Open jonathon2nd opened 1 year ago

jonathon2nd commented 1 year ago


Bug Summary

When attempting to migrate from AWX on a VM to AWX in k8s with an externally managed postgres cluster, it fails.

AWX Operator version

awx-operator:2.1.0

AWX version

latest

Kubernetes platform

kubernetes

Kubernetes/Platform version

v1.24.8 +rke2r1

Modifications

no

Steps to reproduce

Deploy

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  secret_key_secret: awx-secret-key
  old_postgres_configuration_secret: awx-old-postgres-configuration
  postgres_configuration_secret: awx-postgres-configuration
  projects_persistence: true
  projects_storage_class: linstor-replica-three
  projects_storage_size: 10Gi
  projects_storage_access_mode: ReadWriteOnce
  admin_password_secret: awx-admin-password
  admin_user: admin
  ingress_annotations: |
    "kubernetes.io/ingress.class": "traefik"
    "traefik.ingress.kubernetes.io/router.entrypoints": "websecure"
    "traefik.ingress.kubernetes.io/router.tls": "true"
  ingress_path: /
  ingress_path_type: Prefix
  ingress_tls_secret: example-com-cf
  hostname: awx.example.com
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: acid-awx-pooler.awx-postgres
  port: "5432"
  database: awx
  username: awx
  password: password
  type: unmanaged
  sslmode: require
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
stringData:
  host: "ip"
  port: "5432"
  database: awx
  username: awx
  password: password
type: Opaque

Expected results

I expect it to use the already deployed postgres cluster

Actual results

Fails to work. See log file

Which leads me to: https://github.com/ansible/awx-operator/blob/devel/roles/installer/tasks/migrate_data.yml#L31

It seems like when you migrate, it is expecting to manage postgres?

Additional information

https://github.com/ansible/awx-operator/issues/1240

Creating new issue as requested

Operator Logs

awx-operator-controller-manager-58b5b7698b-4c2x6_awx-manager.log

Tatsu-Kishi commented 1 year ago

I'm encountering the same issue. Trying to migrate from an old awx 17 VM/psql to an Operator-controlled deployment with external psql (because we don't run databases on k8s...).

awx-operator: 2.3.0
old psql: postgres:12
new psql: 14.8 (Ubuntu package)

AWX:

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  service_type: ClusterIP
  ingress_type: ingress
  ingress_class_name: nginx
  ingress_annotations: |
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "awx-kube-sticky"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
  hostname: awx1.[example.com]
  secret_key_secret: awx-secret-key
  admin_user: admin
  admin_email: administrator@[example.com]
  admin_password_secret: awx-admin-password
  postgres_configuration_secret: awx-postgres-configuration
  ipv6_disabled: true
  image_pull_policy: Always

awx-old-postgres-configuration:

apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
stringData:
  host: "awx.[example.com]"
  port: "5432"
  database: "db"
  username: "user"
  password: "pw"
type: Opaque

awx-postgres-configuration:

apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: postgresql.[example.com]
  port: "5432"
  database: db2
  username: user2
  password: pw2
  sslmode: prefer
  type: unmanaged
type: Opaque

operator log: op-log.txt

As a workaround, would AWX/the operator care if I do the migration to a managed DB via a copy of my secret under a different name with type set to managed, then move that DB to my intended DB server, and finally change the secret back to my current one, hopefully triggering a deployment update?
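
To visualize that, a rough sketch of the intermediate secret I have in mind (the name and the awx-postgres-13 host below are assumptions for the sketch; whether flipping postgres_configuration_secret back afterwards actually triggers a clean redeploy is exactly what I'm asking):

apiVersion: v1
kind: Secret
metadata:
  # temporary copy of awx-postgres-configuration, name made up for this sketch
  name: awx-intermediate-postgres-configuration
  namespace: awx
stringData:
  host: awx-postgres-13   # assumed service name of the operator-managed DB, check your deployment
  port: "5432"
  database: db2
  username: user2
  password: pw2
  type: managed           # let the built-in migration target a managed DB first
type: Opaque

The idea would be to point postgres_configuration_secret at this copy until the migration finishes, dump/move that managed DB to the intended server, then switch the spec back to the real external secret.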

Zokormazo commented 1 year ago

@rcarrillocruz I think you got this report from other sources; are you tracking this one?

jonathon2nd commented 1 year ago

I have manually set up a new AWX. After finally getting AWX up and running with external postgres, I attempted another migration and it failed again, so I just manually set up templates, credentials, and inventory + project.

rchaud commented 11 months ago

@jonathon2nd I have an AWX at 9.3.0 which I managed to migrate to AWX Operator, now running version 23.3.0. I did have to jump through different versions, but I found it easier to have a Bitnami PostgreSQL deployed and run the database as external for the new AWX; this way I could do a SQL import from a dump of the old database.

So, try this:

This method has worked for me multiple times. I even tried using Bitnami PostgreSQL HA, but pgpool doesn't play nicely with Django's cursor.py.

rooftopcellist commented 11 months ago

This looks similar to what was happening here in the pulp/galaxy operator:

@Zokormazo this was the issue you were thinking of that @rcarrillocruz was working on.

@jonathon2nd The tricky thing here though is that the migration logic makes the assumption that you are migrating to a managed db in a k8s pod:

I think that to add support for using this migration logic for external postgres instances, we would need to do something like:

There are some tricky issues to work through though:

These are just some thoughts and recollections from previous experimenting, not to be taken as fact. But if you want to take a crack at implementing this, a PR would certainly be welcome.

At the moment, if you migrate clusters and use an external postgres, it is on the user to migrate the data from the old postgres to the new one.
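
For anyone doing that manual migration, a minimal sketch of a one-off Job that streams pg_dump from the old database into the new external one (the Job name, image tag, and the --clean/--if-exists flags are assumptions to adapt, not something the operator provides; the secret names match the ones from this issue):

apiVersion: batch/v1
kind: Job
metadata:
  name: awx-manual-db-migration
  namespace: awx
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          # use a postgres image matching (or newer than) your server versions
          image: postgres:13
          command: ["/bin/sh", "-c"]
          args:
            - >-
              PGPASSWORD="$OLD_PASSWORD"
              pg_dump -Fc -h "$OLD_HOST" -p "$OLD_PORT" -U "$OLD_USER" "$OLD_DB" |
              PGPASSWORD="$NEW_PASSWORD"
              pg_restore --clean --if-exists -h "$NEW_HOST" -p "$NEW_PORT" -U "$NEW_USER" -d "$NEW_DB"
          env:
            - name: OLD_HOST
              valueFrom: {secretKeyRef: {name: awx-old-postgres-configuration, key: host}}
            - name: OLD_PORT
              valueFrom: {secretKeyRef: {name: awx-old-postgres-configuration, key: port}}
            - name: OLD_DB
              valueFrom: {secretKeyRef: {name: awx-old-postgres-configuration, key: database}}
            - name: OLD_USER
              valueFrom: {secretKeyRef: {name: awx-old-postgres-configuration, key: username}}
            - name: OLD_PASSWORD
              valueFrom: {secretKeyRef: {name: awx-old-postgres-configuration, key: password}}
            - name: NEW_HOST
              valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: host}}
            - name: NEW_PORT
              valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: port}}
            - name: NEW_DB
              valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: database}}
            - name: NEW_USER
              valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: username}}
            - name: NEW_PASSWORD
              valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: password}}

You probably want to scale the AWX deployments down (or pause reconciliation) while this runs so nothing writes to either database.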

SwiperNo commented 7 months ago

Is this issue still open?

discostur commented 5 months ago

Just tried to install AWX via the operator with an external postgres. It seems like AWX does not create any tables with externally managed postgres. The AWX operator creates the migration jobs, but in my freshly bootstrapped awx database I have no tables, and awx / the operator doesn't create any tables!

You should either create the tables you need if they don't exist, or expand your documentation with a DB bootstrap file ...

Regards, Kilian

bartowl commented 4 months ago

I faced the exact same problem yesterday when just trying to migrate from the built-in managed postgresql database to an external unmanaged database. This is because the code is currently written in such a way that the target for pg_restore is always a managed database inside a k8s container, where both pg_dump and pg_restore are called.

There are currently 3 issues that would need to be fixed in order for this to work:

  1. always specify the -h parameter (host) for both pg_dump and pg_restore, and do not rely on implicit trust auth against the target in pg_restore (simple; see the sketch after this list)
  2. maybe improve handling of credentials, as the current method relies on environment variables in the postgresql pod (especially the old PG password). This can be solved either by re-creating the postgresql statefulset (the only way it picks up a new env variable after specifying old_postgres_configuration_secret), or by using a rendered .pgpass file inside the container for authentication and removing it afterwards
  3. for the moment, either the old or the current postgresql database HAS TO BE a managed database, as all the code runs against a k8s container. I do not see an easy way to handle scenarios when migrating from one unmanaged database to another unmanaged database.
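
To make point 1 concrete, roughly what an explicit-host invocation could look like (purely illustrative, not the operator's actual migrate_data.yml; the variable and pod names here are hypothetical):

# Illustrative sketch only - not the operator's real task; old_pg_*/new_pg_*/postgres_pod_name are hypothetical.
- name: Stream pg_dump from the old database into the new one with explicit hosts
  kubernetes.core.k8s_exec:
    namespace: "{{ ansible_operator_meta.namespace }}"
    pod: "{{ postgres_pod_name }}"
    command: >-
      bash -c "PGPASSWORD='{{ old_pg_password }}' pg_dump -Fc -h {{ old_pg_host }} -p {{ old_pg_port }}
      -U {{ old_pg_username }} {{ old_pg_database }} |
      PGPASSWORD='{{ new_pg_password }}' pg_restore --clean --if-exists -h {{ new_pg_host }}
      -p {{ new_pg_port }} -U {{ new_pg_username }} -d {{ new_pg_database }}"
  no_log: true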

In terms of points 1-2, I can help and even provide a PR that will fix it, should this be wanted.

Actually, when I think about this once again, even point 3 would be doable with an additional management pod based on a postgresql image having both pg_dump and pg_restore. It might even solve problems 1-2 as well, since it could have all needed env variables (it needs to be created anyway) and it would have to connect to both databases, since neither of them would be running inside of it.

stevenahmet commented 4 months ago

Just so I'm clear, are you saying that there is no way to spin up a new AWX instance that uses a new external postgresql install (DB/user/pass created)? I'd have to create a new AWX instance with a managed database, then do a DB migration to the external DB?

rchaud commented 4 months ago

It seems that you guys are really overcomplicating things.

Look, do this: create a job or deployment with psql in the same namespace as AWX, with a persistent volume. Open a shell in the container and dump the awx database. Remove the AWX deployment. Deploy postgres from Bitnami or your favorite chart. Don't forget to create a secret that AWX will use for the external connection. Connect to the container used for the dump, connect to the new Bitnami DB, and restore the database. Create AWX with the new external DB. Everything should be restored as normal.

I have all this process automated to dump my DB hourly using a k8s CronJob and upload it to S3, and for restores I can pull from S3 and restore the DB using a Job in k8s.
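
A minimal sketch of what such a backup CronJob could look like, assuming an image that ships both pg_dump and an S3 client (the image name, bucket, and AWS credential handling below are placeholders, not my exact setup):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: awx-db-backup
  namespace: awx
spec:
  schedule: "0 * * * *"   # hourly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              # placeholder image: anything bundling pg_dump and the aws CLI
              # (AWS credentials via env vars or IRSA are omitted here)
              image: example/pg-backup:latest
              command: ["/bin/sh", "-c"]
              args:
                - >-
                  pg_dump -Fc -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" "$PGDATABASE" |
                  aws s3 cp - "s3://example-bucket/awx/awx-$(date +%Y%m%d%H%M).dump"
              env:
                - name: PGHOST
                  valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: host}}
                - name: PGPORT
                  valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: port}}
                - name: PGDATABASE
                  valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: database}}
                - name: PGUSER
                  valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: username}}
                - name: PGPASSWORD
                  valueFrom: {secretKeyRef: {name: awx-postgres-configuration, key: password}}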

I have only restored 1 or 2 times when the pvc became corrupted at the storage level.

Please reach out if you need some guidance.

BartOpitz commented 4 months ago

Just so I'm clear, are you saying that there is no way to spin up a new AWX instance that uses a new external postgresql install (DB/user/pass created)? I'd have to create a new AWX instance with a managed database, then do a DB migration to the external DB?

Actually, exactly the opposite. It is not possible to migrate to an external pg database with the built-in workflow in the AWX operator. I think it should be just fine to set up a completely new instance with external postgresql, but I did not look into that part of the code. For sure, however, having a locally awx-managed postgresql database, it is not possible to use the built-in migration workflow to migrate to an external database.

And what @chinochao stated is correct - this is what I actually did. I exec'd into the running managed postgresql pod, manually started pg_dump ... | pg_restore ... to migrate to the external database, and then updated the resource definition to point only to the external database. This ticket is, however, about the migration to an external database with the built-in workflow failing.

As I already said - it should be possible without much effort to migrate between internal and external databases in either direction; just one -h parameter to pg_restore would fix it. In order to be able to migrate between 2 external databases, though, one would need an extra job container and a bit more code changes...