ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0
1.23k stars, 625 forks

How do I upgrade AWX from 15.0.0 to 19.x.x? #402

Open k8s42 opened 3 years ago

k8s42 commented 3 years ago

I installed AWX 15 using AWX Operator v0.6.0. There is no documentation that covers the AWX upgrade process from v15 to v19. I'm using a Postgres PaaS.

felipe4334 commented 3 years ago

I used a playbook to create the awx-old-postgres-configuration and awx-secret-key secrets in Kubernetes. Make sure every value under data is base64-encoded.

---
# Postgres Configuration
apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
data:
  host: aW5mZGV2b2NpbGF # server name, base64-encoded
  port: NTg== # 5432
  database: YXd4 # awx
  username: YXd4 # awx
  password: RG9u # password, base64-encoded
type: Opaque

---
# AWX Secret Key
apiVersion: v1
kind: Secret
metadata:
  name: awx-secret-key
  namespace: awx
data:
  secret_key: Tm90aGluZ0= # base64 encoded
type: Opaque

Then, in the AWX deployment manifest, I added the line old_postgres_configuration_secret: awx-old-postgres-configuration under spec:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  hostname: awx.local
  old_postgres_configuration_secret: awx-old-postgres-configuration

felipe4334 commented 3 years ago

The base64-encoded data above does not actually decode into anything; I used it only as an example. Also make sure port 5432 is mapped from the server to the container of the old AWX Postgres installation.
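As a minimal sketch using only the Python standard library, real values for the data fields can be produced with base64.b64encode, and decoding them back is a quick check before they go into the Secret. The host name and password below are hypothetical placeholders, not values from this thread.

```python
import base64


def b64(value: str) -> str:
    """Base64-encode a string as Kubernetes expects for Secret `data` fields."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def encode_secret_data(values: dict) -> dict:
    """Encode every value of a plain-text mapping for a Secret's `data` block."""
    return {key: b64(str(val)) for key, val in values.items()}


# Hypothetical connection details for the old AWX database.
old_db = {
    "host": "old-awx-db.example.com",
    "port": 5432,
    "database": "awx",
    "username": "awx",
    "password": "change-me",
}

encoded = encode_secret_data(old_db)
for key, val in encoded.items():
    print(f"  {key}: {val}")

# Round-trip check: decoding must give back the original text.
assert base64.b64decode(encoded["port"]).decode("utf-8") == "5432"
```

With these inputs, port 5432 encodes to NTQzMg== and awx to YXd4. Alternatively, a Secret can use stringData with plain-text values (as the awx-postgres-configuration example later in this thread does), and Kubernetes encodes them itself.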

tchellomello commented 3 years ago

@k8s42 could you please clarify what you mean by an upgrade? Are you referring to migrating the database from your PaaS solution to the one managed by awx-operator? If so, then yes, you would need to follow the steps at https://github.com/ansible/awx-operator/blob/devel/docs/migration.md

Now, if you want to keep the same external PostgreSQL PaaS solution and just update the AWX version, then you would need to create a secret pointing to your external database as documented at https://github.com/ansible/awx-operator#external-postgresql-service
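A minimal sketch of such a secret, generated from Python with only the standard library. The keys (host, port, database, username, password, sslmode, type: unmanaged) mirror the awx-postgres-configuration example later in this thread. The function name external_postgres_secret and the example values are hypothetical.

```python
import json


def external_postgres_secret(awx_name, namespace, host, port, database,
                             username, password, sslmode="prefer"):
    """Build a Secret manifest pointing AWX at an existing external PostgreSQL.

    stringData lets Kubernetes do the base64 encoding; type "unmanaged"
    marks this as an external database rather than one run by the operator.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            # <resource name>-postgres-configuration, as used in this thread.
            "name": f"{awx_name}-postgres-configuration",
            "namespace": namespace,
        },
        "stringData": {
            "host": host,
            "port": str(port),  # Secret values must be strings
            "database": database,
            "username": username,
            "password": password,
            "sslmode": sslmode,
            "type": "unmanaged",
        },
        "type": "Opaque",
    }


manifest = external_postgres_secret(
    awx_name="awx", namespace="awx", host="db.example.com", port=5432,
    database="awx", username="awx", password="change-me",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f (kubectl accepts JSON as well as YAML). The AWX spec then references it through postgres_configuration_secret, as in the awx.yaml example later in this thread.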

colixxx commented 3 years ago

@tchellomello Hi. I'm facing a similar problem. I am trying to upgrade my Docker + external Postgres installation from version 17.0.0 to the latest (19.2.2). I have created the secrets with the secret key and the names as specified here. I want to keep the configuration with the external Postgres but start using the operator. But after startup, awx-web logs:

2021-07-30 07:58:34,610 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-07-30 07:58:34,610 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-07-30 07:58:35,483 ERROR    [-] awx.main.wsbroadcast AWX is currently installing/upgrading.  Trying again in 5s...
2021-07-30 07:58:39,806 INFO     [8e187616f03244ed91a3f0c7413c8dd1] awx.analytics.performance request: <WSGIRequest: GET '/'>, response_time: 0.043s
[pid: 27|app: 0|req: 66/91] 10.131.6.1 () {50 vars in 2544 bytes} [Fri Jul 30 07:58:39 2021] GET / => generated 0 bytes in 45 msecs (HTTP/1.1 302) 5 headers in 220 bytes (1 switches on core 0)
10.131.6.1 - - [30/Jul/2021:07:58:39 +0000] "GET / HTTP/1.1" 302 5 "https://awx-awx.apps.sgm.sberdevices.ru/migrations_notran/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15" "10.3.0.44"
2021-07-30 07:58:39,989 INFO     [c6fd437cb95a4ba48554b3d0fc916b8a] awx.analytics.performance request: <WSGIRequest: GET '/migrations_notran/'>, response_time: 0.074s
[pid: 27|app: 0|req: 67/92] 10.131.6.1 () {50 vars in 2580 bytes} [Fri Jul 30 07:58:39 2021] GET /migrations_notran/ => generated 2016 bytes in 75 msecs (HTTP/1.1 200) 9 headers in 434 bytes (1 switches on core 0)
10.131.6.1 - - [30/Jul/2021:07:58:39 +0000] "GET /migrations_notran/ HTTP/1.1" 200 2016 "https://awx-awx.apps.sgm.sberdevices.ru/migrations_notran/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15" "10.3.0.44"
2021-07-30 07:58:40,896 INFO exited: wsbroadcast (exit status 0; expected)
2021-07-30 07:58:40,896 INFO exited: wsbroadcast (exit status 0; expected)
2021-07-30 07:58:41,899 INFO spawned: 'wsbroadcast' with pid 331
2021-07-30 07:58:41,899 INFO spawned: 'wsbroadcast' with pid 331
2021-07-30 07:58:42,901 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-07-30 07:58:42,901 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-07-30 07:58:43,787 ERROR    [-] awx.main.wsbroadcast AWX is currently installing/upgrading.  Trying again in 5s...
2021-07-30 07:58:49,230 INFO exited: wsbroadcast (exit status 0; expected)
2021-07-30 07:58:49,230 INFO exited: wsbroadcast (exit status 0; expected)
2021-07-30 07:58:50,195 INFO     [02e75e7d4fd345eda5af088109f410ef] awx.analytics.performance request: <WSGIRequest: GET '/'>, response_time: 0.043s
2021-07-30 07:58:50,198 INFO spawned: 'wsbroadcast' with pid 335
2021-07-30 07:58:50,198 INFO spawned: 'wsbroadcast' with pid 335

awx-task logs:

[wait-for-migrations] Waiting for database migrations...
[wait-for-migrations] Attempt 1 of 10
[wait-for-migrations] Waiting 0.5 seconds before next attempt
[wait-for-migrations] Attempt 2 of 10
[wait-for-migrations] Waiting 1 seconds before next attempt
[wait-for-migrations] Attempt 3 of 10
[wait-for-migrations] Waiting 2 seconds before next attempt
[wait-for-migrations] Attempt 4 of 10
[wait-for-migrations] Waiting 4 seconds before next attempt
[wait-for-migrations] Attempt 5 of 10
[wait-for-migrations] Waiting 8 seconds before next attempt
[wait-for-migrations] Attempt 6 of 10
[wait-for-migrations] Waiting 16 seconds before next attempt
[wait-for-migrations] Attempt 7 of 10
[wait-for-migrations] Waiting 30 seconds before next attempt
[wait-for-migrations] Attempt 8 of 10
[wait-for-migrations] Waiting 30 seconds before next attempt
[wait-for-migrations] Attempt 9 of 10
[wait-for-migrations] Waiting 30 seconds before next attempt
[wait-for-migrations] Attempt 10 of 10
[wait-for-migrations] ERROR: Database migrations not applied
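To see which migrations are still pending, one option is Django's standard showmigrations command (awx-manage wraps Django's manage.py), which marks applied migrations with [X] and unapplied ones with [ ]. Below is a minimal Python sketch that parses that output. The sample text is hypothetical and abbreviated; only the two unapplied migration names come from this thread.

```python
def pending_migrations(showmigrations_output: str) -> list:
    """Return 'app.migration' names marked [ ] (not applied) in
    Django showmigrations output."""
    pending = []
    app = None
    for line in showmigrations_output.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            # Unindented lines are app labels, e.g. "main".
            app = line.strip()
            continue
        entry = line.strip()
        if entry.startswith("[ ]"):
            pending.append(f"{app}.{entry[3:].strip()}")
    return pending


# Hypothetical, abbreviated output captured from: awx-manage showmigrations
sample = """\
auth
 [X] 0001_initial
main
 [X] 0001_initial
 [ ] 0137_custom_inventory_scripts_removal_data
 [ ] 0144_event_partitions
"""

print(pending_migrations(sample))
# ['main.0137_custom_inventory_scripts_removal_data', 'main.0144_event_partitions']
```

The first pending entry is the migration that will run next, which is usually the one to investigate when awx-manage migrate fails.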

If I try to run the migration myself from the awx-task terminal, I get this error:

sh-4.4$ awx-manage migrate
Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
2021-07-30 08:01:22,256 INFO     [-] awx.main.migrations deleted (133248, {'main.ActivityStream_unified_job': 0, 'main.UnifiedJob_dependent_jobs': 57626, 'main.UnifiedJob_notifications': 24, 'main.UnifiedJob_labels': 0, 'main.UnifiedJob_credentials': 2930, 'main.JobLaunchConfig_credentials': 0, 'main.ActivityStream_inventory_update': 24, 'main.InventoryUpdateEvent': 63854, 'main.InventoryUpdate': 2930, 'main.JobLaunchConfig': 2930, 'main.UnifiedJob': 2930})
2021-07-30 08:01:22,257 INFO     [-] awx.main.migrations Deleted 2930 custom inventory script sources.
  Applying main.0137_custom_inventory_scripts_removal_data...Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/__init__.py", line 164, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/core/management/base.py", line 83, in wrapped
    res = handle_func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/core/management/commands/migrate.py", line 232, in handle
    post_migrate_state = executor.migrate(
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/migrations/executor.py", line 117, in migrate
    state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
    state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/migrations/executor.py", line 245, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/migrations/migration.py", line 124, in apply
    operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/migrations/operations/special.py", line 190, in database_forwards
    self.code(from_state.apps, schema_editor)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/migrations/_inventory_source.py", line 105, in delete_custom_inv_source
    ct, deletions = InventorySource.objects.filter(source='custom').delete()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/query.py", line 710, in delete
    collector.collect(del_query)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/deletion.py", line 224, in collect
    field.remote_field.on_delete(self, field, sub_objs, self.using)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/deletion.py", line 15, in CASCADE
    collector.collect(sub_objs, source=field.remote_field.model,
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/django/db/models/deletion.py", line 224, in collect
    field.remote_field.on_delete(self, field, sub_objs, self.using)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/utils/polymorphic.py", line 18, in SET_NULL
    return models.SET_NULL(collector, field, sub_objs.non_polymorphic(), using)
AttributeError: 'QuerySet' object has no attribute 'non_polymorphic'

Last Operator log:

--------------------------- Ansible Task Status Event StdOut  -----------------

PLAY RECAP *********************************************************************
localhost                  : ok=54   changed=1    unreachable=0    failed=0    skipped=38   rescued=0    ignored=0   

-------------------------------------------------------------------------------

I also tried upgrading AWX step by step (17.0.0 -> 18.0.0 -> 19.0.0 -> 19.1.0 -> 19.2.0 -> 19.2.1 -> 19.2.2), but that didn't work either.

My cluster:

OpenShift version: 4.5.0-0.okd-2020-10-15-235428
Kubernetes version: v1.18.3

My configs:

awx.yaml:

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  image: quay.io/ansible/awx
  image_version: 19.1.0
  service_type: nodeport
  ingress_type: none
  hostname: awx.cluster.com
  deployment_type: awx
  external_database: true
  postgres_configuration_secret: awx-postgres-configuration

awx-postgres-configuration.yaml:

---
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: "172.16.0.28"
  port: "5432"
  database: "ansible"
  username: "ansible"
  password: ""
  sslmode: "prefer"
  type: "unmanaged"
type: Opaque

awx-secret-key.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: awx-secret-key
  namespace: awx
stringData:
  secret_key: <awx-secret>
type: Opaque

renatogalera commented 3 years ago

Same problem here.

Upgrading AWX 17.1.0 to 19.3.0.

So far I have only migrated the database.

[wait-for-migrations] Waiting 30 seconds before next attempt
20/08/2021 18:00:47 [wait-for-migrations] Attempt 29 of 30
20/08/2021 18:00:50 [wait-for-migrations] Waiting 30 seconds before next attempt
20/08/2021 18:01:20 [wait-for-migrations] Attempt 30 of 30
20/08/2021 18:01:23 [wait-for-migrations] ERROR: Database migrations not applied

eselvam commented 3 years ago

I am also facing the same issue. Once the operator is installed, it automatically starts creating all 4 containers in the awx namespace. It never picks up the awx.yaml that has my config.

Operator 0.13.0 and AWX: latest

spisakni commented 3 years ago

I also have this error.

spisakni commented 3 years ago

Is anyone looking into this? I think we're all dead in the water, and we've already followed all the guides.

yurtesen commented 3 years ago

A Red Hat engineer said "Who told you AWX should be used in production?", implying that AWX should not be used in production: https://github.com/ansible/awx/issues/10166#issuecomment-839680079. The FAQ also says it can't be upgraded. For more detail, see: https://github.com/ansible/awx/issues/9540#issuecomment-845483459

So I doubt you will get help from the developers with this problem. Upgrades are officially not supported (according to the FAQ).

mnhan3 commented 3 years ago

I'm trying to upgrade from 17.1 on Docker to 19.3 on OpenShift, and it's stuck on the DB migration. Running awx-manage migrate manually throws an error while applying 0144_event_partitions.

Edit: I was able to get the migration to work by going up manually, version by version, to 19.2.1, which doesn't include the 0144 migration. Next, I did a new 19.2.1 installation via the operator with an internal PostgreSQL database, ran pg_dump on that database, and loaded the dump into an external database. I set the external database to log all SQL for the awx database, proceeded with the upgrade, and captured all the SQL for main_0144_event_partitions, up to and including the INSERT that records the migration as applied. Then I manually applied that SQL to my own database and upgraded it to 19.2.2 without an issue. I assume it would also go through to 19.3 if I continued with the migration.

The problem I'm facing now is that the migrated 19.2.2 instance of AWX refuses to run any job that uses previously created inventories, and none of my git repos will sync in any of the projects. I keep getting an error about a missing execution environment. Newly created inventories let me run ad hoc commands against them using awx_ee, but existing inventories don't. Newly created projects, unlike newly created inventories, don't work either.

kylecurtis-od commented 2 years ago

From what I know of how 19.x behaves, jobs must run in an EE. So any jobs or projects that look for an instance group with "tower" inside it aren't going to find it. I'd be curious which instance group your old job templates use. Also try changing one of those job templates to your EE (assuming you allowed the operator to create an EE pod), and make sure to pick the default instance inside that EE as well; it should run. Do not pick "control plane", as it's not intended to execute jobs.

mnhan3 commented 2 years ago

EE works well on a fresh install of AWX; projects and everything else work out of the box. There's nothing special about the migrated projects or templates. Even creating a new template in an existing project doesn't work: I can set the EE to the default, or to any version, and it still fails. Creating a new project and syncing a git repository also fails on an upgraded instance. The only thing that works so far is creating a new inventory and running a simple ad hoc ping; I can't run an ad hoc ping against an existing inventory. So it's very baffling. We're going to try exporting all the workflows, importing them into a new install, and seeing if that works.

dustinmhorvath commented 2 years ago

When the operator rolled out and the old playbook installation method was discarded, I found it much easier to go into the database, take a dump with something like pg_dump > somefile, and then restore it into the database of the newly created instance. Trying to get AWX to do it in place was more headache than it was worth.
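A minimal sketch of that dump-and-restore approach, wrapping the standard PostgreSQL pg_dump and pg_restore client tools from Python. It assumes both tools are on PATH and that the target database already exists and is empty. The host names, credentials, file name and helper functions are hypothetical.

```python
import os
import subprocess


def dump_cmd(host, port, user, dbname, outfile):
    # Custom format so that pg_restore can read the dump back.
    return ["pg_dump", f"--host={host}", f"--port={port}",
            f"--username={user}", f"--dbname={dbname}",
            "--format=custom", f"--file={outfile}"]


def restore_cmd(host, port, user, dbname, infile):
    # --no-owner avoids failures when role names differ between servers.
    return ["pg_restore", f"--host={host}", f"--port={port}",
            f"--username={user}", f"--dbname={dbname}",
            "--no-owner", infile]


def run(cmd, password):
    # libpq-based tools read the password from PGPASSWORD;
    # check=True raises CalledProcessError if the tool fails.
    env = dict(os.environ, PGPASSWORD=password)
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Hypothetical hosts: the old AWX database and the new instance's database.
    dump = dump_cmd("old-db.example.com", 5432, "awx", "awx", "awx.dump")
    restore = restore_cmd("new-db.example.com", 5432, "awx", "awx", "awx.dump")
    print(" ".join(dump))
    print(" ".join(restore))
    # Uncomment to actually run them:
    # run(dump, password="change-me")
    # run(restore, password="change-me")
```

Building the argument lists separately from running them makes it easy to print and review the exact commands before touching either database.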

jbradberry commented 2 years ago

The fix for the polymorphic SET_NULL error (AttributeError: 'QuerySet' object has no attribute 'non_polymorphic') landed in 19.3.0, via PR https://github.com/ansible/awx/pull/10633.

jbouse commented 2 years ago

I recently performed a similar upgrade from an old AWX 6.1.0.0 Docker + external RDS DB setup to AWX 20.1.0 with Operator 0.20.0 + external RDS DB. In my experience, the operator did not perform the database migration, so I did not define old_postgres_configuration_secret. Instead, I just set up the <Resource Name>-postgres-configuration secret pointing to the external RDS DB, making sure that type: unmanaged was included in the data. I then ran the pg_dump/pg_restore between the old and new RDS DB instances manually before deploying the AWX manifest to the cluster.

Looking through the CRD installer role for database migration, I noticed that it specified a DB host for the pg_dump but not for the pg_restore, which suggested that it only handles migrating from an external DB to the integrated pod DB. The actual application schema migrations appeared to run inside AWX startup itself rather than in the operator. So migrating the external RDS data (or, instead of creating a new external DB instance, pointing the secret at the existing DB to upgrade it in place) and then starting AWX worked for me. I created a new DB instance because I needed to run both in parallel for a while for testing.

eselvam commented 2 years ago

Thanks for the information. Is the resource name awx? Could you please clarify? Thanks & Regards, Selvam E.
