ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

awx-migration-24.3.1 failing #15189

Open skywalkw3r opened 2 weeks ago

skywalkw3r commented 2 weeks ago

Bug Summary

Hello, when upgrading from 24.0.0 to 24.3.1 I am getting the following error in my migration pod.

  Apply all migrations: auth, conf, contenttypes, dab_rbac, dab_resource_registry, main, oauth2_provider, sessions, sites, social_django, sso
Running migrations:
  Applying dab_rbac.0001_initial...Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
psycopg.errors.InvalidForeignKey: there is no unique constraint matching given keys for referenced table "main_team"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
             ^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/__init__.py", line 177, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/base.py", line 106, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
                         ^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/migrations/operations/fields.py", line 108, in database_forwards
    schema_editor.add_field(
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/base/schema.py", line 712, in add_field
    self.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/postgresql/schema.py", line 48, in execute
    return super().execute(sql, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/base/schema.py", line 201, in execute
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
django.db.utils.ProgrammingError: there is no unique constraint matching given keys for referenced table "main_team"

AWX version

24.3.1

Installation method

openshift

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

Perform an AWX upgrade via the operator from 24.0.0 to 24.3.1.

Expected results

Migration pod to finish and new version of AWX to be running

Actual results

Migration pod failing

Additional information

No response

skywalkw3r commented 2 weeks ago

Note the last upgrade to 24.0.0 appeared to run the migration pod with no issues.

migration-24.0.0:

Operations to perform:
  Apply all migrations: auth, conf, contenttypes, dab_resource_registry, main, oauth2_provider, sessions, sites, social_django, sso
Running migrations:
  Applying dab_resource_registry.0001_initial... OK
  Applying dab_resource_registry.0002_remove_resource_id... OK
  Applying dab_resource_registry.0003_alter_resource_object_id... OK
  Applying main.0190_alter_inventorysource_source_and_more... OK

I also tested a brand new deployment via the AWX operator, and the latest version deploys with no issues.

TheRealHaoLiu commented 2 weeks ago

Is this problem still reproducible in your environment? Do you have a backup of your database pre-upgrade?

AlanCoding commented 2 weeks ago

I'd expect this message to happen if somehow the unique constraint for the team model's id field got messed up. This is happening as a ForeignKey is added to the team model, but this should not use the name field, which is not unique. Maybe somehow some change made it think the name field is the primary_key.

skywalkw3r commented 2 weeks ago

> Is this problem still reproducible in your environment? Do you have a backup of your database pre-upgrade?

Yes, it is. I was able to restore from backup and get my instance upgraded to 24.2.0. However, upgrading from that version to 24.3.1 hits the same issue.

skywalkw3r commented 2 weeks ago

> I'd expect this message to happen if somehow the unique constraint for the team model's id field got messed up. This is happening as a ForeignKey is added to the team model, but this should not use the name field, which is not unique. Maybe somehow some change made it think the name field is the primary_key.

Interesting, any ideas on how to resolve it? I'm looking at the postgres DB and I see the table in question, but nothing looks out of the ordinary. Could I manually set the primary_key back to id? The interesting thing is that I don't see a unique constraint listed at the bottom of the table output.

[screenshot: main_team table output from the affected instance]

For reference, the main_team table from a fresh install of AWX: [screenshot]
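
The check described above (looking for a table that has lost its primary key or unique constraint) can be done for every table at once rather than by eye. The following is a minimal sketch, not something posted in the thread: it assumes the AWX tables live in the "public" schema, and the function name and sample rows are hypothetical. The query only reads the PostgreSQL system catalogs, and the helper filters whatever rows a client returns for it.

```python
# Catalog query for PostgreSQL: one row per ordinary table in the
# "public" schema (an assumption about where AWX keeps its tables),
# with a flag saying whether the table has a primary key constraint.
PK_CHECK_SQL = """
SELECT c.relname AS table_name,
       EXISTS (
           SELECT 1
           FROM pg_constraint con
           WHERE con.conrelid = c.oid
             AND con.contype = 'p'
       ) AS has_primary_key
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
ORDER BY c.relname;
"""


def tables_missing_primary_key(rows):
    """Return the sorted names of tables whose primary key flag is false.

    rows: iterable of (table_name, has_primary_key) tuples, for example
    the result of running PK_CHECK_SQL through any PostgreSQL client.
    """
    return sorted(name for name, has_pk in rows if not has_pk)


# Hypothetical sample rows, not taken from the reporter's database.
sample_rows = [("main_team", False), ("main_organization", True)]
print(tables_missing_primary_key(sample_rows))
```

Any table reported here could not be the target of a new ForeignKey, which is the condition the migration error complains about for main_team.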

skywalkw3r commented 2 weeks ago

FYI - I "resolved" the issue by following these steps. It is probably not the cleanest way, but I needed to get my instance up and running with the latest release.

  1. Take a pg_dump of the main_team table from a fresh install of the AWX postgres DB:
       pg_dump --table main_team awx > main_team_bkp
  2. oc rsh into the postgres pod on the broken instance:
       oc project ansible-awx
       oc rsh awx-prod-postgres-15-0
  3. Take a pg_dump backup of the existing DB (the one having issues):
       pg_dump awx > awx_bkp
  4. Import the dump of the fresh main_team table:
       psql awx < main_team_bkp
  5. Run the awx-manage migrate command on the web pod.
  6. Reload the web/task pods; everything seems happy now.

Any thoughts on whether this is a terrible idea or whether it should be OK?

AlanCoding commented 2 weeks ago

I'm almost completely convinced that something messed up the indices of your team table. I think you have a great strategy for the immediate issue. My only concern would be whether the constraints of other tables also got corrupted. Looking into the obvious tools for this, it seems we already have django-extensions installed, so awx-manage sqldiff main dab_rbac should work, but it gives a lot of junk output. Here is what I get in a fresh DB:

https://gist.github.com/AlanCoding/dcd9e67e02423e5524450b8150a4d6d8

So you could run that and cross-reference against mine. If you have other tables which dropped constraints, I wonder if it might still be obvious enough you can compare and see it.
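
The cross-referencing suggested above can be done mechanically once both outputs are saved as text. This is a rough sketch under stated assumptions: the function name is hypothetical, the sample strings are placeholders rather than real sqldiff output, and an exact line comparison will also flag lines that differ only in generated names, so the result is a starting point for reading, not a verdict.

```python
def lines_only_in(candidate_text, reference_text):
    """Return stripped, non-empty lines of candidate_text that never
    appear in reference_text, in first-seen order and without repeats."""
    reference = {line.strip() for line in reference_text.splitlines()}
    seen = set()
    extra = []
    for raw in candidate_text.splitlines():
        line = raw.strip()
        if line and line not in reference and line not in seen:
            seen.add(line)
            extra.append(line)
    return extra


# Placeholder text standing in for saved sqldiff output; in practice the
# two strings would be read from files holding the output from the fresh
# instance and from the affected instance.
fresh_output = "-- line shared by both\n-- another shared line\n"
affected_output = (
    "-- line shared by both\n"
    "-- line only the affected DB reports\n"
    "-- another shared line\n"
)
print(lines_only_in(affected_output, fresh_output))
```

Lines that show up only for the affected database are the ones worth reading closely for dropped constraints; swapping the arguments shows what only the fresh database reports.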