ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0

Upgraded to 2.13.1 - awx-task pod stuck "Waiting for database migrations..." #1777

Open dark-vex opened 6 months ago

dark-vex commented 6 months ago

Bug Summary

Hello 👋, I upgraded AWX Operator to 2.13.1 using the Helm chart, but the awx-task pod is stuck in the "Waiting for database migrations..." phase.

AWX Operator version

2.13.1

AWX version

24.0.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

v1.27.8+k3s2

Modifications

no

Steps to reproduce

I don't have specific steps to reproduce; I only upgraded from AWX 23.9.0 to 24.0.0 using the Helm chart.

Expected results

The migration job completes successfully and the AWX instance is up and running.

Actual results

The awx-task pod is stuck in the Init:0/3 state, and the logs of the init-database container keep looping:

[wait-for-migrations] Waiting 30 seconds before next attempt
[wait-for-migrations] Attempt 3284
[wait-for-migrations] Waiting 30 seconds before next attempt
[wait-for-migrations] Attempt 3285

Additional information

Looking at the pod status, the DB migration job "apparently" ran successfully:

 ➜ k get po -n awx
NAME                                              READY   STATUS      RESTARTS        AGE
awx-backup-28480260-t4szn                         0/1     Completed   0               19d
awx-backup-28490340-nwz6q                         0/1     Completed   0               12d
awx-backup-28500420-cxsx7                         0/1     Completed   0               5d11h
awx-migration-24.0.0-wxwm5                        0/1     Completed   0               37h
awx-operator-controller-manager-67c5f4d45-wsbhn   2/2     Running     2 (5h16m ago)   37h
awx-postgres-15-0                                 1/1     Running     0               37h
awx-task-7ff5947d5c-qkf7s                         0/4     Init:0/3    0               37h
awx-web-8577b8fc55-c4dh2                          3/3     Running     0               37h

But looking at the awx-migration job logs, it seems the migration somehow finished early without actually completing.

This is also confirmed by running /bin/bash -c "! awx-manage showmigrations | grep '\[ \]'" inside the awx-task pod (init-database container), which lists every migration as unapplied ([ ]):

 ➜ k exec -it awx-task-7ff5947d5c-qkf7s -n awx -c init-database -- bash

bash-5.1# /bin/bash -c "! awx-manage showmigrations | grep '\[ \]'"
 [ ] 0001_initial
 [ ] 0002_alter_permission_name_max_length
 [ ] 0003_alter_user_email_max_length
 [ ] 0004_alter_user_username_opts
 [ ] 0005_alter_user_last_login_null
 [ ] 0006_require_contenttypes_0002
 [ ] 0007_alter_validators_add_error_messages
 [ ] 0008_alter_user_username_max_length
 [ ] 0009_alter_user_last_name_max_length
 [ ] 0010_alter_group_name_max_length
 [ ] 0011_update_proxy_permissions
 [ ] 0012_alter_user_first_name_max_length
 [ ] 0001_initial
 [ ] 0002_v310_copy_tower_settings
 [ ] 0003_v310_JSONField_changes
 [ ] 0004_v320_reencrypt
 [ ] 0005_v330_rename_two_session_settings
 [ ] 0006_v331_ldap_group_type
 [ ] 0007_v380_rename_more_settings
 [ ] 0008_subscriptions
 [ ] 0009_rename_proot_settings
 [ ] 0010_change_to_JSONField
 [ ] 0001_initial
 [ ] 0002_remove_content_type_name
 [ ] 0001_initial
 [ ] 0002_remove_resource_id
 [ ] 0003_alter_resource_object_id
 [ ] 0001_initial
 [ ] 0002_squashed_v300_release (18 squashed migrations)
 [ ] 0003_squashed_v300_v303_updates (9 squashed migrations)
 [ ] 0004_squashed_v310_release (6 squashed migrations)
 [ ] 0005_squashed_v310_v313_updates (3 squashed migrations)
 [ ] 0006_v320_release
 [ ] 0007_v320_data_migrations
 [ ] 0008_v320_drop_v1_credential_fields
 [ ] 0009_v322_add_setting_field_for_activity_stream
 [ ] 0010_v322_add_ovirt4_tower_inventory
 [ ] 0011_v322_encrypt_survey_passwords
 [ ] 0012_v322_update_cred_types
 [ ] 0013_v330_multi_credential
 [ ] 0014_v330_saved_launchtime_configs
 [ ] 0015_v330_blank_start_args
 [ ] 0016_v330_non_blank_workflow
 [ ] 0017_v330_move_deprecated_stdout
 [ ] 0018_v330_add_additional_stdout_events
 [ ] 0019_v330_custom_virtualenv
 [ ] 0020_v330_instancegroup_policies
 [ ] 0021_v330_declare_new_rbac_roles
 [ ] 0022_v330_create_new_rbac_roles
 [ ] 0023_v330_inventory_multicred
 [ ] 0024_v330_create_user_session_membership
 [ ] 0025_v330_add_oauth_activity_stream_registrar
 [ ] 0026_v330_delete_authtoken
 [ ] 0027_v330_emitted_events
 [ ] 0028_v330_add_tower_verify
 [ ] 0030_v330_modify_application
 [ ] 0031_v330_encrypt_oauth2_secret
 [ ] 0032_v330_polymorphic_delete
 [ ] 0033_v330_oauth_help_text
 [ ] 0034_v330_delete_user_role
 [ ] 0035_v330_more_oauth2_help_text
 [ ] 0036_v330_credtype_remove_become_methods
 [ ] 0037_v330_remove_legacy_fact_cleanup
 [ ] 0038_v330_add_deleted_activitystream_actor
 [ ] 0039_v330_custom_venv_help_text
 [ ] 0040_v330_unifiedjob_controller_node
 [ ] 0041_v330_update_oauth_refreshtoken
 [ ] 0042_v330_org_member_role_deparent
 [ ] 0043_v330_oauth2accesstoken_modified
 [ ] 0044_v330_add_inventory_update_inventory
 [ ] 0045_v330_instance_managed_by_policy
 [ ] 0046_v330_remove_client_credentials_grant
 [ ] 0047_v330_activitystream_instance
 [ ] 0048_v330_django_created_modified_by_model_name
 [ ] 0049_v330_validate_instance_capacity_adjustment
 [ ] 0050_v340_drop_celery_tables
 [ ] 0051_v340_job_slicing
 [ ] 0052_v340_remove_project_scm_delete_on_next_update
 [ ] 0053_v340_workflow_inventory
 [ ] 0054_v340_workflow_convergence
 [ ] 0055_v340_add_grafana_notification
 [ ] 0056_v350_custom_venv_history
 [ ] 0057_v350_remove_become_method_type
 [ ] 0058_v350_remove_limit_limit
 [ ] 0059_v350_remove_adhoc_limit
 [ ] 0060_v350_update_schedule_uniqueness_constraint
 [ ] 0061_v350_track_native_credentialtype_source
 [ ] 0062_v350_new_playbook_stats
 [ ] 0063_v350_org_host_limits
 [ ] 0064_v350_analytics_state
 [ ] 0065_v350_index_job_status
 [ ] 0066_v350_inventorysource_custom_virtualenv
 [ ] 0067_v350_credential_plugins
 [ ] 0068_v350_index_event_created
 [ ] 0069_v350_generate_unique_install_uuid
 [ ] 0070_v350_gce_instance_id
 [ ] 0071_v350_remove_system_tracking
 [ ] 0072_v350_deprecate_fields
 [ ] 0073_v360_create_instance_group_m2m
 [ ] 0074_v360_migrate_instance_group_relations
 [ ] 0075_v360_remove_old_instance_group_relations
 [ ] 0076_v360_add_new_instance_group_relations
 [ ] 0077_v360_add_default_orderings
 [ ] 0078_v360_clear_sessions_tokens_jt
 [ ] 0079_v360_rm_implicit_oauth2_apps
 [ ] 0080_v360_replace_job_origin
 [ ] 0081_v360_notify_on_start
 [ ] 0082_v360_webhook_http_method
 [ ] 0083_v360_job_branch_override
 [ ] 0084_v360_token_description
 [ ] 0085_v360_add_notificationtemplate_messages
 [ ] 0086_v360_workflow_approval
 [ ] 0087_v360_update_credential_injector_help_text
 [ ] 0088_v360_dashboard_optimizations
 [ ] 0089_v360_new_job_event_types
 [ ] 0090_v360_WFJT_prompts
 [ ] 0091_v360_approval_node_notifications
 [ ] 0092_v360_webhook_mixin
 [ ] 0093_v360_personal_access_tokens
 [ ] 0094_v360_webhook_mixin2
 [ ] 0095_v360_increase_instance_version_length
 [ ] 0096_v360_container_groups
 [ ] 0097_v360_workflowapproval_approved_or_denied_by
 [ ] 0098_v360_rename_cyberark_aim_credential_type
 [ ] 0099_v361_license_cleanup
 [ ] 0100_v370_projectupdate_job_tags
 [ ] 0101_v370_generate_new_uuids_for_iso_nodes
 [ ] 0102_v370_unifiedjob_canceled
 [ ] 0103_v370_remove_computed_fields
 [ ] 0104_v370_cleanup_old_scan_jts
 [ ] 0105_v370_remove_jobevent_parent_and_hosts
 [ ] 0106_v370_remove_inventory_groups_with_active_failures
 [ ] 0107_v370_workflow_convergence_api_toggle
 [ ] 0108_v370_unifiedjob_dependencies_processed
 [ ] 0109_v370_job_template_organization_field
 [ ] 0110_v370_instance_ip_address
 [ ] 0111_v370_delete_channelgroup
 [ ] 0112_v370_workflow_node_identifier
 [ ] 0113_v370_event_bigint
 [ ] 0114_v370_remove_deprecated_manual_inventory_sources
 [ ] 0115_v370_schedule_set_null
 [ ] 0116_v400_remove_hipchat_notifications
 [ ] 0117_v400_remove_cloudforms_inventory
 [ ] 0118_add_remote_archive_scm_type
 [ ] 0119_inventory_plugins
 [ ] 0120_galaxy_credentials
 [ ] 0121_delete_toweranalyticsstate
 [ ] 0122_really_remove_cloudforms_inventory
 [ ] 0123_drop_hg_support
 [ ] 0124_execution_environments
 [ ] 0125_more_ee_modeling_changes
 [ ] 0126_executionenvironment_container_options
 [ ] 0127_reset_pod_spec_override
 [ ] 0128_organiaztion_read_roles_ee_admin
 [ ] 0129_unifiedjob_installed_collections
 [ ] 0130_ee_polymorphic_set_null
 [ ] 0131_undo_org_polymorphic_ee
 [ ] 0132_instancegroup_is_container_group
 [ ] 0133_centrify_vault_credtype
 [ ] 0134_unifiedjob_ansible_version
 [ ] 0135_schedule_sort_fallback_to_id
 [ ] 0136_scm_track_submodules
 [ ] 0137_custom_inventory_scripts_removal_data
 [ ] 0138_custom_inventory_scripts_removal
 [ ] 0139_isolated_removal
 [ ] 0140_rename
 [ ] 0141_remove_isolated_instances
 [ ] 0142_update_ee_image_field_description
 [ ] 0143_hostmetric
 [ ] 0144_event_partitions
 [ ] 0145_deregister_managed_ee_objs
 [ ] 0146_add_insights_inventory
 [ ] 0147_validate_ee_image_field
 [ ] 0148_unifiedjob_receptor_unit_id
 [ ] 0149_remove_inventory_insights_credential
 [ ] 0150_rename_inv_sources_inv_updates
 [ ] 0151_rename_managed_by_tower
 [ ] 0152_instance_node_type
 [ ] 0153_instance_last_seen
 [ ] 0154_set_default_uuid
 [ ] 0155_improved_health_check
 [ ] 0156_capture_mesh_topology
 [ ] 0157_inventory_labels
 [ ] 0158_make_instance_cpu_decimal
 [ ] 0159_deprecate_inventory_source_UoPU_field
 [ ] 0160_alter_schedule_rrule
 [ ] 0161_unifiedjob_host_status_counts
 [ ] 0162_alter_unifiedjob_dependent_jobs
 [ ] 0163_convert_job_tags_to_textfield
 [ ] 0164_remove_inventorysource_update_on_project_update
 [ ] 0165_task_manager_refactor
 [ ] 0166_alter_jobevent_host
 [ ] 0167_project_signature_validation_credential
 [ ] 0168_inventoryupdate_scm_revision
 [ ] 0169_jt_prompt_everything_on_launch
 [ ] 0170_node_and_link_state
 [ ] 0171_add_health_check_started
 [ ] 0172_prevent_instance_fallback
 [ ] 0173_instancegroup_max_limits
 [ ] 0174_ensure_org_ee_admin_roles
 [ ] 0175_workflowjob_is_bulk_job
 [ ] 0176_inventorysource_scm_branch
 [ ] 0177_instance_group_role_addition
 [ ] 0178_instance_group_admin_migration
 [ ] 0179_change_cyberark_plugin_names
 [ ] 0180_add_hostmetric_fields
 [ ] 0181_hostmetricsummarymonthly
 [ ] 0182_constructed_inventory
 [ ] 0183_pre_django_upgrade
 [ ] 0184_django_indexes
 [ ] 0185_move_JSONBlob_to_JSONField
 [ ] 0186_drop_django_taggit
 [ ] 0187_hop_nodes
 [ ] 0188_add_bitbucket_dc_webhook
 [ ] 0189_inbound_hop_nodes
 [ ] 0190_alter_inventorysource_source_and_more
 [ ] 0001_initial
 [ ] 0002_auto_20190406_1805
 [ ] 0003_auto_20201211_1314
 [ ] 0004_auto_20200902_2022
 [ ] 0005_auto_20211222_2352
 [ ] 0001_initial
 [ ] 0001_initial
 [ ] 0002_alter_domain_unique
 [ ] 0001_initial (2 squashed migrations)
 [ ] 0002_add_related_name (2 squashed migrations)
 [ ] 0003_alter_email_max_length (2 squashed migrations)
 [ ] 0004_auto_20160423_0400 (2 squashed migrations)
 [ ] 0005_auto_20160727_2333 (1 squashed migrations)
 [ ] 0006_partial
 [ ] 0007_code_timestamp
 [ ] 0008_partial_timestamp
 [ ] 0009_auto_20191118_0520
 [ ] 0010_uid_db_index
 [ ] 0011_alter_id_fields
 [ ] 0012_usersocialauth_extra_data_new
 [ ] 0013_migrate_extra_data
 [ ] 0014_remove_usersocialauth_extra_data
 [ ] 0015_rename_extra_data_new_usersocialauth_extra_data
 [ ] 0001_initial
 [ ] 0002_expand_provider_options
 [ ] 0003_convert_saml_string_to_list
bash-5.1#
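The check above amounts to looking for `[ ]` markers in the `showmigrations` output. As a minimal sketch of that idea (not the operator's own code), the following Python function collects the names of migrations still marked as unapplied. The function name `unapplied_migrations` and the sample text are hypothetical and only serve as illustration.

```python
import re
from typing import List

# Matches lines such as " [ ] 0001_initial" (unapplied) or " [X] 0001_initial" (applied).
_MIGRATION_LINE = re.compile(r"^\s*\[(?P<mark>[ X])\]\s+(?P<name>\S+)")


def unapplied_migrations(showmigrations_output: str) -> List[str]:
    """Return the names of migrations marked "[ ]" in `showmigrations` output."""
    pending = []
    for line in showmigrations_output.splitlines():
        match = _MIGRATION_LINE.match(line)
        if match and match.group("mark") == " ":
            pending.append(match.group("name"))
    return pending


# Hypothetical sample: one applied and two unapplied migrations.
sample = """\
auth
 [X] 0001_initial
 [ ] 0002_alter_permission_name_max_length
main
 [ ] 0190_alter_inventorysource_source_and_more
"""
print(unapplied_migrations(sample))
```

An empty list would mean that nothing is pending; in the output above, every listed migration is still pending.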

Workaround

1) Copy the existing job

 ➜ k get job -n awx awx-migration-24.0.0 -oyaml > awx-job.yaml

2) Remove the whole status block at the bottom of the YAML file, and remove resourceVersion and uid (and any other uids inside the file)

3) Delete the currently present job

 ➜ k delete job -n awx awx-migration-24.0.0

4) Apply the previously saved and modified awx-job.yaml

A new job is created and runs, which should complete all the migrations this time:

 ➜ k logs awx-migration-24.0.0-dqjnh -n awx -f
Operations to perform:
  Apply all migrations: auth, conf, contenttypes, dab_resource_registry, main, oauth2_provider, sessions, sites, social_django, sso
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying main.0001_initial... OK
  Applying main.0002_squashed_v300_release... OK
  Applying main.0003_squashed_v300_v303_updates... OK
  Applying main.0004_squashed_v310_release... OK
  Applying conf.0001_initial... OK
  Applying conf.0002_v310_copy_tower_settings... OK
  Applying main.0005_squashed_v310_v313_updates... OK
  Applying main.0006_v320_release... OK
  Applying main.0007_v320_data_migrations... OK
  Applying main.0008_v320_drop_v1_credential_fields... OK
  Applying main.0009_v322_add_setting_field_for_activity_stream... OK
  Applying main.0010_v322_add_ovirt4_tower_inventory... OK
  Applying main.0011_v322_encrypt_survey_passwords... OK
  Applying main.0012_v322_update_cred_types... OK
  Applying main.0013_v330_multi_credential... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying conf.0003_v310_JSONField_changes... OK
  Applying conf.0004_v320_reencrypt... OK
  Applying conf.0005_v330_rename_two_session_settings... OK
  Applying conf.0006_v331_ldap_group_type... OK
  Applying conf.0007_v380_rename_more_settings... OK
  Applying conf.0008_subscriptions... OK
  Applying conf.0009_rename_proot_settings... OK
  Applying conf.0010_change_to_JSONField... OK
  Applying dab_resource_registry.0001_initial... OK
  Applying dab_resource_registry.0002_remove_resource_id... OK
  Applying dab_resource_registry.0003_alter_resource_object_id... OK
  Applying sessions.0001_initial... OK
  Applying main.0014_v330_saved_launchtime_configs... OK
  Applying main.0015_v330_blank_start_args... OK
  Applying main.0016_v330_non_blank_workflow... OK
  Applying main.0017_v330_move_deprecated_stdout... OK
  Applying main.0018_v330_add_additional_stdout_events... OK
  Applying main.0019_v330_custom_virtualenv... OK
  Applying main.0020_v330_instancegroup_policies... OK
  Applying main.0021_v330_declare_new_rbac_roles... OK
  Applying main.0022_v330_create_new_rbac_roles... OK
  Applying main.0023_v330_inventory_multicred... OK
  Applying main.0024_v330_create_user_session_membership... OK
  Applying main.0025_v330_add_oauth_activity_stream_registrar... OK
  Applying oauth2_provider.0001_initial... OK
  Applying oauth2_provider.0002_auto_20190406_1805... OK
  Applying oauth2_provider.0003_auto_20201211_1314... OK
  Applying oauth2_provider.0004_auto_20200902_2022... OK
  Applying oauth2_provider.0005_auto_20211222_2352... OK
  Applying main.0026_v330_delete_authtoken... OK
  Applying main.0027_v330_emitted_events... OK
  Applying main.0028_v330_add_tower_verify... OK
  Applying main.0030_v330_modify_application... OK
  Applying main.0031_v330_encrypt_oauth2_secret... OK
  Applying main.0032_v330_polymorphic_delete... OK
  Applying main.0033_v330_oauth_help_text... OK
2024-03-15 10:59:21,368 INFO     [-] rbac_migrations Computing role roots..
2024-03-15 10:59:21,371 INFO     [-] rbac_migrations Found 0 roots in 0.000316 seconds, rebuilding ancestry map
2024-03-15 10:59:21,372 INFO     [-] rbac_migrations Rebuild ancestors completed in 0.000009 seconds
2024-03-15 10:59:21,372 INFO     [-] rbac_migrations Done.
  Applying main.0034_v330_delete_user_role... OK
  Applying main.0035_v330_more_oauth2_help_text... OK
  Applying main.0036_v330_credtype_remove_become_methods... OK
  Applying main.0037_v330_remove_legacy_fact_cleanup... OK
  Applying main.0038_v330_add_deleted_activitystream_actor... OK
  Applying main.0039_v330_custom_venv_help_text... OK
  Applying main.0040_v330_unifiedjob_controller_node... OK
  Applying main.0041_v330_update_oauth_refreshtoken... OK
2024-03-15 10:59:24,490 INFO     [-] rbac_migrations Computing role roots..
2024-03-15 10:59:24,494 INFO     [-] rbac_migrations Found 0 roots in 0.000387 seconds, rebuilding ancestry map
2024-03-15 10:59:24,494 INFO     [-] rbac_migrations Rebuild ancestors completed in 0.000009 seconds
2024-03-15 10:59:24,495 INFO     [-] rbac_migrations Done.
  Applying main.0042_v330_org_member_role_deparent... OK
  Applying main.0043_v330_oauth2accesstoken_modified... OK
  Applying main.0044_v330_add_inventory_update_inventory... OK
  Applying main.0045_v330_instance_managed_by_policy... OK
  Applying main.0046_v330_remove_client_credentials_grant... OK
  Applying main.0047_v330_activitystream_instance... OK
  Applying main.0048_v330_django_created_modified_by_model_name... OK
  Applying main.0049_v330_validate_instance_capacity_adjustment... OK
  Applying main.0050_v340_drop_celery_tables... OK
  Applying main.0051_v340_job_slicing... OK
  Applying main.0052_v340_remove_project_scm_delete_on_next_update... OK
  Applying main.0053_v340_workflow_inventory... OK
  Applying main.0054_v340_workflow_convergence... OK
  Applying main.0055_v340_add_grafana_notification... OK
  Applying main.0056_v350_custom_venv_history... OK
  Applying main.0057_v350_remove_become_method_type... OK
  Applying main.0058_v350_remove_limit_limit... OK
  Applying main.0059_v350_remove_adhoc_limit... OK
  Applying main.0060_v350_update_schedule_uniqueness_constraint... OK
  Applying main.0061_v350_track_native_credentialtype_source... OK
  Applying main.0062_v350_new_playbook_stats... OK
  Applying main.0063_v350_org_host_limits... OK
  Applying main.0064_v350_analytics_state... OK
  Applying main.0065_v350_index_job_status... OK
  Applying main.0066_v350_inventorysource_custom_virtualenv... OK
  Applying main.0067_v350_credential_plugins... OK
  Applying main.0068_v350_index_event_created... OK
  Applying main.0069_v350_generate_unique_install_uuid... OK
  Applying main.0070_v350_gce_instance_id... OK
  Applying main.0071_v350_remove_system_tracking... OK
  Applying main.0072_v350_deprecate_fields... OK
  Applying main.0073_v360_create_instance_group_m2m... OK
  Applying main.0074_v360_migrate_instance_group_relations... OK
  Applying main.0075_v360_remove_old_instance_group_relations... OK
  Applying main.0076_v360_add_new_instance_group_relations... OK
  Applying main.0077_v360_add_default_orderings... OK
  Applying main.0078_v360_clear_sessions_tokens_jt... OK
  Applying main.0079_v360_rm_implicit_oauth2_apps... OK
  Applying main.0080_v360_replace_job_origin... OK
  Applying main.0081_v360_notify_on_start... OK
  Applying main.0082_v360_webhook_http_method... OK
  Applying main.0083_v360_job_branch_override... OK
  Applying main.0084_v360_token_description... OK
  Applying main.0085_v360_add_notificationtemplate_messages... OK
  Applying main.0086_v360_workflow_approval... OK
  Applying main.0087_v360_update_credential_injector_help_text... OK
  Applying main.0088_v360_dashboard_optimizations... OK
  Applying main.0089_v360_new_job_event_types... OK
  Applying main.0090_v360_WFJT_prompts... OK
  Applying main.0091_v360_approval_node_notifications... OK
  Applying main.0092_v360_webhook_mixin... OK
  Applying main.0093_v360_personal_access_tokens... OK
  Applying main.0094_v360_webhook_mixin2... OK
  Applying main.0095_v360_increase_instance_version_length... OK
  Applying main.0096_v360_container_groups... OK
  Applying main.0097_v360_workflowapproval_approved_or_denied_by... OK
  Applying main.0098_v360_rename_cyberark_aim_credential_type... OK
  Applying main.0099_v361_license_cleanup... OK
  Applying main.0100_v370_projectupdate_job_tags... OK
  Applying main.0101_v370_generate_new_uuids_for_iso_nodes... OK
  Applying main.0102_v370_unifiedjob_canceled... OK
  Applying main.0103_v370_remove_computed_fields... OK
  Applying main.0104_v370_cleanup_old_scan_jts... OK
  Applying main.0105_v370_remove_jobevent_parent_and_hosts... OK
  Applying main.0106_v370_remove_inventory_groups_with_active_failures... OK
  Applying main.0107_v370_workflow_convergence_api_toggle... OK
  Applying main.0108_v370_unifiedjob_dependencies_processed... OK
2024-03-15 11:00:23,314 INFO     [-] rbac_migrations Unified organization migration completed in 0.0444 seconds
2024-03-15 11:00:23,366 INFO     [-] rbac_migrations Unified organization migration completed in 0.0517 seconds
2024-03-15 11:00:25,717 INFO     [-] rbac_migrations Rebuild parentage completed in 0.007353 seconds
  Applying main.0109_v370_job_template_organization_field... OK
  Applying main.0110_v370_instance_ip_address... OK
  Applying main.0111_v370_delete_channelgroup... OK
  Applying main.0112_v370_workflow_node_identifier... OK
  Applying main.0113_v370_event_bigint... OK
  Applying main.0114_v370_remove_deprecated_manual_inventory_sources... OK
  Applying main.0115_v370_schedule_set_null... OK
  Applying main.0116_v400_remove_hipchat_notifications... OK
  Applying main.0117_v400_remove_cloudforms_inventory... OK
  Applying main.0118_add_remote_archive_scm_type... OK
  Applying main.0119_inventory_plugins... OK
  Applying main.0120_galaxy_credentials... OK
  Applying main.0121_delete_toweranalyticsstate... OK
  Applying main.0122_really_remove_cloudforms_inventory... OK
  Applying main.0123_drop_hg_support... OK
  Applying main.0124_execution_environments... OK
  Applying main.0125_more_ee_modeling_changes... OK
  Applying main.0126_executionenvironment_container_options... OK
  Applying main.0127_reset_pod_spec_override... OK
  Applying main.0128_organiaztion_read_roles_ee_admin... OK
  Applying main.0129_unifiedjob_installed_collections... OK
  Applying main.0130_ee_polymorphic_set_null... OK
  Applying main.0131_undo_org_polymorphic_ee... OK
  Applying main.0132_instancegroup_is_container_group... OK
  Applying main.0133_centrify_vault_credtype... OK
  Applying main.0134_unifiedjob_ansible_version... OK
  Applying main.0135_schedule_sort_fallback_to_id... OK
  Applying main.0136_scm_track_submodules... OK
  Applying main.0137_custom_inventory_scripts_removal_data... OK
  Applying main.0138_custom_inventory_scripts_removal... OK
  Applying main.0139_isolated_removal... OK
  Applying main.0140_rename... OK
  Applying main.0141_remove_isolated_instances... OK
  Applying main.0142_update_ee_image_field_description... OK
  Applying main.0143_hostmetric... OK
  Applying main.0144_event_partitions... OK
  Applying main.0145_deregister_managed_ee_objs... OK
  Applying main.0146_add_insights_inventory... OK
  Applying main.0147_validate_ee_image_field... OK
  Applying main.0148_unifiedjob_receptor_unit_id... OK
  Applying main.0149_remove_inventory_insights_credential... OK
  Applying main.0150_rename_inv_sources_inv_updates... OK
  Applying main.0151_rename_managed_by_tower... OK
  Applying main.0152_instance_node_type... OK
  Applying main.0153_instance_last_seen... OK
  Applying main.0154_set_default_uuid... OK
  Applying main.0155_improved_health_check... OK
  Applying main.0156_capture_mesh_topology... OK
  Applying main.0157_inventory_labels... OK
  Applying main.0158_make_instance_cpu_decimal... OK
  Applying main.0159_deprecate_inventory_source_UoPU_field... OK
  Applying main.0160_alter_schedule_rrule... OK
  Applying main.0161_unifiedjob_host_status_counts... OK
  Applying main.0162_alter_unifiedjob_dependent_jobs... OK
  Applying main.0163_convert_job_tags_to_textfield... OK
  Applying main.0164_remove_inventorysource_update_on_project_update... OK
  Applying main.0165_task_manager_refactor... OK
  Applying main.0166_alter_jobevent_host... OK
  Applying main.0167_project_signature_validation_credential... OK
  Applying main.0168_inventoryupdate_scm_revision... OK
  Applying main.0169_jt_prompt_everything_on_launch... OK
  Applying main.0170_node_and_link_state... OK
  Applying main.0171_add_health_check_started... OK
  Applying main.0172_prevent_instance_fallback... OK
  Applying main.0173_instancegroup_max_limits... OK
  Applying main.0174_ensure_org_ee_admin_roles... OK
  Applying main.0175_workflowjob_is_bulk_job... OK
  Applying main.0176_inventorysource_scm_branch... OK
  Applying main.0177_instance_group_role_addition... OK
2024-03-15 11:01:30,996 INFO     [-] awx.main.migrations Initiated migration from Org admin to use role
  Applying main.0178_instance_group_admin_migration... OK
  Applying main.0179_change_cyberark_plugin_names... OK
  Applying main.0180_add_hostmetric_fields... OK
  Applying main.0181_hostmetricsummarymonthly... OK
  Applying main.0182_constructed_inventory... OK
  Applying main.0183_pre_django_upgrade... OK
  Applying main.0184_django_indexes... OK
  Applying main.0185_move_JSONBlob_to_JSONField... OK
  Applying main.0186_drop_django_taggit... OK
  Applying main.0187_hop_nodes... OK
  Applying main.0188_add_bitbucket_dc_webhook... OK
  Applying main.0189_inbound_hop_nodes... OK
  Applying main.0190_alter_inventorysource_source_and_more... OK
  Applying sites.0001_initial... OK
  Applying sites.0002_alter_domain_unique... OK
  Applying social_django.0001_initial... OK
  Applying social_django.0002_add_related_name... OK
  Applying social_django.0003_alter_email_max_length... OK
  Applying social_django.0004_auto_20160423_0400... OK
  Applying social_django.0005_auto_20160727_2333... OK
  Applying social_django.0006_partial... OK
  Applying social_django.0007_code_timestamp... OK
  Applying social_django.0008_partial_timestamp... OK
  Applying social_django.0009_auto_20191118_0520... OK
  Applying social_django.0010_uid_db_index... OK
  Applying social_django.0011_alter_id_fields... OK
  Applying social_django.0012_usersocialauth_extra_data_new... OK
  Applying social_django.0013_migrate_extra_data... OK
  Applying social_django.0014_remove_usersocialauth_extra_data... OK
  Applying social_django.0015_rename_extra_data_new_usersocialauth_extra_data... OK
  Applying sso.0001_initial... OK
  Applying sso.0002_expand_provider_options... OK
  Applying sso.0003_convert_saml_string_to_list... OK
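
Step 2 of the workaround above (removing status, resourceVersion and the uids) can also be scripted. The sketch below is one possible reading of that step, applied to a manifest that has already been parsed into a Python dict (for example from JSON output instead of YAML). The helper names are hypothetical, and treating every key named "uid" or ending in "controller-uid" as "any other uids" is an assumption. That would also drop the uid inside an ownerReferences entry, so review the result before applying it.

```python
import copy
import json


def _drop_uids(node):
    """Recursively drop keys named "uid" or ending in "controller-uid" (assumed reading of step 2)."""
    if isinstance(node, dict):
        return {
            key: _drop_uids(value)
            for key, value in node.items()
            if key != "uid" and not key.endswith("controller-uid")
        }
    if isinstance(node, list):
        return [_drop_uids(item) for item in node]
    return node


def strip_server_fields(job: dict) -> dict:
    """Return a cleaned copy of a Job manifest; the input dict is left untouched."""
    cleaned = copy.deepcopy(job)
    cleaned.pop("status", None)  # the whole status block
    cleaned.get("metadata", {}).pop("resourceVersion", None)
    return _drop_uids(cleaned)


# Hypothetical, heavily shortened manifest for illustration only.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "awx-migration-24.0.0",
        "namespace": "awx",
        "resourceVersion": "12345",
        "uid": "aaaa-bbbb",
    },
    "spec": {
        "template": {
            "metadata": {
                "labels": {"controller-uid": "aaaa-bbbb", "job-name": "awx-migration-24.0.0"}
            }
        }
    },
    "status": {"succeeded": 1},
}
print(json.dumps(strip_server_fields(job), indent=2))
```

Working on a copy keeps the saved original available in case the cleaned manifest is rejected.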

Operator Logs

No response

craph commented 6 months ago

This issue looks similar to https://github.com/ansible/awx-operator/issues/1775 and https://github.com/ansible/awx-operator/issues/1770

fosterseth commented 6 months ago

Can you test my patch here? https://github.com/ansible/awx-operator/issues/1770#issuecomment-2000166172 Thanks!

dark-vex commented 6 months ago

@fosterseth thanks, I'll try it in another environment. On this one, I ended up reinstalling and restoring the data from a backup.

LukWe99 commented 6 months ago

I encountered a similar problem: in my case the awx-task pods were stuck in the init container "init-database" with "waiting for migrations". I therefore checked the awx-operator logs and found the following error:

TASK [Verify the resource pod name is populated.] ********************************
fatal: [localhost]: FAILED! => {
    "assertion": "awx_web_pod_name != ''",
    "changed": false,
    "evaluated_to": false,
    "msg": "Could not find the tower pod's name."
}

After checking the awx-operator source code, I think that removing "wait" and "wait_timeout" from the task that applies the web and task deployments ("Apply deployment resources" in resources_configuration.yml) may be causing the problem (Commit ffba1b4, Pull Request #1674).

The deployments are applied without waiting for them to be running. The immediately following task, "Get the new resource pod information after updating resource", tries to get information about the web pods, but only those with "status.phase=Running". Because the previous task does not wait for the pods created by the deployments to be running, the registered _new_pod variable may be empty at that moment. All the following set_fact tasks may then use empty values, so the assertion task "Verify the resource pod name is populated" fails. The playbook ends at this point, and the following includes such as "migrate_schema.yml" and "initialize_django.yml" are not executed.
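
The race described above can be illustrated with a small, self-contained simulation. It is only a sketch of the general "poll until a Running pod shows up or a timeout expires" idea, not the operator's Ansible code. The names `wait_for_running_pod` and `list_running_pods`, and the fake pod listing, are hypothetical.

```python
import time
from typing import Callable, List, Optional


def wait_for_running_pod(
    list_running_pods: Callable[[], List[str]],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until at least one Running pod is listed; return its name, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        pods = list_running_pods()
        if pods:
            return pods[0]
        if time.monotonic() >= deadline:
            return None
        sleep(interval)


# Fake listing: the pod only reaches Running on the third query.
responses = iter([[], [], ["awx-web-8577b8fc55-c4dh2"]])

# A single query right after applying the deployment sees no Running pod,
# which corresponds to the empty pod name in the failed assertion.
first_look = next(responses)
print(first_look)

# Polling keeps asking until the pod shows up (sleep is stubbed out for the demo).
print(wait_for_running_pod(lambda: next(responses), sleep=lambda _: None))
```

With a wait in place, the later steps would only read the pod name once a Running pod actually exists.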