Closed: JG127 closed this issue 2 years ago
I can confirm this is happening. I just did a clean install as well. Same problem.
Yep, I'm able to reproduce this. Looking into it.
Actually, I can't reproduce this; as the error message above suggested, it was just migrating (which took a minute).
Here's my asciinema. This is on a cleanly installed Debian 10 host with docker and ansible.
https://asciinema.org/a/b1jgaeSFWiv6jHkFmPpO8aLTI?t=13
This is the vars.yml:
postgres_data_dir: "/srv/pgdocker"
docker_compose_dir: "/srv/awxcompose"
pg_password: "pgpass"
admin_password: "adminpass"
secret_key: "secretkey"
project_data_dir: "/srv/awx/projects"
I'm trying to set up AWX using docker-compose and I'm having the same problems as the OP, resulting in a seemingly infinite loop (30 minutes so far) of Ansible trying to perform the migrations. I will test again from scratch and report back as soon as possible.
It never finishes the migrations on my hosts, at least not within an hour. I still have it running, so I can have another look tomorrow ;-)
Do you see any errors related to migrations? What happens if you exec into the web container and run:
awx-manage migrate
by hand?
Maybe unrelated to this issue, but release 11.1.0 shows the same errors. After about 15 minutes of error messages it seems to resume its proper routine.
$ docker-compose exec web bash
bash-4.4# awx-manage migrate
Operations to perform:
Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
No migrations to apply.
There must be a difference somewhere. The Python runtime environment, perhaps?
Hmmm, I get different output than @JG127.
root@awx-test:~# docker exec -ti 261e78c819ad bash
bash-4.4# awx-manage migrate
Operations to perform:
Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
Applying main.0001_initial... OK
Applying main.0002_squashed_v300_release... OK
Applying main.0003_squashed_v300_v303_updates... OK
Applying main.0004_squashed_v310_release... OK
Applying conf.0001_initial... OK
Applying conf.0002_v310_copy_tower_settings... OK
Applying main.0005_squashed_v310_v313_updates... OK
Applying main.0006_v320_release... OK
Applying main.0007_v320_data_migrations... OK
Applying main.0008_v320_drop_v1_credential_fields... OK
Applying main.0009_v322_add_setting_field_for_activity_stream... OK
Applying main.0010_v322_add_ovirt4_tower_inventory... OK
Applying main.0011_v322_encrypt_survey_passwords... OK
Applying main.0012_v322_update_cred_types... OK
Applying main.0013_v330_multi_credential... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying auth.0007_alter_validators_add_error_messages... OK
Applying auth.0008_alter_user_username_max_length... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying conf.0003_v310_JSONField_changes... OK
Applying conf.0004_v320_reencrypt... OK
Applying conf.0005_v330_rename_two_session_settings... OK
Applying conf.0006_v331_ldap_group_type... OK
Applying sessions.0001_initial... OK
Applying main.0014_v330_saved_launchtime_configs... OK
Applying main.0015_v330_blank_start_args... OK
Applying main.0016_v330_non_blank_workflow... OK
Applying main.0017_v330_move_deprecated_stdout... OK
Applying main.0018_v330_add_additional_stdout_events... OK
Applying main.0019_v330_custom_virtualenv... OK
Applying main.0020_v330_instancegroup_policies... OK
Applying main.0021_v330_declare_new_rbac_roles... OK
Applying main.0022_v330_create_new_rbac_roles... OK
Applying main.0023_v330_inventory_multicred... OK
Applying main.0024_v330_create_user_session_membership... OK
Applying main.0025_v330_add_oauth_activity_stream_registrar... OK
Applying oauth2_provider.0001_initial... OK
Applying main.0026_v330_delete_authtoken... OK
Applying main.0027_v330_emitted_events... OK
Applying main.0028_v330_add_tower_verify... OK
Applying main.0030_v330_modify_application... OK
Applying main.0031_v330_encrypt_oauth2_secret... OK
Applying main.0032_v330_polymorphic_delete... OK
Applying main.0033_v330_oauth_help_text... OK
2020-04-23 08:00:25,638 INFO rbac_migrations Computing role roots..
2020-04-23 08:00:25,640 INFO rbac_migrations Found 0 roots in 0.000213 seconds, rebuilding ancestry map
2020-04-23 08:00:25,640 INFO rbac_migrations Rebuild ancestors completed in 0.000008 seconds
2020-04-23 08:00:25,640 INFO rbac_migrations Done.
Applying main.0034_v330_delete_user_role... OK
Applying main.0035_v330_more_oauth2_help_text... OK
Applying main.0036_v330_credtype_remove_become_methods... OK
Applying main.0037_v330_remove_legacy_fact_cleanup... OK
Applying main.0038_v330_add_deleted_activitystream_actor... OK
Applying main.0039_v330_custom_venv_help_text... OK
Applying main.0040_v330_unifiedjob_controller_node... OK
Applying main.0041_v330_update_oauth_refreshtoken... OK
2020-04-23 08:00:29,220 INFO rbac_migrations Computing role roots..
2020-04-23 08:00:29,225 INFO rbac_migrations Found 0 roots in 0.000184 seconds, rebuilding ancestry map
2020-04-23 08:00:29,225 INFO rbac_migrations Rebuild ancestors completed in 0.000010 seconds
2020-04-23 08:00:29,225 INFO rbac_migrations Done.
Applying main.0042_v330_org_member_role_deparent... OK
Applying main.0043_v330_oauth2accesstoken_modified... OK
Applying main.0044_v330_add_inventory_update_inventory... OK
Applying main.0045_v330_instance_managed_by_policy... OK
Applying main.0046_v330_remove_client_credentials_grant... OK
Applying main.0047_v330_activitystream_instance... OK
Applying main.0048_v330_django_created_modified_by_model_name... OK
Applying main.0049_v330_validate_instance_capacity_adjustment... OK
Applying main.0050_v340_drop_celery_tables... OK
Applying main.0051_v340_job_slicing... OK
Applying main.0052_v340_remove_project_scm_delete_on_next_update... OK
Applying main.0053_v340_workflow_inventory... OK
Applying main.0054_v340_workflow_convergence... OK
Applying main.0055_v340_add_grafana_notification... OK
Applying main.0056_v350_custom_venv_history... OK
Applying main.0057_v350_remove_become_method_type... OK
Applying main.0058_v350_remove_limit_limit... OK
Applying main.0059_v350_remove_adhoc_limit... OK
Applying main.0060_v350_update_schedule_uniqueness_constraint... OK
2020-04-23 08:00:44,638 DEBUG awx.main.models.credential adding Machine credential type
2020-04-23 08:00:44,660 DEBUG awx.main.models.credential adding Source Control credential type
2020-04-23 08:00:44,673 DEBUG awx.main.models.credential adding Vault credential type
2020-04-23 08:00:44,683 DEBUG awx.main.models.credential adding Network credential type
2020-04-23 08:00:44,692 DEBUG awx.main.models.credential adding Amazon Web Services credential type
2020-04-23 08:00:44,702 DEBUG awx.main.models.credential adding OpenStack credential type
2020-04-23 08:00:44,713 DEBUG awx.main.models.credential adding VMware vCenter credential type
2020-04-23 08:00:44,723 DEBUG awx.main.models.credential adding Red Hat Satellite 6 credential type
2020-04-23 08:00:44,733 DEBUG awx.main.models.credential adding Red Hat CloudForms credential type
2020-04-23 08:00:44,743 DEBUG awx.main.models.credential adding Google Compute Engine credential type
2020-04-23 08:00:44,753 DEBUG awx.main.models.credential adding Microsoft Azure Resource Manager credential type
2020-04-23 08:00:44,763 DEBUG awx.main.models.credential adding GitHub Personal Access Token credential type
2020-04-23 08:00:44,773 DEBUG awx.main.models.credential adding GitLab Personal Access Token credential type
2020-04-23 08:00:44,784 DEBUG awx.main.models.credential adding Insights credential type
2020-04-23 08:00:44,794 DEBUG awx.main.models.credential adding Red Hat Virtualization credential type
2020-04-23 08:00:44,804 DEBUG awx.main.models.credential adding Ansible Tower credential type
2020-04-23 08:00:44,814 DEBUG awx.main.models.credential adding OpenShift or Kubernetes API Bearer Token credential type
2020-04-23 08:00:44,823 DEBUG awx.main.models.credential adding CyberArk AIM Central Credential Provider Lookup credential type
2020-04-23 08:00:44,833 DEBUG awx.main.models.credential adding Microsoft Azure Key Vault credential type
2020-04-23 08:00:44,843 DEBUG awx.main.models.credential adding CyberArk Conjur Secret Lookup credential type
2020-04-23 08:00:44,854 DEBUG awx.main.models.credential adding HashiCorp Vault Secret Lookup credential type
2020-04-23 08:00:44,864 DEBUG awx.main.models.credential adding HashiCorp Vault Signed SSH credential type
Applying main.0061_v350_track_native_credentialtype_source... OK
Applying main.0062_v350_new_playbook_stats... OK
Applying main.0063_v350_org_host_limits... OK
Applying main.0064_v350_analytics_state... OK
Applying main.0065_v350_index_job_status... OK
Applying main.0066_v350_inventorysource_custom_virtualenv... OK
Applying main.0067_v350_credential_plugins... OK
Applying main.0068_v350_index_event_created... OK
Applying main.0069_v350_generate_unique_install_uuid... OK
2020-04-23 08:00:48,324 DEBUG awx.main.migrations Migrating inventory instance_id for gce to gce_id
Applying main.0070_v350_gce_instance_id... OK
Applying main.0071_v350_remove_system_tracking... OK
Applying main.0072_v350_deprecate_fields... OK
Applying main.0073_v360_create_instance_group_m2m... OK
Applying main.0074_v360_migrate_instance_group_relations... OK
Applying main.0075_v360_remove_old_instance_group_relations... OK
Applying main.0076_v360_add_new_instance_group_relations... OK
Applying main.0077_v360_add_default_orderings... OK
Applying main.0078_v360_clear_sessions_tokens_jt... OK
Applying main.0079_v360_rm_implicit_oauth2_apps... OK
Applying main.0080_v360_replace_job_origin... OK
Applying main.0081_v360_notify_on_start... OK
Applying main.0082_v360_webhook_http_method... OK
Applying main.0083_v360_job_branch_override... OK
Applying main.0084_v360_token_description... OK
Applying main.0085_v360_add_notificationtemplate_messages... OK
Applying main.0086_v360_workflow_approval... OK
Applying main.0087_v360_update_credential_injector_help_text... OK
Applying main.0088_v360_dashboard_optimizations... OK
Applying main.0089_v360_new_job_event_types... OK
Applying main.0090_v360_WFJT_prompts... OK
Applying main.0091_v360_approval_node_notifications... OK
Applying main.0092_v360_webhook_mixin... OK
Applying main.0093_v360_personal_access_tokens... OK
Applying main.0094_v360_webhook_mixin2... OK
Applying main.0095_v360_increase_instance_version_length... OK
Applying main.0096_v360_container_groups... OK
Applying main.0097_v360_workflowapproval_approved_or_denied_by... OK
Applying main.0098_v360_rename_cyberark_aim_credential_type... OK
Applying main.0099_v361_license_cleanup... OK
Applying main.0100_v370_projectupdate_job_tags... OK
Applying main.0101_v370_generate_new_uuids_for_iso_nodes... OK
Applying main.0102_v370_unifiedjob_canceled... OK
Applying main.0103_v370_remove_computed_fields... OK
Applying main.0104_v370_cleanup_old_scan_jts... OK
Applying main.0105_v370_remove_jobevent_parent_and_hosts... OK
Applying main.0106_v370_remove_inventory_groups_with_active_failures... OK
Applying main.0107_v370_workflow_convergence_api_toggle... OK
Applying main.0108_v370_unifiedjob_dependencies_processed... OK
2020-04-23 08:01:26,793 DEBUG rbac_migrations Migrating inventorysource to new organization field
2020-04-23 08:01:26,808 DEBUG rbac_migrations Migrating jobtemplate to new organization field
2020-04-23 08:01:26,816 DEBUG rbac_migrations Migrating project to new organization field
2020-04-23 08:01:26,822 DEBUG rbac_migrations Migrating systemjobtemplate to new organization field
2020-04-23 08:01:26,822 DEBUG rbac_migrations Class systemjobtemplate has no organization migration
2020-04-23 08:01:26,822 DEBUG rbac_migrations Migrating workflowjobtemplate to new organization field
2020-04-23 08:01:26,829 DEBUG rbac_migrations Migrating workflowapprovaltemplate to new organization field
2020-04-23 08:01:26,829 DEBUG rbac_migrations Class workflowapprovaltemplate has no organization migration
2020-04-23 08:01:26,830 INFO rbac_migrations Unified organization migration completed in 0.0366 seconds
2020-04-23 08:01:26,830 DEBUG rbac_migrations Migrating adhoccommand to new organization field
2020-04-23 08:01:26,838 DEBUG rbac_migrations Migrating inventoryupdate to new organization field
2020-04-23 08:01:26,846 DEBUG rbac_migrations Migrating job to new organization field
2020-04-23 08:01:26,853 DEBUG rbac_migrations Migrating projectupdate to new organization field
2020-04-23 08:01:26,861 DEBUG rbac_migrations Migrating systemjob to new organization field
2020-04-23 08:01:26,861 DEBUG rbac_migrations Class systemjob has no organization migration
2020-04-23 08:01:26,861 DEBUG rbac_migrations Migrating workflowjob to new organization field
2020-04-23 08:01:26,869 DEBUG rbac_migrations Migrating workflowapproval to new organization field
2020-04-23 08:01:26,869 DEBUG rbac_migrations Class workflowapproval has no organization migration
2020-04-23 08:01:26,869 INFO rbac_migrations Unified organization migration completed in 0.0391 seconds
2020-04-23 08:01:29,831 DEBUG rbac_migrations No changes to role parents for 0 resources
2020-04-23 08:01:29,831 DEBUG rbac_migrations Added parents to 0 roles
2020-04-23 08:01:29,831 DEBUG rbac_migrations Removed parents from 0 roles
2020-04-23 08:01:29,832 INFO rbac_migrations Rebuild parentage completed in 0.004574 seconds
Applying main.0109_v370_job_template_organization_field... OK
Applying main.0110_v370_instance_ip_address... OK
Applying main.0111_v370_delete_channelgroup... OK
Applying main.0112_v370_workflow_node_identifier... OK
Applying main.0113_v370_event_bigint... OK
Applying main.0114_v370_remove_deprecated_manual_inventory_sources... OK
Applying oauth2_provider.0002_08_updates... OK
Applying oauth2_provider.0003_auto_20160316_1503... OK
Applying oauth2_provider.0004_auto_20160525_1623... OK
Applying oauth2_provider.0005_auto_20170514_1141... OK
Applying oauth2_provider.0006_auto_20171214_2232... OK
Applying sites.0001_initial... OK
Applying sites.0002_alter_domain_unique... OK
Applying social_django.0001_initial... OK
Applying social_django.0002_add_related_name... OK
Applying social_django.0003_alter_email_max_length... OK
Applying social_django.0004_auto_20160423_0400... OK
Applying social_django.0005_auto_20160727_2333... OK
Applying social_django.0006_partial... OK
Applying social_django.0007_code_timestamp... OK
Applying social_django.0008_partial_timestamp... OK
Applying sso.0001_initial... OK
Applying sso.0002_expand_provider_options... OK
Applying taggit.0003_taggeditem_add_unique_index... OK
bash-4.4#
After this I do get the login prompt, but somehow I cannot log in.
After the migrations I still get the crashing dispatcher:
2020-04-23 08:19:14,009 INFO spawned: 'dispatcher' with pid 25200
2020-04-23 08:19:15,011 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-04-23 08:19:17,893 WARNING awx.main.dispatch.periodic periodic beat started
Traceback (most recent call last):
File "/usr/bin/awx-manage", line 8, in <module>
sys.exit(manage())
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
execute_from_command_line(sys.argv)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
reaper.reap()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
(changed, me) = Instance.objects.get_or_register()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 144, in get_or_register
return (False, self.me())
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 116, in me
raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2020-04-23 08:19:18,418 INFO exited: dispatcher (exit status 1; not expected)
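The RuntimeError above means the dispatcher cannot find an Instance record matching its cluster host id. A minimal way to check, and if needed register it by hand, is sketched below; it assumes the default docker-compose service name task and the default cluster host id awx used by the installer:
$ docker-compose exec task bash
# List the registered instances; a healthy install shows one instance named "awx"
bash-4.4# awx-manage list_instances
# If it is missing, register it manually (a recovery step not taken in this thread)
bash-4.4# awx-manage provision_instance --hostname=awx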
Adding the logs of the installer and the logs of the very first "docker-compose up" command.
initial_start.log.tar.gz ansible_install.log
I've got the impression the web service is waking up too early. It logs all sorts of errors for as long as the task service hasn't finished the migrations.
Or rather, the task service wakes up too late for the migrations. It's only at the end of the log file that it begins to actually do something.
The net result, however, is that the system is functional, albeit it took its time to get to that point. Maybe a network timeout somewhere?
I agree, @JG127, this does sound like some sort of timing issue/race on startup. I've still been unable to reproduce it, so if any of you find additional clues, please let me know and I'm glad to help dig.
The only thing coming to mind is the Python environment used to run the installer and the application. I always work in a virtualenv when doing Python projects; otherwise I'd end up using the libraries of the OS-installed packages.
This is the elaborate description of what I do to set up the runtime environment:
# Make certain no third-party configuration impacts the process
$ mv ~/.local ~/.local_disabled
$ sudo mv /etc/ansible/ansible.cfg /etc/ansible/ansible.cfg_disabled
$ sudo mv ~/.awx ~/.awx_disabled
# Clean Docker completely
$ docker stop $(docker ps -q)
$ docker rm $(docker ps -qa)
$ docker rmi -f $(docker image ls -q)
$ docker system prune -f
$ docker builder prune -f
$ docker volume prune -f
# Create the runtime environment
$ virtualenv -p python2 venv
Running virtualenv with interpreter ~/.pyenv/shims/python2
Already using interpreter /usr/bin/python2
New python executable in ~/Projects/awx/venv/bin/python2
Also creating executable in ~/Projects/awx/venv/bin/python
Installing setuptools, pip, wheel...
done.
$ source venv/bin/activate
(venv) $ pip install ansible docker-compose
...
...
...
(venv) $ pip freeze
ansible==2.9.7
attrs==19.3.0
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.7.0.1
bcrypt==3.1.7
cached-property==1.5.1
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
configparser==4.0.2
contextlib2==0.6.0.post1
cryptography==2.9.2
docker==4.2.0
docker-compose==1.25.5
dockerpty==0.4.1
docopt==0.6.2
enum34==1.1.10
functools32==3.2.3.post2
idna==2.9
importlib-metadata==1.6.0
ipaddress==1.0.23
Jinja2==2.11.2
jsonschema==3.2.0
MarkupSafe==1.1.1
paramiko==2.7.1
pathlib2==2.3.5
pycparser==2.20
PyNaCl==1.3.0
pyrsistent==0.16.0
PyYAML==5.3.1
requests==2.23.0
scandir==1.10.0
six==1.14.0
subprocess32==3.5.4
texttable==1.6.2
urllib3==1.25.9
websocket-client==0.57.0
zipp==1.2.0
(venv) $ python --version
Python 2.7.17
(venv) $ ansible --version
ansible 2.9.7
config file = None
configured module search path = [u'~/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = ~/Projects/awx/venv/local/lib/python2.7/site-packages/ansible
executable location = ~/Projects/awx/venv/bin/ansible
python version = 2.7.17 (default, Apr 15 2020, 17:20:14) [GCC 7.5.0]
(venv) $ docker-compose --version
docker-compose version 1.25.5, build unknown
(venv) $ docker --version
Docker version 19.03.8, build afacb8b7f0
(venv) $ docker system info
Client:
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 19.03.8
Storage Driver: overlay2
Backing Filesystem: <unknown>
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.3.0-46-generic
Operating System: Linux Mint 19.3
OSType: linux
Architecture: x86_64
CPUs: 7
Total Memory: 7.773GiB
Name: workvm
ID: DGRT:4RDB:6YC2:QTEB:U3IL:HDDQ:VCIT:HSUW:L344:KORB:SAPZ:MXIB
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
(venv) $ cd installer
(venv) $ ansible-playbook -i inventory install.yml
PLAY [Build and deploy AWX] ******************************************************************************************************************************************
TASK [Gathering Facts] ***********************************************************************************************************************************************
ok: [localhost]
TASK [check_vars : include_tasks] ************************************************************************************************************************************
skipping: [localhost]
TASK [check_vars : include_tasks] ************************************************************************************************************************************
included: /home/jan/Projects/awx/installer/roles/check_vars/tasks/check_docker.yml for localhost
TASK [check_vars : postgres_data_dir should be defined] **************************************************************************************************************
ok: [localhost] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [check_vars : host_port should be defined] **********************************************************************************************************************
ok: [localhost] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [image_build : Set global version if not provided] **************************************************************************************************************
skipping: [localhost]
TASK [image_build : Verify awx-logos directory exists for official install] ******************************************************************************************
skipping: [localhost]
TASK [image_build : Copy logos for inclusion in sdist] ***************************************************************************************************************
skipping: [localhost]
TASK [image_build : Set sdist file name] *****************************************************************************************************************************
skipping: [localhost]
TASK [image_build : AWX Distribution] ********************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stat distribution file] **************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Clean distribution] ******************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Build sdist builder image] ***********************************************************************************************************************
skipping: [localhost]
TASK [image_build : Build AWX distribution using container] **********************************************************************************************************
skipping: [localhost]
TASK [image_build : Build AWX distribution locally] ******************************************************************************************************************
skipping: [localhost]
TASK [image_build : Set docker build base path] **********************************************************************************************************************
skipping: [localhost]
TASK [image_build : Set awx_web image name] **************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Set awx_task image name] *************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Ensure directory exists] *************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage sdist] *************************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Template web Dockerfile] *************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Template task Dockerfile] ************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage launch_awx] ********************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage launch_awx_task] ***************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage google-cloud-sdk.repo] *********************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage rsyslog.repo] ******************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage rsyslog.conf] ******************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage supervisor.conf] ***************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage supervisor_task.conf] **********************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage settings.py] *******************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage requirements] ******************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage config watcher] ****************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Stage Makefile] **********************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Build base web image] ****************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Build base task image] ***************************************************************************************************************************
skipping: [localhost]
TASK [image_build : Tag task and web images as latest] ***************************************************************************************************************
skipping: [localhost]
TASK [image_build : Clean docker base directory] *********************************************************************************************************************
skipping: [localhost]
TASK [image_push : Authenticate with Docker registry if registry password given] *************************************************************************************
skipping: [localhost]
TASK [image_push : Remove web image] *********************************************************************************************************************************
skipping: [localhost]
TASK [image_push : Remove task image] ********************************************************************************************************************************
skipping: [localhost]
TASK [image_push : Tag and push web image to registry] ***************************************************************************************************************
skipping: [localhost]
TASK [image_push : Tag and push task image to registry] **************************************************************************************************************
skipping: [localhost]
TASK [image_push : Set full image path for Registry] *****************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Generate broadcast websocket secret] **************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : fail] *********************************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : include_tasks] ************************************************************************************************************************************
skipping: [localhost] => (item=openshift_auth.yml)
skipping: [localhost] => (item=openshift.yml)
TASK [kubernetes : include_tasks] ************************************************************************************************************************************
skipping: [localhost] => (item=kubernetes_auth.yml)
skipping: [localhost] => (item=kubernetes.yml)
TASK [kubernetes : Use kubectl or oc] ********************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : set_fact] *****************************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Record deployment size] ***************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Set expected post-deployment Replicas value] ******************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Delete existing Deployment (or StatefulSet)] ******************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Get Postgres Service Detail] **********************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Template PostgreSQL Deployment (OpenShift)] *******************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Deploy and Activate Postgres (OpenShift)] *********************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Create Temporary Values File (Kubernetes)] ********************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Populate Temporary Values File (Kubernetes)] ******************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Deploy and Activate Postgres (Kubernetes)] ********************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Remove tempfile] **********************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Set postgresql hostname to helm package service (Kubernetes)] *************************************************************************************
skipping: [localhost]
TASK [kubernetes : Wait for Postgres to activate] ********************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Check if Postgres 9.6 is being used] **************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Set new pg image] *********************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Wait for change to take affect] *******************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Set env var for pg upgrade] ***********************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Wait for change to take affect] *******************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Set env var for new pg version] *******************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Wait for Postgres to redeploy] ********************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Wait for Postgres to finish upgrading] ************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Unset upgrade env var] ****************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Wait for Postgres to redeploy] ********************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Set task image name] ******************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Set web image name] *******************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Determine Deployment api version] *****************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Render deployment templates] **********************************************************************************************************************
skipping: [localhost] => (item=None)
skipping: [localhost] => (item=None)
skipping: [localhost] => (item=None)
skipping: [localhost] => (item=None)
skipping: [localhost] => (item=None)
skipping: [localhost]
TASK [kubernetes : Apply Deployment] *********************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Delete any existing management pod] ***************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Template management pod] **************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Create management pod] ****************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Wait for management pod to start] *****************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Migrate database] *********************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Check for Tower Super users] **********************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : create django super user if it does not exist] ****************************************************************************************************
skipping: [localhost]
TASK [kubernetes : update django super user password] ****************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Create the default organization if it is needed.] *************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Delete management pod] ****************************************************************************************************************************
skipping: [localhost]
TASK [kubernetes : Scale up deployment] ******************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Generate broadcast websocket secret] ************************************************************************************************************
ok: [localhost]
TASK [local_docker : Check for existing Postgres data] ***************************************************************************************************************
ok: [localhost]
TASK [local_docker : Record Postgres version] ************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Determine whether to upgrade postgres] **********************************************************************************************************
ok: [localhost]
TASK [local_docker : Set up new postgres paths pre-upgrade] **********************************************************************************************************
skipping: [localhost] => (item=~/.awx/pgdocker/10/data)
TASK [local_docker : Stop AWX before upgrading postgres] *************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Upgrade Postgres] *******************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Copy old pg_hba.conf] ***************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Remove old data directory] **********************************************************************************************************************
ok: [localhost]
TASK [local_docker : Export Docker web image if it isnt local and there isnt a registry defined] *********************************************************************
skipping: [localhost]
TASK [local_docker : Export Docker task image if it isnt local and there isnt a registry defined] ********************************************************************
skipping: [localhost]
TASK [local_docker : Set docker base path] ***************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Ensure directory exists] ************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Copy web image to docker execution] *************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Copy task image to docker execution] ************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Load web image] *********************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Load task image] ********************************************************************************************************************************
skipping: [localhost]
TASK [local_docker : Set full image path for local install] **********************************************************************************************************
skipping: [localhost]
TASK [local_docker : Set DockerHub Image Paths] **********************************************************************************************************************
ok: [localhost]
TASK [local_docker : Create ~/.awx/awxcompose directory] *************************************************************************************************************
changed: [localhost]
TASK [local_docker : Create Redis socket directory] ******************************************************************************************************************
changed: [localhost]
TASK [local_docker : Create Memcached socket directory] **************************************************************************************************************
changed: [localhost]
TASK [local_docker : Create Docker Compose Configuration] ************************************************************************************************************
changed: [localhost] => (item=environment.sh)
changed: [localhost] => (item=credentials.py)
changed: [localhost] => (item=docker-compose.yml)
changed: [localhost] => (item=nginx.conf)
changed: [localhost] => (item=redis.conf)
TASK [local_docker : Set redis config to other group readable to satisfy redis-server] *******************************************************************************
changed: [localhost]
TASK [local_docker : Render SECRET_KEY file] *************************************************************************************************************************
changed: [localhost]
TASK [local_docker : Start the containers] ***************************************************************************************************************************
(venv) $ cd ..
(venv) $ docker-compose logs -f
...
...
the errors
...
...
I repeated the process with Python 3, since Python 2 is deprecated and AWX uses Python 3 to run Ansible. It's the very same routine as described above, except for the virtualenv command.
Replace
virtualenv -p python2 venv
with
virtualenv -p python3 venv
A check of the versions:
(venv) $ python --version
Python 3.6.9
(venv) $ ansible --version
ansible 2.9.7
config file = None
configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = ~/Projects/awx/venv/lib/python3.6/site-packages/ansible
executable location = ~/Projects/awx/venv/bin/ansible
python version = 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
(venv) $ pip freeze
ansible==2.9.7
attrs==19.3.0
bcrypt==3.1.7
cached-property==1.5.1
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
cryptography==2.9.2
docker==4.2.0
docker-compose==1.25.5
dockerpty==0.4.1
docopt==0.6.2
idna==2.9
importlib-metadata==1.6.0
Jinja2==2.11.2
jsonschema==3.2.0
MarkupSafe==1.1.1
paramiko==2.7.1
pycparser==2.20
PyNaCl==1.3.0
pyrsistent==0.16.0
PyYAML==5.3.1
requests==2.23.0
six==1.14.0
texttable==1.6.2
urllib3==1.25.9
websocket-client==0.57.0
zipp==3.1.0
The very same errors pop up.
A long shot is that by some fluke you are using different versions of the Docker images involved. You might want to check the image IDs to make certain they are the same.
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
redis latest a4d3716dbb72 13 hours ago 98.3MB
postgres 10 b500168be260 17 hours ago 200MB
ansible/awx_task 11.0.0 83a56dfe4148 7 days ago 2.52GB
ansible/awx_web 11.0.0 ab9667094eac 7 days ago 2.48GB
memcached alpine acce7f7ac2ef 10 days ago 9.22MB
(Why use both Redis and Memcached, by the way?)
And a very long shot is that it makes a difference to do this in a virtual machine. I am using VirtualBox 6.1.16 on a Windows 10 host, as per company regulations.
Maybe this will shed some light ...
While the task service is just sitting there, it consumes 100% CPU. The processes:
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:14 ? 00:00:00 tini -- sh -c /usr/bin/launch_awx_task.sh
root 6 1 0 11:14 ? 00:00:00 bash /usr/bin/launch_awx_task.sh
root 69 6 95 11:15 ? 00:02:18 /var/lib/awx/venv/awx/bin/python3 /usr/bin/awx-manage migrate --noinput
Is there something I can try to find out what it is doing?
What do the task logs/stdout say?
Maybe try something like strace or gdb on that awx-manage migrate process to see what it's doing?
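For example, something along these lines (a sketch only; it assumes the awx_task container name used by the installer's docker-compose file and the PID 69 from the ps output above, and strace is usually not preinstalled in the image, so it may need to be added first):
$ docker exec -it awx_task bash
# Attach to the hung migrate process and watch network/polling syscalls
bash-4.4# strace -f -p 69 -e trace=network,poll,select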
No logging at all, neither in the docker logs nor in the log files in /var/log. This means the issue happens very early in the code, before it logs anything. It might really be stuck on a network connection after all, if the timeout mechanism doesn't rely on blocking I/O. Surely it's such a silly problem it doesn't even come to mind :-)
I'm experiencing the same problem. I'm not able to start any release. I've tried 9.3.0, 10.0.0 and 11.1.0. My error looks like this:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/conf/settings.py", line 87, in _ctit_db_wrapper
yield
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/conf/settings.py", line 415, in __getattr__
value = self._get_local(name)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/conf/settings.py", line 358, in _get_local
setting = Setting.objects.filter(key=name, user__isnull=True).order_by('pk').first()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 653, in first
for obj in (self if self.ordered else self.order_by('pk'))[:1]:
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
self._fetch_all()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1131, in execute_sql
cursor = self.connection.cursor()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
return self._cursor()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 233, in _cursor
self.ensure_connection()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
self.connection = self.get_new_connection(conn_params)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
connection = Database.connect(**conn_params)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: FATAL: no pg_hba.conf entry for host "172.20.0.6", user "awx", database "awx", SSL off
2020-04-24 13:23:30,507 ERROR awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
self.connection = self.get_new_connection(conn_params)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
connection = Database.connect(**conn_params)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: no pg_hba.conf entry for host "172.20.0.6", user "awx", database "awx", SSL off
If this were a Java-based system I would dump the threads or do a CPU profile. Python, however, is a bit new to me. Is there a way to CPU-profile a Python process?
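One option for profiling a live Python process is py-spy. It is not mentioned anywhere in this thread, so treat the following purely as a sketch: it assumes the container has pip and network access, and since py-spy attaches via ptrace, the container may also need the SYS_PTRACE capability.
# Inside the task container, against the PID of the awx-manage migrate process
bash-4.4# pip3 install py-spy
bash-4.4# py-spy dump --pid 69
bash-4.4# py-spy top --pid 69
py-spy dump prints a one-shot snapshot of the Python stack traces, which is usually enough to see where a hung process is blocked; py-spy top gives a live, top-like view of where CPU time is being spent.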
I ran into this problem today as well. I was able to fix it by deleting all of the running Docker containers and then running ansible-playbook -i inventory install.yml again. It took less than 2 minutes for the GUI to come up and I was able to log in.
Hope this helps.
I am getting the same error as @JG127.
Starting with:
awx_web | Traceback (most recent call last):
awx_web | File "/usr/bin/awx-manage", line 8, in <module>
awx_web | sys.exit(manage())
awx_web | File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
awx_web | execute_from_command_line(sys.argv)
awx_web | File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
awx_web | utility.execute()
The fresh install does not work for me, for whatever reason.
In addition to that, I get the error I've posted in an earlier comment. There it says that pg_hba.conf is not configured properly and that there is no entry for host "172.20.0.6".
Therefore, I've investigated pg_hba.conf. Indeed, there is no allow entry for that host. I then modified pg_hba.conf like this, allowing everything in 172.16.0.0/12:
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all 172.16.0.0/12 trust
After saving the changes and starting the containers again, the problem is gone.
It looks like, because of the error in __init__.py, pg_hba.conf does not get populated correctly.
@Naf3tsR It works for me too! Good job and thanks :)
Yes! :-) Can this be fixed via Docker (Compose)? I'd rather not hack my way in.
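Since the installer bind-mounts the Postgres data directory from the host (postgres_data_dir in vars.yml, /srv/pgdocker in the example near the top of this thread), the same change can be made without exec-ing into the container by editing the file on the host. A rough sketch only; it assumes the compose service is named postgres and that docker-compose is run from the docker_compose_dir (/srv/awxcompose above):
# Locate pg_hba.conf inside the bind-mounted data directory on the host
$ sudo find /srv/pgdocker -name pg_hba.conf
# Append the allow rule to the file that find reports, then restart the service
$ echo "host all all 172.16.0.0/12 trust" | sudo tee -a "$(sudo find /srv/pgdocker -name pg_hba.conf)"
$ cd /srv/awxcompose && docker-compose restart postgres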
@Naf3tsR Thanks, it works for me. I'm using version 11.2.0, which still has the issue.
Hi, I have the same problem; it was resolved by using PostgreSQL 12.
Upgrading to PostgreSQL 12 didn't help for me.
I'm having the same problem in version 11.2.0 using docker-compose; however, on the second attempt it always works. I'm using PostgreSQL in compose as well. My logs are similar to @roedie's.
Out of curiosity, did anyone have the problem using OpenShift or Kubernetes?
Ran into this with 11.2.0 today myself. As reported by others, I deleted all the containers, ran the Ansible installer again, and it worked.
Hi, in my inventory file I have set two hosts to deploy AWX 11.2.0 on two nodes (local_docker mode), but I get the error below in the web container AFTER I changed the /etc/tower/settings.py file in the web and task containers on myHost2 only (the param CLUSTER_HOST_ID = "awx" was changed to "myHost2").
On myHost1 this param is left at its default value "awx".
[tower]
myHost1 ansible_ssh_user=root ansible_ssh_private_key_file=/root/.ssh/id_rsa
myHost2 ansible_ssh_user=root ansible_ssh_private_key_file=/root/.ssh/id_rsa

[all:vars]
ansible_python_interpreter="/usr/bin/env python"
2020-06-02 12:11:09,744 WARNING awx.main.wsbroadcast Connection from myHost2 to awx failed: 'Cannot connect to host awx:443 ssl:False [Name or service not known]'.
2020-06-02 12:13:45,182 WARNING awx.main.wsbroadcast Connection from awx to myHost2 failed: 'Cannot connect to host myHost2:443 ssl:False [[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:897)]'.
==> Have you ever seen this error?
The container doesn't have DNS to resolve the other container; that's your problem, and it's quite different from the others here. If you want to cluster AWX you can look at https://github.com/sujiar37/AWX-HA-InstanceGroup/issues/26.
So we can't add two hosts in the inventory file? I thought we could...
AFAIK, you can, but I'm not sure that will cluster by itself, especially since you are editing /etc/tower/settings.py manually. Without an orchestration tool, the containers will not see anything beyond their own compose project.
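As a quick check (just a sketch, assuming the default container name awx_web and that getent/curl exist in the image), you could verify from inside the container whether the other node's name resolves at all and whether port 443 really speaks TLS:

# Does the name of the other node resolve inside the web container at all?
docker exec -ti awx_web getent hosts myHost2

# Does myHost2:443 actually speak TLS? (WRONG_VERSION_NUMBER usually means it does not.)
# curl may not be present in the image; if so, run this check from the host instead.
docker exec -ti awx_web curl -vk https://myHost2/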
Hi,
I have the exact same behavior with a fresh install of 12.0.0 with docker-compose, using Python 3 and Debian 10 as the host. I'll add more info as I dig deeper.
EDIT
I had the same issue after I ran the installer.
I forced a migration with awx-manage migrate in the awx_task container. Then the only error still recurring is this one:
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 158, in get_or_register
    return (False, self.me())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 116, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
At that point I couldn't log in to the webUI with the default admin credentials.
I forced the creation of a superuser with awx-manage createsuperuser, and then I could log in. The web app seems to work fine so far, but I still have the previous error happening every 10 seconds.
So something seems not to be working after the migration, but I don't know what yet.
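If it helps anyone debugging the same RuntimeError, a hedged diagnostic sketch (assuming the default awx_task container name; list_instances and provision_instance are the awx-manage commands the clustering docs and launch scripts use, so treat their availability here as an assumption):

# See which instances are registered in the AWX database; an empty list (or one whose
# hostname does not match CLUSTER_HOST_ID in /etc/tower/settings.py) would explain the error.
docker exec -ti awx_task awx-manage list_instances

# Hypothetical fix: register the current hostname as an instance, as the launch scripts
# normally do on start-up (command assumed from the AWX clustering docs).
docker exec -ti awx_task bash -c 'awx-manage provision_instance --hostname=$(hostname)'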
Any chance 13.0.0 solves it?
I just tried a fresh install with 13.0.0 (docker-compose mode on Debian 10). It seems to give the "main_instance" error too:
2020-06-26 08:38:09,393 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:38:09,393 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:38:10,406 INFO spawned: 'dispatcher' with pid 160
2020-06-26 08:38:10,406 INFO spawned: 'dispatcher' with pid 160
2020-06-26 08:38:11,409 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-26 08:38:11,409 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "main_instance" does not exist
LINE 1: SELECT (1) AS "a" FROM "main_instance" WHERE "main_instance"...
                               ^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 144, in get_or_register
    return (False, self.me())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 100, in me
    if node.exists():
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 766, in exists
    return self.query.has_results(using=self.db)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/query.py", line 522, in has_results
    return compiler.has_results()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1110, in has_results
    return bool(self.execute_sql(SINGLE))
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1140, in execute_sql
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "main_instance" does not exist
LINE 1: SELECT (1) AS "a" FROM "main_instance" WHERE "main_instance"...
Then I ran awx-manage migrate, which left me with:
2020-06-26 08:46:19,276 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:46:19,276 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:46:20,287 INFO spawned: 'dispatcher' with pid 801
2020-06-26 08:46:20,287 INFO spawned: 'dispatcher' with pid 801
2020-06-26 08:46:21,291 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-26 08:46:21,291 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-26 08:46:23,396 WARNING awx.main.dispatch.periodic periodic beat started
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 144, in get_or_register
    return (False, self.me())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 102, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
Tried with latest version (v13.0.0) on a clean env (Debian 10) and didn't have issues.
NOTE: Using an external PostgreSQL 11.8 database.
I see this problem in my CI job that builds my AWX containers. Approximately one out of 10 starts fails.
It seems that something is killing (or restarting) the postgres container quite early:
root@runner-hgefapak-project-60-concurrent-1:/# docker logs -f awx_postgres
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data/pgdata ... ok
...
server started
CREATE DATABASE
...
2020-07-22 14:30:45.852 UTC [76] ERROR: relation "django_migrations" does not exist at character 124
2020-07-22 14:30:45.852 UTC [76] STATEMENT: SELECT "django_migrations"."id", "django_migrations"."app", "django_migrations"."name", "django_migrations"."applied" FROM "django_migrations" WHERE ("django_migrations"."app" = 'main' AND NOT ("django_migrations"."name"::text LIKE '%squashed%')) ORDER BY "django_migrations"."id" DESC LIMIT 1
root@runner-hgefapak-project-60-concurrent-1:/# # !!!! HERE something has stopped the container !!!!
root@runner-hgefapak-project-60-concurrent-1:/# docker logs -f awx_postgres
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
...
Which in turn causes the migration task inside the awx_task container to fail:
root@runner-hgefapak-project-60-concurrent-1:/# docker logs -f awx_task
Using /etc/ansible/ansible.cfg as config file
127.0.0.1 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/libexec/platform-python"
},
"changed": false,
"elapsed": 0,
"match_groupdict": {},
"match_groups": [],
"path": null,
"port": 5432,
"search_regex": null,
"state": "started"
}
Using /etc/ansible/ansible.cfg as config file
127.0.0.1 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/libexec/platform-python"
},
"changed": false,
"db": "awx"
}
Operations to perform:
Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
Applying contenttypes.0001_initial... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying taggit.0001_initial... OK
Applying taggit.0002_auto_20150616_2121... OK
Applying auth.0001_initial... OK
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 82, in _execute
return self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request
Restarting the awx_task container seems to restart the migration process, which then works (supervisorctl restart all doesn't help).
So the question is, what is restarting the awx_postgres container?
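One way to narrow that down (a sketch using only standard Docker CLI commands, container name assumed to be awx_postgres):

# While re-running the installer, stream lifecycle events for the postgres container;
# die/stop/start events will show up here with timestamps.
docker events --filter container=awx_postgres --filter type=container

# Afterwards, check how the container last exited and whether Docker restarted it.
docker inspect --format 'restarts={{.RestartCount}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' awx_postgres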
From the postgres container entrypoint:
docker_temp_server_start "$@"
docker_setup_db
docker_process_init_files /docker-entrypoint-initdb.d/*
docker_temp_server_stop
If the migrate process starts during the PostgreSQL initialization, the connection will be dropped as soon as the temporary server stops.
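A possible way to avoid racing that init phase, just as a sketch: instead of accepting the first successful check, require the server to answer several consecutive checks before starting the migration, so a short-lived init-phase server (or a restart in between) doesn't count as "ready". This assumes pg_isready is reachable via docker exec into the postgres container and that the DB user is awx:

#!/bin/bash
# Wait until PostgreSQL answers 5 consecutive TCP checks, one second apart.
STREAK=0
until [ "$STREAK" -ge 5 ]; do
  if docker exec awx_postgres pg_isready -h 127.0.0.1 -p 5432 -U awx >/dev/null 2>&1; then
    STREAK=$((STREAK + 1))
  else
    STREAK=0
  fi
  sleep 1
done
echo "postgres looks stable, starting migrations"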
Came here while investigating a Release 13.0.0 install issue that was also giving me ERROR: relation "conf_setting" does not exist at character 158.
I ended up reverting to release 12.0.0 while troubleshooting, assuming it was a dirty release. This is a fresh docker-compose install using containerized Postgres.
I was eventually able to get into the web UI for 12.0.0 using 2 steps: running awx-manage migrate by hand completed the migration without errors. This appears to be a valid workaround for a new docker-compose install, at least with my config.
Ran into the same UndefinedTable: relation "main_instance" does not exist problem with 14.1.0, installed with docker-compose and vanilla settings. Fixed by running awx-manage migrate within the awx_task container, as mentioned by @johnrkriter (thanks a lot!). But I could not understand what the major difficulty in resolving this is.
I had the same problem, and it was resolved the same way...
(Ubuntu 20.04.1 + AWX 14.1.0 + docker-compose, clean install)
The workaround works for me as well. Is this going to be fixed?
This issue is still around with an AWX 15.0.0 docker-compose deployment.
The workaround of @johnrkriter works:
docker exec awx_task awx-manage migrate
docker container restart awx_task
It's a pity that nobody from the project seems interested... :-(
This looks like a race condition between the pg container and the awx_task container. Since I am not familiar with the project structure it will probably take me some time to find the right place to look :) I will update this as soon as I find something.
So I think the description of @anxstj is pretty accurate and complete; we now just need to figure out a good way to wait for the postgres init to finish before we start the migration. Does anybody have a good idea how to do that?
Judging from the issue itself, I think the best option to fix this is to make the awx_task container (or the script running in it) fail if the migration fails, instead of trying to continue with something that will never succeed. The DB migrations themselves should be idempotent, so just failing and starting fresh should be fine.
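A minimal sketch of that fail-fast idea (not the project's actual launch script; the retry count and sleep are arbitrary):

#!/bin/bash
set -euo pipefail

# Try the migration a few times; give up (and let the container restart fresh) if it keeps failing.
for attempt in 1 2 3 4 5; do
  if awx-manage migrate --noinput; then
    echo "migrations applied on attempt ${attempt}"
    exit 0
  fi
  echo "migration attempt ${attempt} failed, retrying in 10s" >&2
  sleep 10
done

echo "migrations never completed, exiting so the container restarts from scratch" >&2
exit 1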
Does it work after applying #8497?
@dasJ I think the patch could still run into the issue where the "while loop" exits because it successfully accesses the instance of the "docker_temp_server" that will be killed directly after. It makes it a lot more unlikely, but I do not see how the patch would completely avoid this case. I think the real fix here is to make the awx_task container completely fail on this error and start fresh (not sure if "set -e" is enough here, since IIRC all errors are still retried when running "awx-manage migrate").
@jklare wdym by "docker_temp_server"? I can neither find it in this repo nor when googling.
ISSUE TYPE
SUMMARY
A fresh install of the 11.0.0 release doesn't work, even though the installation instructions are followed. There are SQL errors and a recurring error about clustering.
ENVIRONMENT
STEPS TO REPRODUCE
The installation playbook runs without apparent errors. However, when checking the Docker Compose logs there are loads of SQL errors and cluster errors, as shown below.
The procedure was repeated after commenting out the line "dockerhub_base=ansible" in the inventory file, to make certain the AWX Docker images are built locally and in sync with the installer. The very same errors happen.
EXPECTED RESULTS
No errors in the logs and a fully functional application.
ACTUAL RESULTS
The logs are filling with errors and the application is not fully functional. Sometimes I'm getting an angry potato logo; I've attached a screenshot. What is it used for? :-)
The odd thing, however, is that when there is no angry potato logo the application seems to be functional (i.e. management jobs can be run successfully), despite the huge number of errors in the logs.
When there is an angry potato logo I can log in but not run jobs.
ADDITIONAL INFORMATION
These SQL statement errors below are repeated very frequently: The relations "conf_setting" and "main_instance" do not exist.
This error about clustering is repeated very frequently: