Closed mjpalacios closed 4 years ago
@mjpalacios,
Can you check out the web container logs and see if there's a traceback with details?
I attempted opening the execution log of a job (id=10730) that was submitted while both instances were running and the entries below showed up in the container's log: (URLs and IP addresses have been sanitized)
192.168.1.1 - - [25/Jun/2020:02:40:42 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.8, 192.168.3.11"
2020-06-25 02:40:42,840 DEBUG awx.main.consumers Channel specific.NGGqtyoc!CruwJTPDabWH left groups {'schedules-changed'} and joined {'jobs-summary', 'job_events-10730'}
2020-06-25 02:40:42,840 DEBUG Channel specific.NGGqtyoc!CruwJTPDabWH left groups {'schedules-changed'} and joined {'jobs-summary', 'job_events-10730'}
2020-06-25 02:40:43,238 ERROR django.request Internal Server Error: /api/v2/jobs/10730/
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
response = get_response(request)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
return self.dispatch(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/generics.py", line 299, in dispatch
return super(APIView, self).dispatch(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 505, in dispatch
response = self.handle_exception(exc)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 465, in handle_exception
self.raise_uncaught_exception(exc)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
raise exc
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 502, in dispatch
response = handler(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/generics.py", line 208, in get
return self.retrieve(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/mixins.py", line 56, in retrieve
return Response(serializer.data)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/serializers.py", line 562, in data
ret = super().data
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/serializers.py", line 260, in data
self._data = self.to_representation(self.instance)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 3018, in to_representation
ret = super(JobSerializer, self).to_representation(obj)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 802, in to_representation
ret = super(UnifiedJobSerializer, self).to_representation(obj)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 2779, in to_representation
ret = super(JobOptionsSerializer, self).to_representation(obj)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/serializers.py", line 529, in to_representation
ret[field.field_name] = field.to_representation(attribute)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/fields.py", line 1905, in to_representation
return method(value)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 3074, in get_host_status_counts
counts = obj.job_events.only('event_data').get(event='playbook_on_stats').get_host_status_counts()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 412, in get
(self.model._meta.object_name, num)
awx.main.models.events.JobEvent.MultipleObjectsReturned: get() returned more than one JobEvent -- it returned 2!
2020-06-25 02:40:43,238 ERROR django.request Internal Server Error: /api/v2/jobs/10730/
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
response = get_response(request)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
return self.dispatch(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/generics.py", line 299, in dispatch
return super(APIView, self).dispatch(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 505, in dispatch
response = self.handle_exception(exc)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 465, in handle_exception
self.raise_uncaught_exception(exc)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
raise exc
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/views.py", line 502, in dispatch
response = handler(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/generics.py", line 208, in get
return self.retrieve(request, *args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/mixins.py", line 56, in retrieve
return Response(serializer.data)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/serializers.py", line 562, in data
ret = super().data
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/serializers.py", line 260, in data
self._data = self.to_representation(self.instance)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 3018, in to_representation
ret = super(JobSerializer, self).to_representation(obj)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 802, in to_representation
ret = super(UnifiedJobSerializer, self).to_representation(obj)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 2779, in to_representation
ret = super(JobOptionsSerializer, self).to_representation(obj)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/serializers.py", line 529, in to_representation
ret[field.field_name] = field.to_representation(attribute)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/rest_framework/fields.py", line 1905, in to_representation
return method(value)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/api/serializers.py", line 3074, in get_host_status_counts
counts = obj.job_events.only('event_data').get(event='playbook_on_stats').get_host_status_counts()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 412, in get
(self.model._meta.object_name, num)
awx.main.models.events.JobEvent.MultipleObjectsReturned: get() returned more than one JobEvent -- it returned 2!
[pid: 794|app: 0|req: 366/41029] 192.168.5.1 () {52 vars in 2833 bytes} [Thu Jun 25 02:40:43 2020] GET /api/v2/jobs/10730/ => generated 41 bytes in 226 msecs (HTTP/1.1 500) 6 headers in 208 bytes (1 switches on core 0)
192.168.5.1 - - [25/Jun/2020:02:40:43 +0000] "GET /api/v2/jobs/10730/ HTTP/1.1" 500 41 "https://awx-web-svc-app-ops.origin-dc1-core.domain.org/ " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "192.168.9.123"
2020-06-25 02:40:43,300 DEBUG awx.main.consumers Channel specific.NGGqtyoc!CruwJTPDabWH left groups {'jobs-summary', 'job_events-10730'} and joined {'schedules-changed'}
2020-06-25 02:40:43,300 DEBUG Channel specific.NGGqtyoc!CruwJTPDabWH left groups {'jobs-summary', 'job_events-10730'} and joined {'schedules-changed'}
[pid: 788|app: 0|req: 266/41030] 192.168.5.1 () {56 vars in 2930 bytes} [Thu Jun 25 02:40:43 2020] OPTIONS /api/v2/jobs/10730/ => generated 11873 bytes in 229 msecs (HTTP/1.1 200) 10 headers in 408 bytes (1 switches on core 0)
192.168.5.1 - - [25/Jun/2020:02:40:43 +0000] "OPTIONS /api/v2/jobs/10730/ HTTP/1.1" 200 11873 "https://awx-web-svc-app-ops.origin-dc1-core.domain.org/ " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "192.168.9.122"
[pid: 677|app: 0|req: 944/41031] 192.168.5.1 () {52 vars in 2936 bytes} [Thu Jun 25 02:40:43 2020] GET /api/v2/jobs/10730/job_events/?order_by=-start_line&page=1&page_size=50 => generated 104951 bytes in 265 msecs (HTTP/1.1 200) 11 headers in 424 bytes (1 switches on core 0)
192.168.5.1 - - [25/Jun/2020:02:40:43 +0000] "GET /api/v2/jobs/10730/job_events/?order_by=-start_line&page=1&page_size=50 HTTP/1.1" 200 104951 "https://awx-web-svc-app-ops.origin-dc1-core.domain.org/ " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "192.168.9.126"
[pid: 794|app: 0|req: 367/41032] 192.168.5.1 () {56 vars in 2934 bytes} [Thu Jun 25 02:40:43 2020] OPTIONS /api/v2/unified_jobs/ => generated 10718 bytes in 130 msecs (HTTP/1.1 200) 10 headers in 400 bytes (1 switches on core 0)
192.168.5.1 - - [25/Jun/2020:02:40:43 +0000] "OPTIONS /api/v2/unified_jobs/ HTTP/1.1" 200 10718 "https://awx-web-svc-app-ops.origin-dc1-core.domain.org/ " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "192.168.9.123"
[pid: 788|app: 0|req: 267/41033] 192.168.5.1 () {54 vars in 2989 bytes} [Thu Jun 25 02:40:43 2020] GET /api/v2/unified_jobs/?page_size=20&order_by=-finished¬__launch_type=sync&page=4 => generated 53350 bytes in 737 msecs (HTTP/1.1 200) 8 headers in 235 bytes (1 switches on core 0)
192.168.5.1 - - [25/Jun/2020:02:40:44 +0000] "GET /api/v2/unified_jobs/?page_size=20&order_by=-finished¬__launch_type=sync&page=4 HTTP/1.1" 200 53350 "https://awx-web-svc-app-ops.origin-dc1-core.domain.org/ " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "192.168.9.122"
192.168.5.1 - - [25/Jun/2020:02:40:44 +0000] "OPTIONS /api/v2/unified_jobs/ HTTP/1.1" 200 10718 "https://awx-web-svc-app-ops.origin-dc1-core.domain.org/ " "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "192.168.9.123"
[pid: 779|app: 0|req: 626/41034] 192.168.5.1 () {60 vars in 3034 bytes} [Thu Jun 25 02:40:44 2020] OPTIONS /api/v2/unified_jobs/ => generated 10718 bytes in 112 msecs (HTTP/1.1 200) 8 headers in 235 bytes (1 switches on core 0)
[pid: 794|app: 0|req: 368/41035] 192.168.5.1 () {42 vars in 607 bytes} [Thu Jun 25 02:40:44 2020] GET / => generated 11459 bytes in 53 msecs (HTTP/1.1 200) 5 headers in 169 bytes (1 switches on core 0)
192.168.5.1 - - [25/Jun/2020:02:40:44 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.9, 192.168.3.14"
[pid: 794|app: 0|req: 369/41036] 192.168.7.1 () {42 vars in 607 bytes} [Thu Jun 25 02:40:47 2020] GET / => generated 11459 bytes in 53 msecs (HTTP/1.1 200) 5 headers in 169 bytes (1 switches on core 0)
192.168.7.1 - - [25/Jun/2020:02:40:47 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.8, 192.168.3.10"
[pid: 794|app: 0|req: 370/41037] 192.168.7.1 () {42 vars in 607 bytes} [Thu Jun 25 02:40:49 2020] GET / => generated 11459 bytes in 54 msecs (HTTP/1.1 200) 5 headers in 169 bytes (1 switches on core 0)
192.168.7.1 - - [25/Jun/2020:02:40:49 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.9, 192.168.3.14"
[pid: 779|app: 0|req: 627/41038] 192.168.1.1 () {42 vars in 607 bytes} [Thu Jun 25 02:40:52 2020] GET / => generated 11459 bytes in 60 msecs (HTTP/1.1 200) 5 headers in 169 bytes (1 switches on core 0)
192.168.1.1 - - [25/Jun/2020:02:40:52 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.8, 192.168.3.13"
[pid: 794|app: 0|req: 371/41039] 192.168.7.1 () {42 vars in 607 bytes} [Thu Jun 25 02:40:54 2020] GET / => generated 11459 bytes in 52 msecs (HTTP/1.1 200) 5 headers in 169 bytes (1 switches on core 0)
192.168.7.1 - - [25/Jun/2020:02:40:54 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.9, 192.168.3.11"
[pid: 794|app: 0|req: 372/41040] 192.168.8.1 () {42 vars in 607 bytes} [Thu Jun 25 02:40:57 2020] GET / => generated 11459 bytes in 58 msecs (HTTP/1.1 200) 5 headers in 169 bytes (1 switches on core 0)
192.168.8.1 - - [25/Jun/2020:02:40:57 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.8, 192.168.3.10"
[pid: 779|app: 0|req: 628/41041] 192.168.7.1 () {42 vars in 607 bytes} [Thu Jun 25 02:40:59 2020] GET / => generated 11459 bytes in 55 msecs (HTTP/1.1 200) 5 headers in 169 bytes (1 switches on core 0)
192.168.7.1 - - [25/Jun/2020:02:40:59 +0000] "GET / HTTP/1.1" 200 11459 "-" "-" "192.168.2.9, 192.168.3.10"
RESULT 2
OKREADY
This looks like a duplicate of https://github.com/ansible/awx/issues/6982 to me.
So we've seen this before, but I'm not really sure how folks have managed to get data into this state: https://github.com/ansible/awx/issues/6982#issuecomment-629380382
@mjpalacios if you run this query, can you share the output?
select created, modified, uuid, parent_uuid playbook from main_jobevent where job_id=10730 and event='playbook_on_stats';
@ryanpetrello , please find below the query and its output
awx=> select created, modified, uuid, parent_uuid playbook from main_jobevent where job_id=10730 and event='playbook_on_stats';
created | modified | uuid | playbook
-------------------------------+-------------------------------+--------------------------------------+--------------------------------------
2020-06-24 14:15:48.682015-05 | 2020-06-24 14:15:49.574287-05 | 0830edea-81a5-465d-ae08-eab8c33495ed | 87f97b99-543b-450f-81f8-ee4aeaaf9ff5
2020-06-24 14:15:40.935085-05 | 2020-06-24 14:15:41.717927-05 | 4fb0079a-d088-49c3-afe6-ca8bf5a2eece | f76da9c5-3392-48c1-9b0e-5f3fb5f18715
@mjpalacios can you put a ,
between parent_uuid
and playbook
- sorry about that.
select created, modified, uuid, parent_uuid, playbook from main_jobevent where job_id=10730 and event='playbook_on_stats';
@ryanpetrello no problem, see below:
awx=> select created, modified, uuid, parent_uuid, playbook from main_jobevent where job_id=10730 and event='playbook_on_stats';
created | modified | uuid | parent_uuid | playbook
-------------------------------+-------------------------------+--------------------------------------+--------------------------------------+-----------------
2020-06-24 14:15:48.682015-05 | 2020-06-24 14:15:49.574287-05 | 0830edea-81a5-465d-ae08-eab8c33495ed | 87f97b99-543b-450f-81f8-ee4aeaaf9ff5 | deploy_main.yml
2020-06-24 14:15:40.935085-05 | 2020-06-24 14:15:41.717927-05 | 4fb0079a-d088-49c3-afe6-ca8bf5a2eece | f76da9c5-3392-48c1-9b0e-5f3fb5f18715 | deploy_main.yml
(2 rows)
@mjpalacios I'd really like to try to reproduce something like this - is there any way you could share your playbook (or some sanitized, simpler version of it)?
Also, does this happen for every run of this specific job template, or is this the only job you know of that encountered this problem?
Also, could you run and share (sanitized) this adhoc shell command run under the same (or similar) inventory where you ran for this job?
ansible-config dump
You know, I wonder if the way you have your RabbitMQ HA setup, if sometimes playbooks are running twice.
In other words, you launch a job, and the same ansible-playbook
ends up running on two distinct nodes...
Do you have any evidence of other duplicate events for this one job?
@mjpalacios does this job have any events that represent fact gathering? If so, to confirm this theory, we could see if there are any duplicate fact gathers that have distinct IPs - that would suggest that your one job ran in duplicate on multiple hosts in your cluster.
@mjpalacios I'd really like to try to reproduce something like this - is there any way you could share your playbook (or some sanitized, simpler version of it)?
Not this playbook in particular, but I can share another one which creates a .kubeconfig file and exhibits the same behavior (job_id=10719)
Also, does this happen for every run of this specific job template, or is this the only job you know of that encountered this problem?
This happened for every run of every job template that was launched while both AWX instances were running. The anomalous behavior stopped as soon as I stopped one of the instances (the one that was disabled at the GTM)
Also, could you run and share (sanitized) this adhoc shell command run under the same (or similar) inventory where you ran for this job? ansible-config dump
Here the output of the command in the context of the 'awx-web' container
ACTION_WARNINGS(default) = True AGNOSTIC_BECOME_PROMPT(default) = True ALLOW_WORLD_READABLE_TMPFILES(default) = False ANSIBLE_CONNECTION_PATH(default) = None ANSIBLE_COW_PATH(default) = None ANSIBLE_COW_SELECTION(default) = default ANSIBLE_COW_WHITELIST(default) = ['bud-frogs', 'bunny', 'cheese', 'daemon', 'default', 'dragon', 'elephant-in-snake', 'elephant', 'eyes', 'hellokitty', 'kitty', 'luke-koala', 'meow', 'milk', 'moofasa', 'moose', 'ren', 'sheep', 'small', 'stegosaurus', 'stimpy', 'supermilker', 'three-eyes', 'turkey', 'turtle', 'tux', 'udder', 'vader-koala', 'vader', 'www'] ANSIBLE_FORCE_COLOR(default) = False ANSIBLE_NOCOLOR(default) = False ANSIBLE_NOCOWS(default) = False ANSIBLE_PIPELINING(default) = False ANSIBLE_SSH_ARGS(default) = -C -o ControlMaster=auto -o ControlPersist=60s ANSIBLE_SSH_CONTROL_PATH(default) = None ANSIBLE_SSH_CONTROL_PATH_DIR(default) = ~/.ansible/cp ANSIBLE_SSH_EXECUTABLE(default) = ssh ANSIBLE_SSH_RETRIES(default) = 0 ANY_ERRORS_FATAL(default) = False BECOME_ALLOW_SAME_USER(default) = False BECOME_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/become', '/usr/share/ansible/plugins/become'] CACHE_PLUGIN(default) = memory CACHE_PLUGIN_CONNECTION(default) = None CACHE_PLUGIN_PREFIX(default) = ansible_facts CACHE_PLUGIN_TIMEOUT(default) = 86400 COLLECTIONS_PATHS(default) = ['/home/awx/.ansible/collections', '/usr/share/ansible/collections'] COLOR_CHANGED(default) = yellow COLOR_CONSOLE_PROMPT(default) = white COLOR_DEBUG(default) = dark gray COLOR_DEPRECATE(default) = purple COLOR_DIFF_ADD(default) = green COLOR_DIFF_LINES(default) = cyan COLOR_DIFF_REMOVE(default) = red COLOR_ERROR(default) = red COLOR_HIGHLIGHT(default) = white COLOR_OK(default) = green COLOR_SKIP(default) = cyan COLOR_UNREACHABLE(default) = bright red COLOR_VERBOSE(default) = blue COLOR_WARN(default) = bright purple COMMAND_WARNINGS(default) = True CONDITIONAL_BARE_VARS(default) = True CONNECTION_FACTS_MODULES(default) = {'eos': 'eos_facts', 'frr': 'frr_facts', 'ios': 'ios_facts', 'iosxr': 'iosxr_facts', 'junos': 'junos_facts', 'nxos': 'nxos_facts', 'vyos': 'vyos_facts'} COVERAGE_REMOTE_OUTPUT(default) = None COVERAGE_REMOTE_WHITELIST(default) = * DEFAULT_ACTION_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/action', '/usr/share/ansible/plugins/action'] DEFAULT_ALLOW_UNSAFE_LOOKUPS(default) = False DEFAULT_ASK_PASS(default) = False DEFAULT_ASK_VAULT_PASS(default) = False DEFAULT_BECOME(default) = False DEFAULT_BECOME_ASK_PASS(default) = False DEFAULT_BECOME_EXE(default) = None DEFAULT_BECOME_FLAGS(default) = DEFAULT_BECOME_METHOD(default) = sudo DEFAULT_BECOME_USER(default) = root DEFAULT_CACHE_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/cache', '/usr/share/ansible/plugins/cache'] DEFAULT_CALLABLE_WHITELIST(default) = [] DEFAULT_CALLBACK_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/callback', '/usr/share/ansible/plugins/callback'] DEFAULT_CALLBACK_WHITELIST(default) = [] DEFAULT_CLICONF_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/cliconf', '/usr/share/ansible/plugins/cliconf'] DEFAULT_CONNECTION_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/connection', '/usr/share/ansible/plugins/connection'] DEFAULT_DEBUG(default) = False DEFAULT_EXECUTABLE(default) = /bin/sh DEFAULT_FACT_PATH(default) = None DEFAULT_FILTER_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/filter', '/usr/share/ansible/plugins/filter'] DEFAULT_FORCE_HANDLERS(default) = False DEFAULT_FORKS(default) = 5 DEFAULT_GATHERING(default) = implicit DEFAULT_GATHER_SUBSET(default) = ['all'] DEFAULT_GATHER_TIMEOUT(default) = 10 DEFAULT_HANDLER_INCLUDES_STATIC(default) = False DEFAULT_HASH_BEHAVIOUR(default) = replace DEFAULT_HOST_LIST(default) = ['/etc/ansible/hosts'] DEFAULT_HTTPAPI_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/httpapi', '/usr/share/ansible/plugins/httpapi'] DEFAULT_INTERNAL_POLL_INTERVAL(default) = 0.001 DEFAULT_INVENTORY_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/inventory', '/usr/share/ansible/plugins/inventory'] DEFAULT_JINJA2_EXTENSIONS(default) = [] DEFAULT_JINJA2_NATIVE(default) = False DEFAULT_KEEP_REMOTE_FILES(default) = False DEFAULT_LIBVIRT_LXC_NOSECLABEL(default) = False DEFAULT_LOAD_CALLBACK_PLUGINS(default) = False DEFAULT_LOCAL_TMP(default) = /home/awx/.ansible/tmp/ansible-local-1227ghoww7kz DEFAULT_LOG_FILTER(default) = [] DEFAULT_LOG_PATH(default) = None DEFAULT_LOOKUP_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/lookup', '/usr/share/ansible/plugins/lookup'] DEFAULT_MANAGED_STR(default) = Ansible managed DEFAULT_MODULE_ARGS(default) = DEFAULT_MODULE_COMPRESSION(default) = ZIP_DEFLATED DEFAULT_MODULE_NAME(default) = command DEFAULT_MODULE_PATH(default) = ['/home/awx/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules'] DEFAULT_MODULE_UTILS_PATH(default) = ['/home/awx/.ansible/plugins/module_utils', '/usr/share/ansible/plugins/module_utils'] DEFAULT_NETCONF_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/netconf', '/usr/share/ansible/plugins/netconf'] DEFAULT_NO_LOG(default) = False DEFAULT_NO_TARGET_SYSLOG(default) = False DEFAULT_NULL_REPRESENTATION(default) = None DEFAULT_POLL_INTERVAL(default) = 15 DEFAULT_PRIVATE_KEY_FILE(default) = None DEFAULT_PRIVATE_ROLE_VARS(default) = False DEFAULT_REMOTE_PORT(default) = None DEFAULT_REMOTE_USER(default) = None DEFAULT_ROLES_PATH(default) = ['/home/awx/.ansible/roles', '/usr/share/ansible/roles', '/etc/ansible/roles'] DEFAULT_SCP_IF_SSH(default) = smart DEFAULT_SELINUX_SPECIAL_FS(default) = ['fuse', 'nfs', 'vboxsf', 'ramfs', '9p', 'vfat'] DEFAULT_SFTP_BATCH_MODE(default) = True DEFAULT_SQUASH_ACTIONS(default) = ['apk', 'apt', 'dnf', 'homebrew', 'openbsd_pkg', 'pacman', 'pip', 'pkgng', 'yum', 'zypper'] DEFAULT_SSH_TRANSFER_METHOD(default) = None DEFAULT_STDOUT_CALLBACK(default) = default DEFAULT_STRATEGY(default) = linear DEFAULT_STRATEGY_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/strategy', '/usr/share/ansible/plugins/strategy'] DEFAULT_SU(default) = False DEFAULT_SYSLOG_FACILITY(default) = LOG_USER DEFAULT_TASK_INCLUDES_STATIC(default) = False DEFAULT_TERMINAL_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/terminal', '/usr/share/ansible/plugins/terminal'] DEFAULT_TEST_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/test', '/usr/share/ansible/plugins/test'] DEFAULT_TIMEOUT(default) = 10 DEFAULT_TRANSPORT(default) = smart DEFAULT_UNDEFINED_VAR_BEHAVIOR(default) = True DEFAULT_VARS_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/vars', '/usr/share/ansible/plugins/vars'] DEFAULT_VAULT_ENCRYPT_IDENTITY(default) = None DEFAULT_VAULT_IDENTITY(default) = default DEFAULT_VAULT_IDENTITY_LIST(default) = [] DEFAULT_VAULT_ID_MATCH(default) = False DEFAULT_VAULT_PASSWORD_FILE(default) = None DEFAULT_VERBOSITY(default) = 0 DEPRECATION_WARNINGS(default) = True DIFF_ALWAYS(default) = False DIFF_CONTEXT(default) = 3 DISPLAY_ARGS_TO_STDOUT(default) = False DISPLAY_SKIPPED_HOSTS(default) = True DOCSITE_ROOT_URL(default) = https://docs.ansible.com/ansible/ DOC_FRAGMENT_PLUGIN_PATH(default) = ['/home/awx/.ansible/plugins/doc_fragments', '/usr/share/ansible/plugins/doc_fragments'] DUPLICATE_YAML_DICT_KEY(default) = warn ENABLE_TASK_DEBUGGER(default) = False ERROR_ON_MISSING_HANDLER(default) = True FACTS_MODULES(default) = ['smart'] GALAXY_IGNORE_CERTS(default) = False GALAXY_ROLE_SKELETON(default) = None GALAXY_ROLE_SKELETON_IGNORE(default) = ['^.git$', '^.*/.git_keep$'] GALAXY_SERVER(default) = https://galaxy.ansible.com GALAXY_SERVER_LIST(default) = None GALAXY_TOKEN(default) = None GALAXY_TOKEN_PATH(default) = /home/awx/.ansible/galaxy_token HOST_KEY_CHECKING(default) = True HOST_PATTERN_MISMATCH(default) = warning INJECT_FACTS_AS_VARS(default) = True INTERPRETER_PYTHON(default) = auto_legacy INTERPRETER_PYTHON_DISTRO_MAP(default) = {'centos': {'6': '/usr/bin/python', '8': '/usr/libexec/platform-python'}, 'fedora': {'23': '/usr/bin/python3'}, 'redhat': {'6': '/usr/bin/python', '8': '/usr/libexec/platform-python'}, 'rhel': {'6': '/usr/bin/python', '8': '/usr/libexec/platform-python'}, 'ubuntu': {'14': '/usr/bin/python', '16': '/usr/bin/python3'}} INTERPRETER_PYTHON_FALLBACK(default) = ['/usr/bin/python', 'python3.7', 'python3.6', 'python3.5', 'python2.7', 'python2.6', '/usr/libexec/platform-python', '/usr/bin/python3', 'python'] INVALID_TASK_ATTRIBUTE_FAILED(default) = True INVENTORY_ANY_UNPARSED_IS_FAILED(default) = False INVENTORY_CACHE_ENABLED(default) = False INVENTORY_CACHE_PLUGIN(default) = None INVENTORY_CACHE_PLUGIN_CONNECTION(default) = None INVENTORY_CACHE_PLUGIN_PREFIX(default) = ansible_facts INVENTORY_CACHE_TIMEOUT(default) = 3600 INVENTORY_ENABLED(default) = ['host_list', 'script', 'auto', 'yaml', 'ini', 'toml'] INVENTORY_EXPORT(default) = False INVENTORY_IGNORE_EXTS(default) = {{(BLACKLIST_EXTS + ( '.orig', '.ini', '.cfg', '.retry'))}} INVENTORY_IGNORE_PATTERNS(default) = [] INVENTORY_UNPARSED_IS_FAILED(default) = False LOCALHOST_WARNING(default) = True MAX_FILE_SIZE_FOR_DIFF(default) = 104448 NETCONF_SSH_CONFIG(default) = None NETWORK_GROUP_MODULES(default) = ['eos', 'nxos', 'ios', 'iosxr', 'junos', 'enos', 'ce', 'vyos', 'sros', 'dellos9', 'dellos10', 'dellos6', 'asa', 'aruba', 'aireos', 'bigip', 'ironware', 'onyx', 'netconf'] OLD_PLUGIN_CACHE_CLEARING(default) = False PARAMIKO_HOST_KEY_AUTO_ADD(default) = False PARAMIKO_LOOK_FOR_KEYS(default) = True PERSISTENT_COMMAND_TIMEOUT(default) = 30 PERSISTENT_CONNECT_RETRY_TIMEOUT(default) = 15 PERSISTENT_CONNECT_TIMEOUT(default) = 30 PERSISTENT_CONTROL_PATH_DIR(default) = /home/awx/.ansible/pc PLAYBOOK_DIR(default) = None PLAYBOOK_VARS_ROOT(default) = top PLUGIN_FILTERS_CFG(default) = None PYTHON_MODULE_RLIMIT_NOFILE(default) = 0 RETRY_FILES_ENABLED(default) = False RETRY_FILES_SAVE_PATH(default) = None SHOW_CUSTOM_STATS(default) = False STRING_CONVERSION_ACTION(default) = warn STRING_TYPE_FILTERS(default) = ['string', 'to_json', 'to_nice_json', 'to_yaml', 'ppretty', 'json'] SYSTEM_WARNINGS(default) = True TAGS_RUN(default) = [] TAGS_SKIP(default) = [] TASK_DEBUGGER_IGNORE_ERRORS(default) = True TRANSFORM_INVALID_GROUP_CHARS(default) = never USE_PERSISTENT_CONNECTIONS(default) = False VARIABLE_PRECEDENCE(default) = ['all_inventory', 'groups_inventory', 'all_plugins_inventory', 'all_plugins_play', 'groups_plugins_inventory', 'groups_plugins_play'] VERBOSE_TO_STDERR(default) = False YAML_FILENAME_EXTENSIONS(default) = ['.yml', '.yaml', '.json']
You know, I wonder if the way you have your RabbitMQ HA setup, if sometimes playbooks are running twice.
I didn’t mess with RabbitMQ at all, and the AWX version I have (9.3.0) seems to be using Redis instead of RabbitMQ, as I see an 'awx-redis' container but not an 'awx-rabbitmq' one.
In other words, you launch a job, and the same ansible-playbook ends up running on two distinct nodes... Do you have any evidence of other duplicate events for this one job?
Same answer as above, i.e. all the job templates that were submitted while both AWX instances were running were impacted. Now, specific to your question, yes, here’s evidence of duplicates for other jobs resulting from launching the same template:
Job #10709
awx=> select created, modified, uuid, parent_uuid, playbook from main_jobevent where job_id=10709 and event='playbook_on_stats'; created | modified | uuid | parent_uuid | playbook -------------------------------+-------------------------------+--------------------------------------+--------------------------------------+----------------- 2020-06-24 13:15:45.685405-05 | 2020-06-24 13:15:45.990309-05 | 0f1699fc-21a5-4a60-bef2-071a22378210 | 5715f4d1-0c31-4c88-b111-f2ac61f242b5 | deploy_main.yml 2020-06-24 13:15:44.150874-05 | 2020-06-24 13:15:44.883841-05 | e0fb8810-8397-4c68-9e3c-744c3f7cd361 | b68b122a-48b1-4927-b557-6fc908c690cc | deploy_main.yml (2 rows)
Job #10733
created | modified | uuid | parent_uuid | playbook -------------------------------+-------------------------------+--------------------------------------+--------------------------------------+----------------- 2020-06-24 14:16:33.882129-05 | 2020-06-24 14:16:34.346917-05 | a9d1c0e3-31f4-4ed0-ba58-f96851760bbb | 8af7af44-2d43-49d4-8788-37e4402169e0 | deploy_main.yml 2020-06-24 14:16:32.485674-05 | 2020-06-24 14:16:33.023548-05 | d8d4c7b2-4f84-427f-a0d8-1635277f3fd8 | 14a68061-294e-41f4-9a79-306fa34e190d | deploy_main.yml (2 rows)
@mjpalacios does this job have any events that represent fact gathering? If so, to confirm this theory, we could see if there are any duplicate fact gathers that have distinct IPs - that would suggest that your one job ran in duplicate on multiple hosts in your cluster.
How do I identify those?
@ryanpetrello Please find below the files comprising the kubeconfig creation playbook (sanitized): This is the playbook corresponding to job 10719 exhibiting the same anomalous behavior as 10709, 10730, 10733
pb-create-kubeconfig.yml
---
- hosts: localhost
tasks:
- name: Bail out when required playbook variables are missing
assert:
that:
- (pdc_token is defined and pdc_token)
- (bdc_token is defined and bdc_token)
- (svc_acct is defined and svc_acct)
- name: Include role for creating kubeconfig
include_role:
name: create-kubeconfig
roles/create-kubeconfig/tasks/main.yml
---
- name: Instantiate kubeconfig template
template:
src: kubeconfig.yml.j2
dest: "{{target_directory}}/{{kubeconfig_file}}"
roles/create-kubeconfig/templates/kubeconfig.yml.j2
---
apiVersion: v1
clusters:
- cluster:
server: https://{{pdc_server}}
name: {{pdc_cluster}}
- cluster:
server: https://{{bdc_server}}
name: {{bdc_cluster}}
contexts:
- context:
cluster: {{pdc_cluster}}
user: {{svc_acct}}/{{pdc_server}}
name: {{pdc_cluster}}
- context:
cluster: {{bdc_cluster}}
user: {{svc_acct}}/{{bdc_server}}
name: {{bdc_cluster}}
kind: Config
preferences: {}
users:
- name: {{svc_acct}}/{{pdc_server}}
user:
token: {{pdc_token}}
- name: {{svc_acct}}/{{bdc_server}}
user:
token: {{bdc_token}}
roles/create-kubeconfig/vars/main.yml
---
target_directory: /tmp
kubeconfig_file: kubeconfig.yml
pdc_server: pdc-app.domain.org:6443
pdc_cluster: app-pdc
bdc_server: bdc-app.domain.org:6443
bdc_cluster: app-bdc
awx=> select created, modified, uuid, parent_uuid, playbook from main_jobevent where job_id=10719 and event='playbook_on_stats';
created | modified | uuid | parent_uuid | playbook
-------------------------------+-------------------------------+--------------------------------------+--------------------------------------+--------------------------
2020-06-24 13:48:13.54507-05 | 2020-06-24 13:48:13.765564-05 | 047d2cb8-cdf8-4e37-862f-99b4b0e8c095 | aed09452-c40a-44e6-aebd-b49c5e528b82 | pb-create-kubeconfig.yml
2020-06-24 13:48:13.534681-05 | 2020-06-24 13:48:13.734054-05 | c16bab3d-c7e5-497b-a822-c9758097ece0 | b9087855-1668-4022-bf5b-ced5c4ac3cc1 | pb-create-kubeconfig.yml
(2 rows)
@mjpalacios,
I didn’t mess with RabbitMQ at all, and the AWX version I have (9.3.0) seems to be using Redis instead of RabbitMQ, as I see an 'awx-redis' container but not an 'awx-rabbitmq' one.
Something has gone terribly, terribly wrong. RabbitMQ wasn't removed until AWX 10.0.0: https://github.com/ansible/awx/blob/devel/CHANGELOG.md#1000-mar-30-2020
Not sure how to proceed from here. Any suggestions?
Are you certain you're actually deploying 9.3.0? It doesn't sound like it to me if you're not seeing RabbitMQ anywhere. This makes me wonder if you've got some sort of disparity between your image and the AWX source code you're deploying with.
If awx-manage --version
reports 9.3.0, but you're not seeing RabbitMQ containers, then you've got something messed up in your deployment, and it's possible it's causing whatever weirdness you're seeing here.
@ryanpetrello, found that the code version is 10.0.0 but version label was not updated and still had 9.3.0. I'm going to download the latest version and work the way up from dev to prod. This is going to take some time.
Could be related to https://github.com/ansible/awx/issues/7253?
I have built docker images and deployed AWX 13.0.0 into a non-production OpenShift environment. Will deploy into a production environment (single instance) later this week. Will update this issue after that.
@mjpalacios I'm still not totally sure what's up here, but I've seen a few similar reports and would like to track all of this one one issue to avoid duplicates:
ISSUE TYPE
SUMMARY
I have installed two instances of AWX into separate data centers for HA purposes and put an F5 GTM in front. The GTM is configured active/active. Both instances are using the same database. Issue is, when both instances of AWX are running simultaneously, job submissions succeed, but job execution log retrievals fail. AWX displays an error message "Invalid search term entered. GET returned: 500 A server error has occurred." This happens even when one instance is disabled in the GTM. The only way to prevent it from happening is to leave only one instance running and fully shutdown the other.
ENVIRONMENT
STEPS TO REPRODUCE
EXPECTED RESULTS
Job execution details
ACTUAL RESULTS
Window pops up with message:
ADDITIONAL INFORMATION