Open 2and3makes23 opened 1 year ago
Hello, When jobs are relaunched, they still have to go through the task manager processing. They can get assigned to different nodes within the same instance group, and the expectation is that instances will need to be enabled and available so that the job can run.
May we ask why you are disabling the instances for a given job template? We would like to gain a better understanding of this particular use case.
Thank you for your time!
Hi, thanks for your quick response!
We disable all instances (not just for a particular job template, but in general) when updating to a more recent AWX version, so that our customers' jobs are not interrupted in the process.
Jobs that are triggered during our update process are enqueued ("Pending" state) and executed after the instances are re-enabled (in our case: after a complete AWX redeployment).
Only relaunched jobs run into the error described above while instances are disabled.
For us, updating means updating the AWX operator and redeploying AWX with the newer operator, which we trigger explicitly because of staging.
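As a rough illustration of the disable-then-re-enable step described above, here is a minimal Python sketch that only builds the HTTP request for toggling one instance. It assumes the AWX REST API exposes instances under `/api/v2/instances/<id>/` with a writable `enabled` field and accepts bearer-token authentication. The base URL, instance ID, and token are hypothetical placeholders, and the thread does not say how the instances are actually disabled in this setup.

```python
import json
from urllib.request import Request


def build_enable_request(base_url: str, instance_id: int, enabled: bool, token: str) -> Request:
    """Build (but do not send) a PATCH request that sets an instance's `enabled` flag."""
    # Assumed endpoint layout: <base_url>/api/v2/instances/<id>/
    url = f"{base_url.rstrip('/')}/api/v2/instances/{instance_id}/"
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # hypothetical token
            "Content-Type": "application/json",
        },
    )


# Example: prepare the request that would disable instance 1 before an update.
req = build_enable_request("https://awx.example.com", 1, enabled=False, token="<token>")
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`; the same helper with `enabled=True` would cover re-enabling after the redeployment.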
Of course, we would much rather update in a more Kubernetes-native way and use a rolling update strategy (replacing old pods one by one) instead of disabling and redeploying, but as far as we know that is not yet possible: awx-operator/issues/1275 and awx-operator/issues/1362
But maybe you have some helpful input on that for us, too? :) Thank you for your time :)
@2and3makes23 Thank you so much for providing this additional information! This is extremely helpful. Could you please also provide the traceback logs that are generated when this occurs? They will be very helpful to us.
Thank you again for taking the time to provide all of this information!
I had a look and I was not able to reproduce this issue. Jobs can be relaunched even when all instances are disabled and the relaunch job goes into "pending" as expected.
Sorry for the delay
@AlanCoding thanks for checking on your side
@djyasin please find below the log output produced by one event of a user clicking job relaunch while all (two) instances are disabled:
[pid: 38|app: 0|req: 68/279] <some_ip> () {64 vars in 1140 bytes} [Fri Sep 8 13:37:30 2023] GET /api/v2/jobs/43006/relaunch/ => generated 68 bytes in 266 msecs (HTTP/1.1 200) 14 headers in 583 bytes (1 switches on core 0)
<some_ip> - - [08/Sep/2023:13:37:30 +0000] "GET /api/v2/jobs/43006/relaunch/ HTTP/1.1" 200 68 "https://awx.domain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0" "<some_ip>, <some_ip>"
2023-09-08 13:37:31,226 ERROR [7c40e4bfc1ea472aa957f6662601b473] django.request Internal Server Error: /api/v2/jobs/43006/relaunch/
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/views/__init__.py", line 3377, in dispatch
    return super(JobRelaunch, self).dispatch(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/generics.py", line 332, in dispatch
    return super(APIView, self).dispatch(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/views/__init__.py", line 3424, in post
    new_job = obj.copy_unified_job(**copy_kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/jobs.py", line 655, in copy_unified_job
    return super(Job, self).copy_unified_job(**new_prompts)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/unified_jobs.py", line 940, in copy_unified_job
    unified_job = self.unified_job_template.create_unified_job(**prompts)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/jobs.py", line 393, in create_unified_job
    job = super(JobTemplate, self).create_unified_job(**kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/unified_jobs.py", line 400, in create_unified_job
    unified_job.save()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/unified_jobs.py", line 906, in save
    result = super(UnifiedJob, self).save(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/polymorphic/models.py", line 87, in save
    return super().save(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/base.py", line 207, in save
    super(PasswordFieldsModel, self).save(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/base.py", line 325, in save
    super(PrimordialModel, self).save(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/models/base.py", line 173, in save
    super(CreatedModifiedModel, self).save(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/models/base.py", line 892, in save_base
    post_save.send(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/dispatch/dispatcher.py", line 176, in send
    return [
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
    (receiver, receiver(signal=self, sender=sender, **named))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/signals.py", line 109, in emit_update_inventory_on_created_or_deleted
    connection.on_commit(lambda: update_inventory_computed_fields.delay(inventory.id))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/base/base.py", line 760, in on_commit
    func()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/signals.py", line 109, in <lambda>
    connection.on_commit(lambda: update_inventory_computed_fields.delay(inventory.id))
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/dispatch/publish.py", line 73, in delay
    return cls.apply_async(args, kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/dispatch/publish.py", line 93, in apply_async
    queue = queue()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/dispatch/__init__.py", line 37, in get_task_queuename
    raise ValueError('No task instances are READY and Enabled.')
ValueError: No task instances are READY and Enabled.
[pid: 40|app: 0|req: 144/280] <some_ip> () {70 vars in 1261 bytes} [Fri Sep 8 13:37:30 2023] POST /api/v2/jobs/43006/relaunch/ => generated 41 bytes in 301 msecs (HTTP/1.1 500) 8 headers in 309 bytes (1 switches on core 0)
<some_ip> - - [08/Sep/2023:13:37:31 +0000] "POST /api/v2/jobs/43006/relaunch/ HTTP/1.1" 500 41 "https://awx.domain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0" "<some_ip>, <some_ip>"
<some_ip> - - [08/Sep/2023:13:37:33 +0000] "GET / HTTP/1.1" 200 8 "-" "python-requests/2.31.0"
<some_ip> - - [08/Sep/2023:13:37:33 +0000] "GET / HTTP/1.1" 200 8 "-" "python-requests/2.31.0"
<some_ip> - - [08/Sep/2023 13:37:33] "GET /probe?seconds=1&livereadistart=readi HTTP/1.1" 200 -
Thanks, that points to some relatively recent code so this is good information.
I didn't give enough information in my last comment - the ValueError is hit because we have enabled=True as a part of the instance filter, so the queryset returns no instances, and raises that error. The obvious and simple fix is to either remove that from the filter, or add a last-ditch query to get disabled instances when no enabled instances are present.
I did not hit this bug in my replication attempt because I was using a hybrid node, which submits tasks locally. Only web pods use this code.
This is obviously valid and should get worked on.
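The "last-ditch query" idea above can be sketched in plain Python. This is only an illustration of the fallback logic under stated assumptions, not AWX's actual code: the `Instance` dataclass, its `ready` flag, and `pick_task_queuename` are hypothetical stand-ins for the real instance model and queue-name lookup.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Instance:
    # Hypothetical stand-in for an instance record; not the real AWX model.
    hostname: str
    enabled: bool
    ready: bool = True


def pick_task_queuename(instances: Iterable[Instance]) -> str:
    """Prefer a ready, enabled instance; fall back to a ready, disabled one."""
    # Sort by hostname so the choice is deterministic in this sketch.
    candidates = sorted((i for i in instances if i.ready), key=lambda i: i.hostname)
    enabled = [i for i in candidates if i.enabled]
    if enabled:
        return enabled[0].hostname
    if candidates:
        # Last-ditch fallback: queue the task on a disabled instance
        # instead of raising while every instance is disabled.
        return candidates[0].hostname
    raise ValueError("No task instances are READY.")


# With every instance disabled, a queue name is still returned.
nodes = [Instance("awx-task-b", enabled=False), Instance("awx-task-a", enabled=False)]
queue = pick_task_queuename(nodes)
```

With this shape, the error is raised only when no ready instance exists at all, so a relaunch submitted during an update would be queued rather than failing the request.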
Thanks for looking into this, we really appreciate it ❤️
Bug Summary
When all AWX instances are disabled and a previous job is relaunched, the relaunch fails with an internal server error and the new job stays stuck in the "New" state.
AWX version
22.5.0
Select the relevant components
Installation method
openshift
Modifications
no
Ansible version
2.12.10
Operating system
CentOS, RHEL
Web browser
Firefox
Steps to reproduce
Expected results
The relaunched job appears under jobs as "Pending" and starts as soon as an AWX instance is re-enabled and picks it up.
Actual results
The relaunch request returns an internal server error.
The job appears under jobs with status "New" and is stuck there indefinitely.
Re-enabling instances does not change the state of the relaunched job.
Additional information
After re-enabling AWX instances, everything works fine again, including job relaunch. Only the "New" job stays stuck.