Open lemmy04 opened 3 years ago
I see this happening as well, except my playbooks successfully start and complete and the k8s pod terminates cleanly, but the UI still shows the job in the Running state.
Hi - can you get the logs from the awx-task container related to the launch of the job? We'd need more information to determine what's going on here.
... I hadn't touched kubernetes at all until AWX 18 came out... what exactly do you need?
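For reference, a minimal way to pull those awx-task logs; a sketch, assuming the operator's default deployment name awx in the awx namespace and that kubectl points at the right cluster:

# follow the awx-task container while launching the job
kubectl -n awx logs -f deployment/awx -c awx-task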
2021-05-15 06:03:29,735 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.dispatch task ac491d10-dfb9-4384-a0f7-ff0bed326388 starting awx.main.tasks.cluster_node_heartbeat(*[])
2021-05-15 06:03:29,738 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.tasks Cluster node heartbeat task.
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
2021-05-15 06:03:29,944 WARNING [7d8140cba46b442caf544b74ee77a3e5] awx.main.dispatch scaling up worker pid:4637
2021-05-15 06:03:29,930 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.dispatch task 069050b2-4f4a-4a40-ae4d-22e3d12e8692 starting awx.main.tasks.awx_periodic_scheduler(*[])
2021-05-15 06:03:29,929 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.dispatch task 61ef1476-a7b5-4d41-abba-e4406d2f1280 starting awx.main.scheduler.tasks.run_task_manager(*[])
2021-05-15 06:03:29,959 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.scheduler Running task manager.
2021-05-15 06:03:29,932 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.dispatch task b2f8cbd1-f0cb-4bfc-832f-774099510166 starting awx.main.tasks.awx_k8s_reaper(*[])
2021-05-15 06:03:30,028 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.dispatch task 5dc51ed6-7f26-4ea4-84fb-1967617ad64a starting awx.main.analytics.analytics_tasks.send_subsystem_metrics(*[])
2021-05-15 06:03:30,053 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.tasks Starting periodic scheduler
2021-05-15 06:03:30,064 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.tasks Last scheduler run was: 2021-05-15 06:02:59.623461+00:00
2021-05-15 06:03:30,134 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.scheduler Starting Scheduler
2021-05-15 06:03:30,225 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.tasks Checking for orphaned k8s pods for tower-1.
2021-05-15 06:03:30,327 DEBUG [7d8140cba46b442caf544b74ee77a3e5] awx.main.scheduler Finishing Scheduler
2021-05-15 06:03:32,315 WARNING [a7372df73eb140f4b35634cc1de6f3d2] awx.main.dispatch scaling down worker pid:4628
2021-05-15 06:03:32,316 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.main.dispatch task 40e94a9d-513a-45dc-8cb1-f822ceed040d starting awx.main.scheduler.tasks.run_task_manager(*[])
2021-05-15 06:03:32,318 WARNING [7d8140cba46b442caf544b74ee77a3e5] awx.main.dispatch worker exiting gracefully pid:4628
2021-05-15 06:03:32,329 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.main.scheduler Running task manager.
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
2021-05-15 06:03:32,364 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.main.scheduler Starting Scheduler
2021-05-15 06:03:32,640 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.analytics.job_lifecycle projectupdate-10030 acknowledged
2021-05-15 06:03:32,690 WARNING [a7372df73eb140f4b35634cc1de6f3d2] awx.main.scheduler No available capacity to run containerized <project_update 10030 (waiting)>.
2021-05-15 06:03:32,830 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.analytics.job_lifecycle projectupdate-10030 waiting
2021-05-15 06:03:32,831 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.main.scheduler project_update 10030 (waiting) consumed 0 capacity units from tower with prior total of 0
2021-05-15 06:03:32,832 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.main.scheduler Finishing Scheduler
2021-05-15 06:03:32,894 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.main.dispatch task 0c40df72-e14f-47fc-9f4c-79c6aea5503a starting awx.main.tasks.RunProjectUpdate(*[10030])
2021-05-15 06:03:33,561 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.analytics.job_lifecycle projectupdate-10030 pre run
2021-05-15 06:03:33,581 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.analytics.job_lifecycle projectupdate-10030 preparing playbook
2021-05-15 06:03:33,725 DEBUG [a7372df73eb140f4b35634cc1de6f3d2] awx.analytics.job_lifecycle projectupdate-10030 running playbook
2021-05-15 06:03:42,842 DEBUG [-] awx.main.commands.run_callback_receiver 27 is alive
2021-05-15 06:03:42,842 DEBUG [-] awx.main.commands.run_callback_receiver 27 is alive
2021-05-15 06:03:49,893 DEBUG [48f472980b8f4f2fa2dae02f4cd9e3dd] awx.main.dispatch task c315fc3a-bc0b-45bf-ab15-a733c305fc59 starting awx.main.scheduler.tasks.run_task_manager(*[])
2021-05-15 06:03:49,895 DEBUG [48f472980b8f4f2fa2dae02f4cd9e3dd] awx.main.scheduler Running task manager.
2021-05-15 06:03:49,985 DEBUG [48f472980b8f4f2fa2dae02f4cd9e3dd] awx.main.dispatch task 93290266-03db-4e82-8149-cfeabb045a4d starting awx.main.analytics.analytics_tasks.send_subsystem_metrics(*[])
2021-05-15 06:03:50,020 DEBUG [48f472980b8f4f2fa2dae02f4cd9e3dd] awx.main.scheduler Starting Scheduler
2021-05-15 06:03:50,247 DEBUG [48f472980b8f4f2fa2dae02f4cd9e3dd] awx.main.scheduler Finishing Scheduler
2021-05-15 06:04:00,015 DEBUG [4d60fdea098b4adc8ddd989ee154d632] awx.main.dispatch task 32e4017f-f5cf-4ac8-8e7a-af2f14c32beb starting awx.main.tasks.awx_periodic_scheduler(*[])
2021-05-15 06:04:00,046 DEBUG [4d60fdea098b4adc8ddd989ee154d632] awx.main.tasks Starting periodic scheduler
RESULT 2
OKREADY
2021-05-15 06:04:00,055 DEBUG [4d60fdea098b4adc8ddd989ee154d632] awx.main.tasks Last scheduler run was: 2021-05-15 06:03:30.054996+00:00
2021-05-15 06:04:10,048 DEBUG [82e2362b5b064dfca43b4eb30c004fc8] awx.main.dispatch task 2f5448fb-6d28-4935-b776-0866084fce7d starting awx.main.scheduler.tasks.run_task_manager(*[])
2021-05-15 06:04:10,052 DEBUG [82e2362b5b064dfca43b4eb30c004fc8] awx.main.scheduler Running task manager.
2021-05-15 06:04:10,127 DEBUG [82e2362b5b064dfca43b4eb30c004fc8] awx.main.scheduler Starting Scheduler
2021-05-15 06:04:10,219 DEBUG [82e2362b5b064dfca43b4eb30c004fc8] awx.main.dispatch task ee9db357-b00a-4056-ac63-a34c9a89ce68 starting awx.main.analytics.analytics_tasks.send_subsystem_metrics(*[])
2021-05-15 06:04:10,424 DEBUG [82e2362b5b064dfca43b4eb30c004fc8] awx.main.scheduler Finishing Scheduler
2021-05-15 06:04:30,227 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.dispatch task 4712463f-c29f-4fc2-9f1b-f52c28dabe8d starting awx.main.tasks.cluster_node_heartbeat(*[])
2021-05-15 06:04:30,230 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.tasks Cluster node heartbeat task.
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
Instance Group already registered tower
(changed: True)
2021-05-15 06:04:30,431 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.dispatch task dd11c919-b9bb-488e-9fa7-e4e2a28d1cc1 starting awx.main.tasks.awx_k8s_reaper(*[])
2021-05-15 06:04:30,442 WARNING [780f73b06160467781e0d4b02f1463d9] awx.main.dispatch scaling up worker pid:4644
2021-05-15 06:04:30,431 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.dispatch task fb8d52f3-256d-49d4-9abd-709362f63fbb starting awx.main.tasks.awx_periodic_scheduler(*[])
2021-05-15 06:04:30,531 WARNING [780f73b06160467781e0d4b02f1463d9] awx.main.dispatch scaling up worker pid:4646
2021-05-15 06:04:30,532 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.dispatch task c8a824dd-97c4-4d21-b330-c37b1a631f7e starting awx.main.scheduler.tasks.run_task_manager(*[])
2021-05-15 06:04:30,543 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.dispatch task 18217501-f7aa-4e35-90de-e49c14d71140 starting awx.main.analytics.analytics_tasks.send_subsystem_metrics(*[])
2021-05-15 06:04:30,552 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.scheduler Running task manager.
2021-05-15 06:04:30,630 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.tasks Starting periodic scheduler
2021-05-15 06:04:30,648 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.tasks Checking for orphaned k8s pods for tower-1.
2021-05-15 06:04:30,649 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.tasks Last scheduler run was: 2021-05-15 06:04:00.052460+00:00
2021-05-15 06:04:30,741 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.scheduler Starting Scheduler
2021-05-15 06:04:31,229 DEBUG [780f73b06160467781e0d4b02f1463d9] awx.main.scheduler Finishing Scheduler
kubectl -n awx logs -f --
I think I had a similar problem with SCM updates. Does your project contain a requirements.yml file? And if so, did you create an Ansible Galaxy/Automation Hub API Token credential and assign it to your organization?
No requirements.yml files anywhere in my projects, and my awx does have the one galaxy token that gets created and assigned during deployment.
I just deleted my AWX object on my k3s and redeployed. Same issue: SCM jobs don't start at all.
Is there anything helpful in the logs of the in-pod awx-ee container?
not really:
INFO 2021/05/21 18:03:22 Client connected to control service
INFO 2021/05/21 18:03:22 Control service closed
INFO 2021/05/21 18:03:22 Client disconnected from control service
INFO 2021/05/21 18:03:22 Client connected to control service
DEBUG 2021/05/21 18:03:24 Sending service advertisement: &{awx-9b7494dfb-jgk9m control 2021-05-21 18:03:24.48428354 +0000 UTC m=+38108.977573786 1 map[] []}
that is literally all there is.
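If the awx-ee logs stay this quiet, it may also be worth asking receptor itself whether the control service is responsive. A rough sketch, assuming the usual receptor socket path (/var/run/receptor/receptor.sock) inside the awx-ee container:

# show receptor node status (connections, advertised work types)
kubectl -n awx exec deployment/awx -c awx-ee -- receptorctl --socket /var/run/receptor/receptor.sock status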
I have deleted the whole namespace in kubernetes and deleted the operator and deployed from scratch - and it's still the same: project sync jobs do not start at all, and there is nothing in the logs that would explain why.
I've actually found something in the logs:
2021-05-28 11:25:17,565 WARNING [6478de4b690b4854bb03466b65b1b70d] awx.main.scheduler No available capacity to run containerized <project_update 10116 (waiting)>.
2021-05-28 11:25:17,720 DEBUG [6478de4b690b4854bb03466b65b1b70d] awx.analytics.job_lifecycle projectupdate-10116 waiting
"No available capacity to run ..." - where does that come from, and why does it affect only SCM updates?
I've reconfigured my AWX to have much bigger resource limits but I still get this message:
2021-06-01 06:05:35,058 WARNING [d03db904726b4c5abc80773ea8897714] awx.main.scheduler No available capacity to run containerized <project_update 10127 (waiting)>.
and the SCM update is not happening.
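That warning appears to come from the AWX task manager when the instance group the update is assigned to reports no remaining capacity, in which case nothing containerized gets scheduled. One way to check what capacity AWX thinks it has is via the API; a sketch with placeholder credentials and hostname:

# inspect reported capacity for instance groups and instances
curl -sk -u admin:PASSWORD https://AWXHOST/api/v2/instance_groups/ | python3 -m json.tool
curl -sk -u admin:PASSWORD https://AWXHOST/api/v2/instances/ | python3 -m json.tool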
there is also a file not found error on awx-ee:
ERROR 2021/06/01 06:05:17 Error updating status file /tmp/receptor/awx-5f9c75f6b6-ks7jz/K72xh22I/status: open /tmp/receptor/awx-5f9c75f6b6-ks7jz/K72xh22I/status.lock: no such file or directory.
When I look inside the actual container I find that there is no file by that name.
Instead there is /tmp/receptor/awx-5f9c75f6b6-ks7jz/Zg7PbqHR/status
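One way to see which work units receptor actually knows about (and whether the unit ID from that error still exists) is to ask it directly; a sketch, again assuming the default socket path inside the awx-ee container:

# list receptor work units and their current state
kubectl -n awx exec deployment/awx -c awx-ee -- receptorctl --socket /var/run/receptor/receptor.sock work list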
This sounds like it could be the same issue that some of us are having in issue #10489
looks pretty much like the same thing. Can't really verify though - I've given up on AWX > 17.1 for now, and since the deployment with database migration is also broken I have no way to spin up a latest version AWX in my kubernetes to get enough history for it to trigger.
Hey @lemmy04, I have actually been bypassing the database migration workflow and manually dumping the database and then restoring it. I've added some steps below. I hope this helps!
On old server (stop all containers except postgres first):

sudo docker exec awx_postgres pg_dump -U DATABASEUSERNAME -Ft -f backup.tar awx
sudo docker cp awx_postgres:backup.tar ./
sudo docker exec awx_postgres rm backup.tar

Send the tarball to the new server and then delete the copy that was just created on the old server.
On new server, after deploying a bare install of AWX (make sure you run the commands below against the appropriate namespace):

kubectl scale --replicas=0 deployment awx
kubectl scale --replicas=0 deployment awx-operator
kubectl exec -it awx-postgres-0 -- /bin/bash
psql -U DATABASEUSERNAME postgres
DROP DATABASE awx;
CREATE DATABASE awx;
exit
exit
kubectl cp backup.tar awx/awx-postgres-0:/
kubectl exec -it awx-postgres-0 -- /bin/bash
pg_restore -U DATABASEUSERNAME -Ft backup.tar -d awx
rm backup.tar
exit
rm backup.tar
kubectl scale --replicas=1 deployment awx-operator
kubectl scale --replicas=1 deployment awx
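After scaling the operator and AWX deployments back up, it may be worth confirming that the pods return to Running and that the restored schema is migrated for the running AWX version; a sketch, assuming the same awx namespace:

# watch the pods come back up
kubectl -n awx get pods -w

# check that Django migrations are applied against the restored database
kubectl -n awx exec deployment/awx -c awx-task -- awx-manage showmigrations | tail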
I also have a project update stuck at running, for the second time already. It is bad because it also blocks other jobs (they stay in pending). Using AWX 19.3.0.
Identity added: /tmp/pdd_wrapper_7925_uf780rcs/awx_7925_8hqi355q/artifacts/7925/ssh_key_data (ansible@vpadm002)

PLAY [Update source tree if necessary] *

TASK [update project using git] ****
ok: [localhost]

TASK [Set the git repository version] **
ok: [localhost]

TASK [Repository Version] **
ok: [localhost] => {
    "msg": "Repository Version 6d46b547c250e7595c8c5f62c91a273eccb888c7"
}

PLAY [Install content with ansible-galaxy command if necessary] ****

TASK [Check content sync settings] *****
I have the same issue on 19.4.0, where project update jobs get stuck at "Check content sync settings". It happens randomly. All scheduled jobs are failing because the projects can't be updated. This is a fresh install using the official AWX operator.
Same here
INFO 2021/12/26 06:22:48 Client connected to control service
INFO 2021/12/26 06:22:48 Control service closed
INFO 2021/12/26 06:22:48 Client disconnected from control service
ERROR 2021/12/26 06:22:49 Error updating status file /tmp/receptor/awx-5b4cff67f5-mh552/9LQAkSuW/status: open /tmp/receptor/awx-5b4cff67f5-mh552/9LQAkSuW/status.lock: no such file or directory.
ERROR 2021/12/26 06:22:49 Error creating pod: context canceled
Though I don't get the "Error creating pod" part (what pod?), the "Error updating status file" error is common.
Are folks still seeing this?
I last saw it on the 12th of April, shortly after the AWX upgrade to 20.1.0. It used to fail more often before; one thing I noticed is that things get a bit more stable if I delete the AWX pods.
I have a feeling it's better in 20.1.0; I can report back if I still see it.
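For anyone wanting to try the same workaround, the control-plane pods can be recreated without touching the operator; a sketch, assuming the deployment is called awx and lives in the awx namespace:

# recreate the awx pod(s); the deployment brings them back with fresh containers
kubectl -n awx rollout restart deployment awx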
We are currently seeing this on two instances after upgrading from 21.4 to 21.7. We tried to cancel the job and also redeployed the whole deployment, but the problem still appears every time.
Canceling it produces this output in the web container:
[awx-77dcb9477f-2r2vf awx-web] xx.xxx.xx.x - - [14/Oct/2022:09:46:44 +0000] "POST /api/v2/workflow_jobs/7136/cancel/ HTTP/1.1" 202 0 "https://awxurl.de/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0" "xx.xxx.x.xxx, xx.xxx.xx.xxx"
[awx-77dcb9477f-2r2vf awx-web] [pid: 35|app: 0|req: 253/382] xx.xxx.xx.x () {68 vars in 2074 bytes} [Fri Oct 14 09:46:44 2022] POST /api/v2/workflow_jobs/7136/cancel/ => generated 0 bytes in 265 msecs (HTTP/1.1 202) 13 headers in 552 bytes (1 switches on core 0)
The ee container gives this (the same output as the ee container of all the other pods):
[awx-77dcb9477f-g5h86 awx-ee] DEBUG 2022/10/14 09:46:46 Client connected to control service @
[awx-77dcb9477f-g5h86 awx-ee] DEBUG 2022/10/14 09:46:46 Client connected to control service @
[awx-77dcb9477f-g5h86 awx-ee] DEBUG 2022/10/14 09:46:46 Control service closed
[awx-77dcb9477f-g5h86 awx-ee] DEBUG 2022/10/14 09:46:46 Client disconnected from control service @
[awx-77dcb9477f-g5h86 awx-ee] DEBUG 2022/10/14 09:46:46 Control service closed
[awx-77dcb9477f-g5h86 awx-ee] DEBUG 2022/10/14 09:46:46 Client disconnected from control service @
[awx-77dcb9477f-g5h86 awx-ee] DEBUG 2022/10/14 09:46:55 Sending service advertisement: &{awx-77dcb9477f-g5h86 control 2022-10-14 09:46:55.947685586 +0000 UTC m=+63426.817330184 1 map[type:Control Service] [{local false} {kubernetes-runtime-auth false} {kubernetes-incluster-auth false}]}
Manually trying to POST /api/v2/workflow_jobs/7136/cancel/ produces the same output as seen in the web container above.
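(For completeness, the manual cancel was roughly the following; a sketch with placeholder credentials, using the hostname from the logs above:)

curl -sk -u USER:PASSWORD -X POST https://awxurl.de/api/v2/workflow_jobs/7136/cancel/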
Trying to delete it produces:
[awx-77dcb9477f-2r2vf awx-web] 2022-10-14 09:57:11,526 WARNING [b41cc4b4803647f1bbfdf2df5ffa0784] awx.api.generics status 403 received by user foo.bar attempting to access /api/v2/workflow_jobs/7136/ from xx.xxx.xx.x
[awx-77dcb9477f-2r2vf awx-web] 2022-10-14 09:57:11,562 WARNING [b41cc4b4803647f1bbfdf2df5ffa0784] django.request Forbidden: /api/v2/workflow_jobs/7136/
[awx-77dcb9477f-2r2vf awx-web] 2022-10-14 09:57:11,562 WARNING [b41cc4b4803647f1bbfdf2df5ffa0784] django.request Forbidden: /api/v2/workflow_jobs/7136/
[awx-77dcb9477f-2r2vf awx-web] xx.xxx.xx.x - - [14/Oct/2022:09:57:11 +0000] "DELETE /api/v2/workflow_jobs/7136/ HTTP/1.1" 403 9343 "https://awxurl.de/api/v2/workflow_jobs/7136/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0" "xx.xxx.x.xxx, xx.xxx.xx.xxx"
[awx-77dcb9477f-2r2vf awx-web] [pid: 36|app: 0|req: 21/390] xx.xxx.xx.x () {70 vars in 2207 bytes} [Fri Oct 14 09:57:11 2022] DELETE /api/v2/workflow_jobs/7136/ => generated 9343 bytes in 212 msecs (HTTP/1.1 403) 15 headers in 709 bytes (1 switches on core 0)
In both cases the last job within the workflow shows the status "Error" in AWX (looking at the status via /api/v2/jobs/7141/):
"status": "error",
It also shows the following result traceback:
"result_traceback": "Traceback (most recent call last):
File \"/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/receptor.py\", line 372, in _run_internal
resultsock = receptor_ctl.get_work_results(self.unit_id, return_sockfile=True)
File \"/var/lib/awx/venv/awx/lib64/python3.9/site-packages/receptorctl/socket_interface.py\", line 248, in get_work_results
self.writestr(f\"work results {unit_id}\
\")
File \"/var/lib/awx/venv/awx/lib64/python3.9/site-packages/receptorctl/socket_interface.py\", line 49, in writestr
self._sockfile.flush()
File \"/usr/lib64/python3.9/socket.py\", line 722, in write
return self._sock.send(b)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py\", line 537, in run
res = receptor_job.run()
File \"/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/receptor.py\", line 268, in run
res = self._run_internal(receptor_ctl)
File \"/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/receptor.py\", line 382, in _run_internal
raise RuntimeError(detail)
RuntimeError
",
but the kubernetes job logs say the job has been canceled.
October 10th 2022, 11:05:35.922 {"status": "canceled", "runner_ident": "7141"} automation-job-7141-77cp7
Also here is the log of the pod at the end:
October 10th 2022, 11:01:12.235  2022-10-10 11:01:12,230 INFO awx.main.commands.run_callback_receiver Starting EOF event processing for Job 7141
October 10th 2022, 11:01:12.235  2022-10-10 11:01:12,230 INFO awx.main.commands.run_callback_receiver Starting EOF event processing for Job 7141
October 10th 2022, 11:01:12.479  2022-10-10 11:01:12,477 DEBUG awx.main.tasks.jobs job 7141 (running) finished running, producing 24 events.
October 10th 2022, 11:01:12.479  2022-10-10 11:01:12,478 DEBUG awx.analytics.job_lifecycle job-7141 post run
October 10th 2022, 11:01:12.479  2022-10-10 11:01:12,477 ERROR awx.main.tasks.jobs job 7141 (running) Exception occurred while running task
October 10th 2022, 11:01:12.703  2022-10-10 11:01:12,697 DEBUG awx.analytics.job_lifecycle job-7141 finalize run
October 10th 2022, 11:01:12.771  2022-10-10 11:01:12,737 ERROR awx.main.models.unified_jobs job 7141 (error) failed to emit channel msg about status change
October 10th 2022, 11:01:12.783  2022-10-10 11:01:12,782 DEBUG awx.main.tasks.system Executing error task id a492b12a-42d4-480a-95ce-0e3f1b5f7475, subtasks: [{'type': 'job', 'id': 7141}]
October 10th 2022, 11:01:12.783  2022-10-10 11:01:12,775 WARNING awx.main.dispatch job 7141 (error) encountered an error (rc=None), please see task stdout for details.
Hi, if anyone is currently hitting this stuck-execution bug and is using the AWX operator, this may help resolve it.
I installed the AWX operator with k3s in a Debian 12 virtual machine. The bug appeared while the virtual machine had only 1 CPU core assigned; I'm still not entirely sure why the error occurs, but after I assigned 2 CPU cores AWX tasks ran correctly without any problem. I hope this is helpful.
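If you want to check whether you are in the same single-core situation before resizing the VM, the CPU count each node advertises to the cluster is visible from kubectl; a sketch:

# show how many CPUs each node reports
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu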
ISSUE TYPE
SUMMARY
Since yesterday my AWX does not work anymore: jobs get created and started but never actually run, i.e. no worker container is created and no actual work is done.
ENVIRONMENT
STEPS TO REPRODUCE
I start a new job by manually launching either an SCM update or a job or workflow template
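(The same launches can also be triggered from the command line with the awx CLI, which makes it easier to capture the API response; a sketch, assuming the CLI is installed and configured with host and credentials, and using placeholder IDs:)

# launch an SCM update for project 1 and a job from template 1, waiting for completion
awx projects update 1 --monitor
awx job_templates launch 1 --monitor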
EXPECTED RESULTS
I would expect the job to either work or fail at some point with some (meaningful) error message
ACTUAL RESULTS
The job never actually starts; it just sits there in "running" but nothing is actually happening.
ADDITIONAL INFORMATION
No errors on the K3s side of things; it looks more like AWX never actually even tries to talk to K3s anymore. BUT: ad-hoc jobs still work... this only affects templates and SCM updates (or maybe more, I have not tried every possible thing).