Status: Open. Peter1295 opened this issue 5 months ago.
Random crashes with the message "The running ansible process received a shutdown signal."
Where are you seeing this? Please provide some context.
Currently we do not have enough information to understand what is happening here.
The AWX job template fails with that message. The timing is random, mostly between 7 and 12 minutes into the job run, and I can see it happens with jobs that make changes on multiple hosts (patching, VM customization, etc.).
Unfortunately, the awx-task logs do not show anything helpful, just a message that the job/workflow failed:
Workflow job 18542 failed due to reason: No error handling path for workflow job node(s) [(26156,failed)]. Workflow job node(s) missing unified job template and error handling path [].
The cluster runs on Kubernetes with 2 control-plane nodes and 4 worker nodes. The maximum usage reported by the command kubectl top node is around 20% CPU and 80% memory, and all nodes have at least 40% free disk space.
The AWX database runs on an external Postgres server.
Attaching logs from the automation pod that failed in the middle of a run. There is absolutely no information about what is happening, not from awx-task, awx-web, or the automation pod. Any suggestions on what to look for? Attachment: task3.log
AWX is essential for us: we use it daily for managing, deploying, patching, etc. It runs at least 50 templates a day, and I cannot stay permanently connected to check whether it is still working. We have another instance in the production environment that still runs 23.3.1 and works properly, but unfortunately downgrading no longer works because the older version cannot use the upgraded database.
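Since the failures happen unattended, one option is to poll the AWX REST API for recently failed jobs instead of watching the UI. The sketch below is a minimal, hedged example using only the Python standard library. AWX_URL and AWX_TOKEN are hypothetical placeholders; it assumes token authentication via a Bearer header and the status, order_by and page_size query parameters on /api/v2/jobs/, and it assumes the shutdown message is recorded in job_explanation or result_traceback. The list view may omit result_traceback, in which case only job_explanation is checked.

```python
import json
import urllib.request

AWX_URL = "https://awx.example.com"  # hypothetical AWX host
AWX_TOKEN = "REPLACE_ME"             # hypothetical personal access token
SHUTDOWN_MSG = "The running ansible process received a shutdown signal"


def fetch_failed_jobs(limit=50):
    """Return the most recently finished failed jobs from /api/v2/jobs/."""
    query = f"status=failed&order_by=-finished&page_size={limit}"
    req = urllib.request.Request(
        f"{AWX_URL}/api/v2/jobs/?{query}",
        headers={"Authorization": f"Bearer {AWX_TOKEN}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Paginated list responses wrap the records in a "results" array.
        return json.load(resp)["results"]


def shutdown_failures(jobs):
    """Return (id, name, finished) for jobs whose text mentions the shutdown signal."""
    hits = []
    for job in jobs:
        # Either field may be missing or null depending on the API view,
        # so fall back to an empty string before searching.
        text = "\n".join(
            job.get(field) or "" for field in ("job_explanation", "result_traceback")
        )
        if SHUTDOWN_MSG in text:
            hits.append((job["id"], job.get("name"), job.get("finished")))
    return hits
```

For example, a cron job could call shutdown_failures(fetch_failed_jobs()) every few minutes and forward the returned (id, name, finished) tuples by email or chat, so the failure times can be compared with Kubernetes events without keeping a session open.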
Another update: the issue is not version-related. I was able to downgrade AWX to version 23.8.1 (which should not have such problems). The issue is not with the database either; I tried both the current Postgres server and the older one from before the migration. Sometimes it fails after 5 minutes, sometimes the job runs for almost 1 hour.
The issue also persists on 24.6.0. The Kubernetes logs show only that the automation pod shut down successfully, not what triggered the shutdown or why.
Bug Summary
Random crashes with the message "The running ansible process received a shutdown signal." After the crash, the awx-task pod that was running the job disappears from the Instances list, but the pod is still running in the cluster.
Attaching logs from ArgoCD: awx-task.txt
AWX version
AWX 24.4.0
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
v2.17.0
Operating system
k8s cluster on OL9
Web browser
Firefox, Chrome, Edge
Steps to reproduce
Created a playbook with six 5-minute pause tasks and ran the template.
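The reproduction case can be sketched as a short Python script that emits the playbook text. The play name, the localhost target and gather_facts: false are assumptions, not details from the report; each task uses the ansible.builtin.pause module with its minutes option, so six tasks add up to about 30 minutes of runtime without touching any managed host.

```python
PAUSE_COUNT = 6    # number of pause tasks, as in the report
PAUSE_MINUTES = 5  # minutes per pause, as in the report


def build_playbook(count=PAUSE_COUNT, minutes=PAUSE_MINUTES):
    """Return playbook YAML text with `count` ansible.builtin.pause tasks."""
    lines = [
        "---",
        # Hypothetical play name and target; the report does not give them.
        "- name: Reproduce shutdown signal during a long-running job",
        "  hosts: localhost",
        "  gather_facts: false",
        "  tasks:",
    ]
    for i in range(1, count + 1):
        lines += [
            f"    - name: Pause {i} of {count}",
            "      ansible.builtin.pause:",
            f"        minutes: {minutes}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the playbook so it can be redirected into a file.
    print(build_playbook(), end="")
```

Saving the output to a file such as pause_repro.yml (a hypothetical name) in the project repository and attaching it to a job template gives a job that should take about 30 minutes; if it fails early with the shutdown message, the problem is independent of the playbook content.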
Expected results
The template finishes in about 30 minutes.
Actual results
The job failed within 10 minutes with the error "The running ansible process received a shutdown signal."
Additional information
The issue behaves the same as #14948, but that should have been resolved in version 23.8.1, and I am using the newest version of AWX.