Closed: vibinm closed this issue 4 years ago
How did you scale to additional replicas? Stateful sets or something else?
Hi,
I scaled it as a StatefulSet, and I'm happy to say that the issue is fixed.
The cause was a failure of RabbitMQ (RMQ) clustering, which in turn was caused by SELinux restrictions on the container.
I will close this issue.
ISSUE TYPE
Jobs continuously fail on an additional replica/instance of AWX in a Kubernetes cluster
SUMMARY
Jobs that are scheduled or run manually on the second AWX instance (awx-1), a replica hosted on the Kubernetes cluster, fail continuously.
ENVIRONMENT
STEPS TO REPRODUCE
Deploy AWX on a Kubernetes cluster and scale to two or more replicas.
Then try scheduling jobs on the scaled instance, for example awx-1.
EXPECTED RESULTS
Jobs are expected to be scheduled and to run successfully.
ACTUAL RESULTS
ADDITIONAL INFORMATION
Jobs fail with this message in the web UI:
Task was marked as running in Tower but was not present in the job queue, so it has been marked as failed.
And these are the errors in the task container logs:
2019-12-23 12:06:49,341 DEBUG awx.main.dispatch publish awx.main.tasks.cluster_node_heartbeat(5bc7544e-d5c7-44ac-9df0-335bc508b3b8, queue=awx-1)
[2019-12-23 12:06:49,341: DEBUG/Process-1] publish awx.main.tasks.cluster_node_heartbeat(5bc7544e-d5c7-44ac-9df0-335bc508b3b8, queue=awx-1)
2019-12-23 12:06:49,501 DEBUG awx.main.models.mixins No credential configured to post back webhook status, skipping.
2019-12-23 12:06:49,501 ERROR awx.main.dispatch job 946 (failed) is no longer running; reaping
2019-12-23 12:06:49,503 DEBUG awx.main.dispatch delivered 5bc7544e-d5c7-44ac-9df0-335bc508b3b8 to worker[182] qsize 0
2019-12-23 12:06:49,504 DEBUG awx.main.dispatch task 5bc7544e-d5c7-44ac-9df0-335bc508b3b8 starting awx.main.tasks.cluster_node_heartbeat([])
2019-12-23 12:06:49,505 DEBUG awx.main.tasks Cluster node heartbeat task.
2019-12-23 12:06:49,523 DEBUG awx.main.dispatch publish awx.main.tasks.awx_k8s_reaper(e3da2008-852d-4c61-ae0e-5cf7f9212f10, queue=awx-1)
[2019-12-23 12:06:49,523: DEBUG/Process-1] publish awx.main.tasks.awx_k8s_reaper(e3da2008-852d-4c61-ae0e-5cf7f9212f10, queue=awx-1)
2019-12-23 12:06:49,537 DEBUG awx.main.dispatch task 5bc7544e-d5c7-44ac-9df0-335bc508b3b8 is finished
2019-12-23 12:06:49,538 DEBUG awx.main.dispatch delivered e3da2008-852d-4c61-ae0e-5cf7f9212f10 to worker[183] qsize 0
2019-12-23 12:06:49,540 DEBUG awx.main.dispatch task e3da2008-852d-4c61-ae0e-5cf7f9212f10 starting awx.main.tasks.awx_k8s_reaper([])
2019-12-23 12:06:49,562 DEBUG awx.main.dispatch task e3da2008-852d-4c61-ae0e-5cf7f9212f10 is finished
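For anyone hitting the same symptom, the only ERROR entry in this output is the "no longer running; reaping" line; everything else is DEBUG noise. A rough way to pull such entries out of a longer task container log is sketched below in Python. This is only an illustration and not part of AWX: the names `ENTRY`, `parse_entries`, and `errors` are made up here, and the pattern assumes the `date time,ms LEVEL logger message` layout seen in the log above.

```python
import re

# One log entry: "<date> <time>,<ms> LEVEL logger message".
# The message runs until the next timestamp (optionally preceded by "[")
# or the end of the text, so this works whether entries are separated by
# newlines or run together on a single line.
# Bracketed echoes such as "[... : DEBUG/Process-1] publish ..." do not
# match the entry layout and are skipped.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(DEBUG|INFO|WARNING|ERROR|CRITICAL) "
    r"(\S+) "
    r"(.*?)(?=\s*\[?\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z)",
    re.DOTALL,
)


def parse_entries(text):
    """Split raw log text into dicts with time, level, logger, and message."""
    return [
        {"time": t, "level": level, "logger": logger, "message": msg.strip()}
        for t, level, logger, msg in ENTRY.findall(text)
    ]


def errors(text):
    """Return only the entries logged at ERROR or CRITICAL level."""
    return [e for e in parse_entries(text) if e["level"] in ("ERROR", "CRITICAL")]


# Three entries copied from the log above, joined on one line.
SAMPLE = (
    "2019-12-23 12:06:49,501 DEBUG awx.main.models.mixins No credential "
    "configured to post back webhook status, skipping. "
    "2019-12-23 12:06:49,501 ERROR awx.main.dispatch job 946 (failed) is no "
    "longer running; reaping "
    "2019-12-23 12:06:49,503 DEBUG awx.main.dispatch delivered "
    "5bc7544e-d5c7-44ac-9df0-335bc508b3b8 to worker[182] qsize 0"
)

for entry in errors(SAMPLE):
    print(entry["time"], entry["logger"], entry["message"])
```

On the sample this prints the single reaping entry from `awx.main.dispatch`. Filtering like this only narrows down where to look; as noted in the comments above, the underlying cause in this case was the RabbitMQ clustering failure, which does not appear in these log lines.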
Please let me know if you need further details to help identify or fix the issue.
Regards, Vibin