diademiemi opened this issue 8 months ago
@diademiemi Hi,
> Our goal here is to run Receptor in a Kubernetes cluster so we can host execution and/or hop nodes in Kubernetes.
The current AWX implementation assumes that execution nodes are hosts where Ansible Runner runs locally and Podman is installed. So, in the first place, it's hard to run execution nodes in a Kubernetes cluster: if we select execution nodes for a job template, AWX sends a request to Ansible Runner on those nodes, and Ansible Runner runs the execution environment by creating a container with Podman, not on Kubernetes.
Alternatively, I recommend the following to achieve a similar goal: define a Container Group with credentials for the remote Kubernetes cluster. This allows us to run EEs on the remote Kubernetes cluster: https://ansible.readthedocs.io/projects/awx/en/latest/administration/containers_instance_groups.html#create-a-container-group
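For context, a Container Group job is not sent to an execution node's `ansible-runner` work command at all; it goes through Receptor's `work-kubernetes` work type on the AWX side. The sketch below assumes the stanzas resemble what awx-operator writes into the control plane's receptor config; compare it with the receptor config of your own deployment before relying on it.

```yaml
# Assumed control-plane stanzas used for Container Group jobs (not an
# authoritative copy of awx-operator's config).
- work-kubernetes:
    worktype: kubernetes-runtime-auth    # cluster credential is supplied per job by AWX
    authmethod: runtime
    allowruntimeauth: true
    allowruntimepod: true
    allowruntimeparams: true

- work-kubernetes:
    worktype: kubernetes-incluster-auth  # uses the in-cluster service account
    authmethod: incluster
    allowruntimeauth: true
    allowruntimepod: true
    allowruntimeparams: true
```

The worktype names are only labels; what matters is that AWX submits work under exactly these names, which is why a differently named stanza such as `kubeit` is never picked up.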
Running a hop node on a Kubernetes cluster is not so hard, since a hop node is never used to invoke any commands; neither Podman nor Ansible Runner is required. In addition, an "in-cluster hop node" feature called AWXMeshIngress will be implemented in the next release: https://github.com/ansible/awx/pull/14640
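Because a hop node only routes mesh traffic, its Receptor config needs no work stanzas. A minimal sketch, reusing the TLS paths and port from the receptor.conf in this issue and assuming a hypothetical node ID `hop-node-1`; real deployments may also need a `tls-client` and outbound peers:

```yaml
---
# Hypothetical hop node: listener and TLS only. There is no work-command or
# work-kubernetes stanza, so neither Podman nor ansible-runner is needed.
- node:
    id: hop-node-1   # hypothetical node ID

- log-level: info

- tls-server:
    name: tls_server
    cert: /etc/receptor/tls/receptor.crt
    key: /etc/receptor/tls/receptor.key
    clientcas: /etc/receptor/tls/ca/mesh-CA.crt
    requireclientcert: true

- tcp-listener:
    port: 27199
    tls: tls_server
```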
Here are my answers to your questions, for your technical interest:
- `worktype` is just a name. The documentation uses `kubeit` not because it is required for Kubernetes work, but simply as one example with a short name.
- AWX uses the `ansible-runner` worktype to run health checks. This invokes `ansible-runner worker --worker-info` on the execution nodes.
- When running jobs, Ansible Runner on the execution node runs `ansible-playbook` in an isolated environment (i.e., in a Podman container), so in this case the EE container is created by Ansible Runner.
- Container Groups are where `work-kubernetes` is actually used (with the `kubernetes-runtime-auth` worktype or the `kubernetes-incluster-auth` worktype).

If you are further interested, my blog article may help (sorry, it is in Japanese, so please use a translator): https://blog.kurokobo.com/archives/4847 Or ask further questions on the forum: https://forum.ansible.com/
It would be appropriate to improve the error message, perhaps in an enhancement request on the Receptor side.
As @kurokobo mentioned, container groups are designed for running jobs on remote k8s clusters.

AWX expects an execution node to have a `work-command` called `ansible-runner` for health checks, and when running jobs, AWX also uses this same work command. So even if you have a proper `kubeit` `work-kubernetes` setup in the config, AWX sadly is not going to use it. Getting that working would require some changes in AWX.
Is there a use case for this that container groups don't cover?
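For comparison with the `kubeit` stanza in the report, this is roughly the `work-command` that AWX expects on an execution node. It is an assumption based on the receptor.conf that AWX install bundles generate; check it against your own bundle. It also presumes `ansible-runner` (and Podman, for jobs) is installed on the node.

```yaml
# Assumed execution-node stanza; the worktype must be named "ansible-runner"
# because that is the name AWX submits for health checks and jobs.
- work-command:
    worktype: ansible-runner
    command: ansible-runner
    params: worker
    allowruntimeparams: true   # lets AWX append runtime params, e.g. --worker-info
    verifysignature: true      # requires the work-verification stanza
```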
Thank you for the detailed response! I understand a lot better now what this is doing under the hood
I'll be checking out the AWXMeshIngress and Container Groups features today and tomorrow, and I'll get back to you on whether they cover our use case.
Bug Summary
When using the `work-kubernetes` type as described in the documentation, we get the following error when checking the health of the node from AWX.

Does AWX not support the `work-kubernetes` type yet, with the health check simply not reporting a readable error for this? The error is quite vague and I'm not sure what the issue is.

Our goal here is to run Receptor in a Kubernetes cluster so we can host execution and/or hop nodes in Kubernetes. I'm not certain whether this is an issue in AWX or in Receptor.
AWX version
23.7.0
Installation method
kubernetes
Modifications
no
Ansible version
No response
Operating system
Ubuntu 22.04
Web browser
No response
Steps to reproduce
The following receptor config is used:
receptor.conf
```yaml
---
- node:
    id: 192.168.21.54

- work-verification:
    publickey: /etc/receptor/work_public_key.pem

- log-level: debug

- control-service:
    service: control
    filename: /tmp/receptor.sock
    permissions: 0660
    tls: tls_server

- tls-server:
    name: tls_server
    cert: /etc/receptor/tls/receptor.crt
    key: /etc/receptor/tls/receptor.key
    clientcas: /etc/receptor/tls/ca/mesh-CA.crt
    requireclientcert: true
    mintls13: False

- tls-client:
    name: tls_client
    cert: /etc/receptor/tls/receptor.crt
    key: /etc/receptor/tls/receptor.key
    rootcas: /etc/receptor/tls/ca/mesh-CA.crt
    insecureskipverify: false
    mintls13: False

- tcp-listener:
    port: 27199
    tls: tls_server

- work-kubernetes:
    worktype: kubeit
    authmethod: kubeconfig
    allowruntimeauth: true
    allowruntimepod: true
    allowruntimeparams: true
    verifysignature: true
```

After starting Receptor and checking the health of the instance, I get the error.
Expected results
AWX should pass the health check and use Receptor to run workloads on the Kubernetes cluster with the `kubeit` worktype.

If this is not a supported use case yet, I would expect a clearer error message. This error message seems quite arbitrary to me and confused us for days.
Actual results
We get an error
This does not seem relevant to what we are trying to achieve. I looked through the code to see what causes this, and it seems to be related to the health check not using the correct work type (more information below).
Additional information
It seems this error occurs because the workType is given as `ansible-runner` instead of `kubeit`. I'm not too familiar with the code at work here, but I added some debug statements in Receptor, including in `ShouldVerifySignature`.

Am I correct that it seems to think the only valid workTypes here are `kubeit` and `remote`, but AWX is sending `ansible-runner` for the health check?