ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0

Error creating pod: Internal error occurred #1735

Open sivap083 opened 8 months ago

sivap083 commented 8 months ago


Bug Summary

We can sync the project (integrated with a private GitHub repo using a PAT). However, syncing the inventory from the GitHub repo (inventory.ini) fails with the following error.

Receptor detail: Error creating pod: Internal error occurred: failed calling webhook "pods.env-injector.admission.spv.no": failed to call webhook: an error on the server ("{\"response\":{\"uid\":\"6ff9d882-4b77-48f3-b938-8d2a949efdbb\",\"allowed\":false,\"status\":{\"metadata\":{},\"status\":\"Failure\",\"message\":\"Secret \\"akv2k8s-automation-job-140-\\" is invalid: metadata.name: Invalid value: \\"akv2k8s-automation-job-140-\\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')\"}}}") has prevented the request from succeeding


AWX Operator version

2.9.0

AWX version

23.5.1

Kubernetes platform

kubernetes

Kubernetes/Platform version

1.27.7

Modifications

no

Steps to reproduce

We can sync the project and inventory in our personal lab environment without any issues. However, in our client's Kubernetes environment we get this error during validation of the inventory sync job/pod.

Expected results

It should be able to sync the inventory successfully.

Actual results

Inventory sync job is failing.

Additional information

No response

Operator Logs

No response

thedoubl3j commented 8 months ago

@TheRealHaoLiu any insight on this? We are not sure whether this is the default naming for a Kubernetes pod. It looks like a "-" is being appended at the end, and the regex is flagging that.

sivap083 commented 8 months ago

@thedoubl3j @TheRealHaoLiu It appears to be related to the default naming of the job. The regex doesn't permit a "-" at the end of the job name. Is there a way to override the job name created by the awx-operator?

kurokobo commented 8 months ago

@sivap083 There is no way to override the job name, but I think the issue comes from your akv2k8s in the first place.

The name that ends with - is metadata.generateName, and the actual pod name is generated by appending five random characters to metadata.generateName, as shown below. Ending generateName with - is a common design in Kubernetes.

$ kubectl -n awx get pod automation-job-1-s6wr8 -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-03-03T05:47:14Z"
  generateName: automation-job-1-     ✅
  labels:
    ansible-awx: 4d04ae52-8893-4540-88b5-ca11953e91f8
    ansible-awx-job-id: "1"
  name: automation-job-1-s6wr8     ✅
  namespace: awx
  resourceVersion: "216684"
  uid: b257283f-476e-4a1f-88e4-d0fb2245362c
spec:
  automountServiceAccountToken: false

Your issue is that akv2k8s uses generateName to generate the secret name, but akv2k8s does not fall back to generateName unless it is unable to get the name. Here is the code block.

https://github.com/SparebankenVest/azure-key-vault-to-kubernetes/blob/647b7156685907bb1c5257c9e0596f4eab355165/cmd/azure-keyvault-secrets-webhook/auth/auth.go#L259-L284

Are there any logs from your akv2k8s controller and env-injector? Also, could you please verify that akv2k8s works as expected in the namespace where AWX is running, by creating a new deployment manually?

abhilashjoseph commented 8 months ago

@kurokobo Since the name came out empty, per the code above, it looks for owner references first before resorting to generateName. Can the operator add owner references to the task job manifest?
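
The fallback order described here can be sketched roughly as follows. This is a Python paraphrase of the behavior discussed in this thread, not the actual Go code from akv2k8s. The function names are hypothetical, and the "akv2k8s-" prefix is taken from the secret name in the error message.

```python
from typing import Any, Mapping


def resolve_base_name(pod_metadata: Mapping[str, Any]) -> str:
    """Hypothetical sketch: pick a base name for the secret from pod metadata."""
    # 1. Prefer the pod's own name, if it is already set.
    name = pod_metadata.get("name")
    if name:
        return name
    # 2. Otherwise use the name of the first owner reference, if any.
    owners = pod_metadata.get("ownerReferences") or []
    if owners and owners[0].get("name"):
        return owners[0]["name"]
    # 3. Last resort: the generateName prefix, which typically ends with "-".
    return pod_metadata.get("generateName", "")


def secret_name_for(pod_metadata: Mapping[str, Any]) -> str:
    """Hypothetical sketch: prefix the base name as seen in the error message."""
    return "akv2k8s-" + resolve_base_name(pod_metadata)


if __name__ == "__main__":
    # An AWX job pod at admission time: no name yet, no owner references.
    print(secret_name_for({"generateName": "automation-job-140-"}))
    # The same pod with an owner reference (placeholder owner name).
    print(secret_name_for({
        "generateName": "automation-job-140-",
        "ownerReferences": [{"name": "example-owner"}],
    }))
```

Under these assumptions, a pod with neither a name nor owner references yields a secret name ending in -, which is what the validation error reports.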

kurokobo commented 8 months ago

@abhilashjoseph Creating job pods is the responsibility of AWX, not AWX Operator. I haven't tested this, but you can try defining and using a Container Group with a customized pod specification that includes ownerReferences.

However, I am not sure why akv2k8s cannot get the name of the pod 🤔

abhilashjoseph commented 8 months ago

@kurokobo Thank you for the reply. From what I understand, akv2k8s has a mutating webhook configuration that mutates the pod before it is scheduled by the Kubernetes scheduler, so the name probably isn't available at that point. Sorry for my ignorance of how AWX and AWX Operator work, but even if the pod is created by AWX, it is general practice in Kubernetes to set owner references to identify who or what owns the task pod.

abhilashjoseph commented 8 months ago

@kurokobo thank you for the link to the customization of pod spec, that should help here I believe

kurokobo commented 8 months ago

@abhilashjoseph Hi, thanks for your reply. I don't have any expertise in akv2k8s and don't know its detailed architecture, but it makes sense that the webhook is invoked before the actual name is generated. Again, I haven't tested this yet, but I think you can specify the Container Group as an Instance Group on the Organization so that it applies to the Inventory sync job, not only to Job Templates.

abhilashjoseph commented 8 months ago

@kurokobo Setting ownerReferences in the customized pod spec seems to work and resolves this issue. Thank you for your assistance!
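
For readers who land here with the same problem, a customized pod spec along these lines might look like the sketch below. It builds the spec as a Python dict and prints it as JSON. The exact spec used in this thread was not posted, so every value is an assumption: the owner's kind, name, and uid are placeholders, and the container section follows what I recall of the default Container Group pod spec.

```python
import json
from typing import Any


def build_pod_spec_override(namespace: str, owner: dict[str, str]) -> dict[str, Any]:
    """Hypothetical sketch of a Container Group pod spec with ownerReferences."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "namespace": namespace,
            # An owner reference needs apiVersion, kind, name, and uid.
            "ownerReferences": [
                {
                    "apiVersion": owner["apiVersion"],
                    "kind": owner["kind"],
                    "name": owner["name"],
                    "uid": owner["uid"],
                }
            ],
        },
        "spec": {
            "automountServiceAccountToken": False,
            # Assumption: container settings resembling the AWX default.
            "containers": [
                {
                    "name": "worker",
                    "image": "quay.io/ansible/awx-ee:latest",
                    "args": ["ansible-runner", "worker", "--private-data-dir=/runner"],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder owner: replace with a real object in the same namespace.
    owner = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "name": "example-owner",
        "uid": "00000000-0000-0000-0000-000000000000",
    }
    print(json.dumps(build_pod_spec_override("awx", owner), indent=2))
```

The uid should belong to an object that actually exists in the same namespace. As far as I know, Kubernetes garbage collection can delete a pod whose owner cannot be found, so a made-up uid is risky.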

kurokobo commented 8 months ago

@abhilashjoseph Thanks for the update. If you still want AWX to set ownerReferences on job pods by default, you can open an RFE on the AWX repo: https://github.com/ansible/awx/issues/new?assignees=&labels=&projects=&template=feature_request.yml