ansible / awx-ee

An Ansible execution environment for AWX project
https://quay.io/ansible/awx-ee

image awx-ee:latest broken for use with awx-operator #258

Open sgreinerCNS opened 3 weeks ago

sgreinerCNS commented 3 weeks ago

The awx-web and awx-task Kubernetes pods stop working with Init:CrashLoopBackOff.

The cause was the init container's image, quay.io/ansible/awx-ee:latest, which fails with:

ln: failed to create symbolic link '/etc/pki/ca-trust/extracted/pem/directory-hash/ca-certificates.crt': Permission denied

I manually edited the deployments to use quay.io/ansible/awx-ee:24.6.1 instead, and the pods came up again. Unfortunately, the awx-operator keeps trying to revert the image to the broken latest tag.

Jed-Giblin commented 2 days ago

Encountering the same issue. It can be reproduced by draining the node the pods are running on: the failure occurs on first boot on the new node. Recreating the pod on the new node restores functionality.

k8s info:

Client Version: v1.28.10
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.10

AWX Resource Details

Labels:       app.kubernetes.io/component=awx
              app.kubernetes.io/managed-by=awx-operator
              app.kubernetes.io/operator-version=2.12.2
              app.kubernetes.io/part-of=awx-prod
Annotations:  <none>
API Version:  awx.ansible.com/v1beta1
Kind:         AWX
Metadata:
  Creation Timestamp:  2024-04-08T18:06:54Z
  Generation:          2
  Resource Version:    79405586

Some extra configuration that might be relevant:

  web_extra_env:
    - name: LDAPTLS_CACERT
      value: /etc/pki/ca-trust/source/anchors/bundle-ca.crt

The file above, inside the container, is the CA certificate for a local LDAP domain.

Status:
  Admin Password Secret:       <redact>
  Admin User:                  <redact>
  Broadcast Websocket Secret:  <redact>
  Conditions:
    Last Transition Time:         2024-10-14T12:51:29Z
    Reason:
    Status:                       False
    Type:                         Failure
    Last Transition Time:         2024-10-14T12:50:18Z
    Reason:                       Successful
    Status:                       True
    Type:                         Running
    Last Transition Time:         2024-10-14T13:16:05Z
    Reason:                       Successful
    Status:                       True
    Type:                         Successful
  Image:                          quay.io/ansible/awx:23.9.0
  Postgres Configuration Secret:  <redact>
  Secret Key Secret:              <redact>
  Version:                        23.9.0

sgreinerCNS commented 1 day ago

Our situation was similar, also involving an LDAPS CA and a CA bundle (required because of TLS deep inspection by security appliances).

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: cns-awx
  namespace: awx
spec:
  image_pull_policy: Always
  control_plane_ee_image: quay.io/ansible/awx-ee:23.3.0
  init_container_image: quay.io/ansible/awx-ee
  init_container_image_version: 24.6.1
  ingress_type: Ingress
  hostname: <redact>
  ingress_annotations: ""
  ingress_tls_secret: <redact>
  admin_user: <redact>
  admin_email: <redact>
  admin_password_secret: <redact>
  web_resource_requirements:
    requests:
      cpu: 200m
      memory: 500Mi
  task_resource_requirements:
    requests:
      cpu: 200m
      memory: 500Mi
  ldap_cacert_secret: <redact>
  bundle_cacert_secret: <redact>
  secret_key_secret: <redact>
  projects_persistence: true
  projects_existing_claim: cns-awx-storage-projects-claim
  postgres_storage_requirements:
    requests:
      storage: 4Gi
  postgres_storage_class: postgres

ldap_cacert_secret gets the "file" ldap-ca.crt and bundle_cacert_secret gets the "file" bundle-ca.crt, each via a secret.
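For readers unfamiliar with how those "files" end up in a secret, here is a minimal sketch of what the two Secret objects might look like. The Secret names are hypothetical (the real ones are redacted above), and the certificate bodies are placeholders; only the key names ldap-ca.crt and bundle-ca.crt and the awx namespace come from this thread.

```yaml
---
# Hypothetical name; this is the value that spec.ldap_cacert_secret would reference.
apiVersion: v1
kind: Secret
metadata:
  name: example-ldap-cacert
  namespace: awx
type: Opaque
stringData:
  # The key name acts as the "file" name.
  ldap-ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM data of the LDAP CA goes here>
    -----END CERTIFICATE-----
---
# Hypothetical name; this is the value that spec.bundle_cacert_secret would reference.
apiVersion: v1
kind: Secret
metadata:
  name: example-bundle-cacert
  namespace: awx
type: Opaque
stringData:
  bundle-ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM data of the CA bundle goes here>
    -----END CERTIFICATE-----
```

Using stringData keeps the PEM text readable in the manifest; Kubernetes stores it base64-encoded under data once the object is created.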

By setting init_container_image and pinning init_container_image_version to 24.6.1, I was able to avoid the buggy awx-ee:latest, which for some reason fails to create the ca-certificates.crt symbolic link.