awx-web missing #1832

Open Reign1 opened 1 month ago

Reign1 commented 1 month ago

Bug Summary

Following the documentation, I installed awx-operator with helm install and ended up with these resources:

NAME                                                   READY   STATUS    RESTARTS   AGE
pod/awx-operator-controller-manager-69d8f784d8-5llkl   2/2     Running   0          12h

NAME                                                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/awx-operator-controller-manager-metrics-service   ClusterIP                              8443/TCP   12h

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/awx-operator-controller-manager   1/1     1            1           12h

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/awx-operator-controller-manager-69d8f784d8   1         1         1       12h

On top of that, I created awx-demo.yaml:

apiVersion:
kind: AWX
metadata:
  name: awx-demo
spec:
  service_type: nodeport

I applied it with "kubectl -n awx apply -f awx-demo.yaml" and got the output " created".

Still, I see no awx-web pod. I checked the logs with "kubectl logs -f awx-operator-controller-manager-69d8f784d8-5llkl -n awx"; the output is under Operator Logs below.

AWX Operator version


AWX version


Kubernetes platform


Kubernetes/Platform version




Steps to reproduce

On a fresh k8s cluster (created with kubeadm) I'm trying to set up AWX. As per the documentation, I did a helm install. That is it.

Expected results

Default AWX setup up and running, with the frontend exposed so I can log in and try it out.

Actual results

awx-operator deployed, but no awx-web pods running.

Additional information

No response

Operator Logs

kubectl logs -f awx-operator-controller-manager-69d8f784d8-5llkl -n awx:

{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"cmd","msg":"Version","Go Version":"go1.20.12","GOOS":"linux","GOARCH":"amd64","ansible-operator":"v1.34.0","commit":"d26c43bf94960d292152862a6685696be33190fb"}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"cmd","msg":"Watching namespaces","namespaces":["awx"]}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"watches","msg":"Environment variable not set; using default value","envVar":"ANSIBLE_VERBOSITY_AWX_AWX_ANSIBLE_COM","default":2}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"watches","msg":"Environment variable not set; using default value","envVar":"ANSIBLE_VERBOSITY_AWXBACKUP_AWX_ANSIBLE_COM","default":2}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"watches","msg":"Environment variable not set; using default value","envVar":"ANSIBLE_VERBOSITY_AWXRESTORE_AWX_ANSIBLE_COM","default":2}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"watches","msg":"Environment variable not set; using default value","envVar":"ANSIBLE_VERBOSITY_AWXMESHINGRESS_AWX_ANSIBLE_COM","default":2}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"ansible-controller","msg":"Watching resource","Options.Group":"","Options.Version":"v1beta1","Options.Kind":"AWX"}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"ansible-controller","msg":"Watching resource","Options.Group":"","Options.Version":"v1beta1","Options.Kind":"AWXBackup"}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"ansible-controller","msg":"Watching resource","Options.Group":"","Options.Version":"v1beta1","Options.Kind":"AWXRestore"}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"ansible-controller","msg":"Watching resource","Options.Group":"","Options.Version":"v1alpha1","Options.Kind":"AWXMeshIngress"}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"proxy","msg":"Starting to serve","Address":""}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"apiserver","msg":"Starting to serve metrics listener","Address":"localhost:5050"}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":"","secure":false}
{"level":"info","ts":"2024-04-17T19:23:30Z","msg":"starting server","kind":"health probe","addr":"[::]:6789"}
I0417 19:23:30.391565 2 leaderelection.go:250] attempting to acquire leader lease awx/awx-operator...
E0417 19:24:00.393847 2 leaderelection.go:332] error retrieving resource lock awx/awx-operator: Get "": dial tcp i/o timeout
...
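The operator output above mixes JSON log lines with klog-style lines (the ones starting with I or E), which makes the one real error easy to miss. Below is a minimal Python sketch for pulling out only the error entries. It assumes the two line formats seen in this log; `error_lines` is a hypothetical helper name and not part of the operator or kubectl.

```python
import json
import re

# klog-style line: severity letter, MMDD, time, thread id, source location, message.
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def error_lines(log_text):
    """Return the messages of all error-level entries in an operator log."""
    errors = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("{"):
            # Structured (JSON) log line; skip it if it does not parse.
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("level") == "error":
                errors.append(entry.get("msg", ""))
            continue
        match = KLOG_RE.match(line)
        if match and match["sev"] in ("E", "F"):
            errors.append(match["msg"])
    return errors


sample = '''{"level":"info","ts":"2024-04-17T19:23:30Z","logger":"cmd","msg":"Watching namespaces","namespaces":["awx"]}
I0417 19:23:30.391565 2 leaderelection.go:250] attempting to acquire leader lease awx/awx-operator...
E0417 19:24:00.393847 2 leaderelection.go:332] error retrieving resource lock awx/awx-operator: Get "": dial tcp i/o timeout
'''

errors = error_lines(sample)
for message in errors:
    print(message)
```

On the sample, the only entry reported is the leader election timeout, which points at the operator pod failing to reach the Kubernetes API rather than at the AWX resource itself.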

YaronL16 commented 1 month ago

Have you used a customized values.yaml file to enable the AWX resource?

Are the postgres and awx-task pods being created?

Reign1 commented 1 month ago

@YaronL16, I only did what's provided in the Helm install instructions here: , and also ran "kubectl -n awx apply -f awx-demo.yaml". The content of awx-demo.yaml is provided above. I would expect the Helm install document to be complete (e.g. you get the frontend exposed). If it's not, what's missing? Thanks!

YaronL16 commented 1 month ago

Well, technically you did install the Operator; you just haven't told it to set up the AWX resource.

But I agree the documentation is a bit lackluster. Anyway, as the documentation says, you should customize the installation with your own values file to override the default ones. Most importantly, set AWX.enabled to 'true'.

More info here:

Reign1 commented 1 month ago

@YaronL16 thanks for the input, really helpful, and everything makes more sense now. Indeed, I ran helm install without passing my own values with -f. What is still not clear, though, is the content of myvalues.yaml. What is the very minimum needed to have the frontend exposed and be able to log in as admin?

AWX:
  enabled: true

Is this it?

YaronL16 commented 1 month ago

I would have something like this at the minimum:

AWX:
  enabled: true
  name: awx-demo
  spec:
    service_type: ClusterIP

@kurokobo created a nice base values file as seen here:

You could also define custom images and other configs

yyosha commented 1 week ago

I have a similar problem on an existing EKS cluster. Kubernetes and the AWS nodes are up to date.

I'm using the following kustomization:

kind: Kustomization

## Specify a custom namespace in which to install AWX
namespace: awx

generatorOptions:
  disableNameSuffixHash: true

### PostgreSQL secret was moved to awx-secrets.yaml which is included in resources

secretGenerator:
  - name: awx-admin-password
    type: Opaque
    literals:
      - password=BlaBlaBla

  - name: my-ca-bundle
    type: Opaque
    files:
      - bundle-ca.crt

resources:
  ## Find the latest tag here:
  - awx-secrets.yaml
  - awx-custom-ee-docker-reg-secret.yaml
  - awx-coredns-cm.yaml
  - awx-gp3-sc-retain.yaml
  - awx-efs-sc.yaml
#  - awx-efs-pv.yaml
  - awx-efs-pv-pg15.yaml
  - awx-efs-pvc.yaml
  - awx-with-postgres.yaml

## Set the image tags to match the git version from above
images:
  - name:
    newTag: 2.16.1

Customizing resources with this manifest:

kind: AWX

metadata:
  name: awx-dev

spec:

  ## These parameters are designed for use with:
  ## - AWX Operator: 2.10
  ## - AWX: 23.6.0
  ## Upgraded to:
  ## - AWX Operator: 2.16.1
  ## - AWX: 24.3.1

  ## This line controls the log output of the deployment
  no_log: false

  ## Disable ip_v6
  ipv6_disabled: true

  ##              awx             ##

  admin_user: admin
  admin_password_secret: awx-admin-password
  bundle_cacert_secret: my-ca-bundle

  ## hostname value is used in the ALB Listener rules
  ## if host is equal to <hostname value> then traffic will be forwarded to Target Group

  ## Customized control-plane-ee
  control_plane_ee_image: myrepo/my-awx-ee:2.16.1_1

  ## Customized awx-ee
  ee_images:
    - name: custom-awx-ee
      image: myrepo/my-awx-ee:2.16.1_1

  ## Custom ee docker pull secret
  image_pull_secrets:
    - awx-custom-ee-docker-reg-secret

  ## console listens on nodes port so ALB ingress can be used
  service_type: NodePort
  nodeport_port: 30080

  ## make projects data persistent on EFS
  ## need storage class, filesystem & mount points on all subnets to be pre-configured
  projects_persistence: true
#  ## use either -
#  ## 'projects_storage_class' for dynamic allocation of persistent volume
#  ## 'projects_existing_claim' for pre-configured persistent volume claim
#  projects_storage_class: efs-projects-storageclass
#  projects_existing_claim: awx-projects-claim

  ##            ingress           ##

  ingress_type: ingress
  ingress_path: '/'
  ingress_path_type: Prefix
  ingress_annotations: | alb '[{"HTTPS":443}, {"HTTP":80}]' '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}' "arn:aws:acm:xxxxxxxxxxxxxxxxxx" 'ELBSecurityPolicy-TLS13-1-2-Res-2021-06' 'internal' 'instance' 'ipv4' 'sg-xxxxxxxxxxxxxxxxxx' 'idle_timeout.timeout_seconds=360' HTTP traffic-port '15' '5' '200' '2' '2' 'true'

  ##          postgresql          ##

  postgres_configuration_secret: awx-postgres-configuration

#  ## Select postgresql image and image version
#  #
#  #  postgres_image:
#  #  postgres_image: postgres
#  #  postgres_image_version: 'latest'
#  image_pull_policy: Always

  ## make postgres db persistent on EFS
  ## need storage class, filesystem & mount points on all subnets to be pre-configured
  postgres_storage_class: efs-postgres-storageclass
  postgres_storage_requirements:
    requests:
      storage: 15Gi
    limits:
      storage: 35Gi

## EOF

This works perfectly with version 2.10.0, but when deploying from scratch with version 2.16.1, the logs show that awx-dev-web is missing, and describing the pod gives:

  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  35m                  default-scheduler  Successfully assigned awx/awx-dev-web-94cdf9d45-vkr54 to ip-10-167-0-76.ec2.internal
  Normal   Pulled     35m                  kubelet            Container image "" already present on machine
  Normal   Created    35m                  kubelet            Created container init
  Normal   Started    35m                  kubelet            Started container init
  Normal   Pulled     34m (x5 over 35m)    kubelet            Container image "" already present on machine
  Normal   Created    34m (x5 over 35m)    kubelet            Created container init-projects
  Normal   Started    34m (x5 over 35m)    kubelet            Started container init-projects
  Warning  BackOff    43s (x160 over 35m)  kubelet            Back-off restarting failed container init-projects in pod awx-dev-web-94cdf9d45-vkr54_awx(6c47c5a1-d1b6-4f9b-8b85-8a803da2df2c)

YaronL16 commented 1 week ago

@yyosha you should probably look at the logs of the crashing init container (init-projects).
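One way to find which container to pull logs from is to scan the Events section of "kubectl describe pod" for BackOff warnings. The following is a minimal Python sketch, assuming the kubelet message format shown in the events above; `failing_containers` is a hypothetical helper name, not part of kubectl or the operator.

```python
import re

# Matches the kubelet BackOff message as it appears in the events above:
# "Back-off restarting failed container <container> in pod <pod>_<namespace>(<uid>)"
BACKOFF_RE = re.compile(
    r"Back-off restarting failed container (?P<container>\S+) "
    r"in pod (?P<pod>[^_\s]+)_(?P<namespace>[^(\s]+)"
)


def failing_containers(events_text):
    """Return unique (pod, namespace, container) tuples for BackOff events."""
    found = []
    for line in events_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            entry = (match["pod"], match["namespace"], match["container"])
            if entry not in found:
                found.append(entry)
    return found


events = """\
  Normal   Started    34m (x5 over 35m)    kubelet  Started container init-projects
  Warning  BackOff    43s (x160 over 35m)  kubelet  Back-off restarting failed container init-projects in pod awx-dev-web-94cdf9d45-vkr54_awx(6c47c5a1-d1b6-4f9b-8b85-8a803da2df2c)
"""

# Build the matching "kubectl logs" invocation for each failing container.
commands = [
    f"kubectl logs pod/{pod} -c {container} -n {namespace}"
    for pod, namespace, container in failing_containers(events)
]
for command in commands:
    print(command)
```

The point of the sketch is that the -c argument should name the container from the BackOff event (here init-projects), not the main awx-dev-web container, which cannot have logs while the pod is still initializing.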

yyosha commented 1 week ago


Pod is in CrashLoopBackOff status

kc logs -f pod/awx-dev-web-567665cb76-hmc5q -c awx-dev-web -n awx
Error from server (BadRequest): container "awx-dev-web" in pod "awx-dev-web-567665cb76-hmc5q" is waiting to start: PodInitializing

YaronL16 commented 1 week ago


kc logs -f pod/awx-dev-web-567665cb76-hmc5q -c awx-dev-web -n awx
Error from server (BadRequest): container "awx-dev-web" in pod "awx-dev-web-567665cb76-hmc5q" is waiting to start: PodInitializing

Get logs from the container after it has failed, or from the previous container (--previous)

yyosha commented 1 week ago


kc logs -f pod/awx-dev-web-567665cb76-hmc5q -c awx-dev-web -n awx --previous
Error from server (BadRequest): previous terminated container "awx-dev-web" in pod "awx-dev-web-567665cb76-hmc5q" not found

From operator logs I get this:

TASK [installer : Get the new resource pod information after updating resource.] ***
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:258\nskipping: [localhost] => {\"changed\": false, \"false_condition\": \"this_deployment_result.changed\", \"skip_reason\": \"Conditional result was False\"}\n
TASK [installer : Update new resource pod as a variable.] **********************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:275\nskipping: [localhost] => {\"changed\": false, \"false_condition\": \"this_deployment_result.changed\", \"skip_reason\": \"Conditional result was False\"}\n
TASK [installer : Update new resource pod name as a variable.] *****************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:283\nskipping: [localhost] => {\"changed\": false, \"false_condition\": \"this_deployment_result.changed\", \"skip_reason\": \"Conditional result was False\"}\n
TASK [installer : Verify the resource pod name is populated.] ******************
task path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:289\nfatal: [localhost]: FAILED! => {
    \"assertion\": \"awx_web_pod_name != ''\",
    \"changed\": false,
    \"evaluated_to\": false,
    \"msg\": \"Could not find the tower pod's name.\"
PLAY RECAP *********************************************************************
localhost                  : ok=69   changed=0    unreachable=0    failed=1    skipped=68   rescued=0    ignored=0   \n","job":"3522416367647485710","name":"awx-dev","namespace":"awx","error":"exit status 2","stacktrace":"*runner).Run.func1\n\tansible-operator-plugins/internal/ansible/runner/runner.go:269"}
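The PLAY RECAP line is the quickest indicator of whether an operator reconciliation run succeeded. Below is a minimal Python sketch for turning such a line into counters, assuming the "host : key=value ..." layout shown above; `parse_play_recap` and `play_failed` are hypothetical helper names.

```python
import re

# Host name, a colon, then one or more key=value counters.
RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)")


def parse_play_recap(line):
    """Parse one Ansible PLAY RECAP host line into a dict of integer counters."""
    match = RECAP_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a PLAY RECAP host line: {line!r}")
    counters = {
        key: int(value)
        for key, value in re.findall(r"(\w+)=(\d+)", match.group("counters"))
    }
    return {"host": match.group("host"), **counters}


def play_failed(recap):
    """Treat any failed task or unreachable host as a failed run."""
    return recap.get("failed", 0) > 0 or recap.get("unreachable", 0) > 0


line = "localhost                  : ok=69   changed=0    unreachable=0    failed=1    skipped=68   rescued=0    ignored=0   "
recap = parse_play_recap(line)
print(recap["host"], "failed" if play_failed(recap) else "ok")
```

For the recap above this reports a failed run (failed=1), which matches the fatal "Could not find the tower pod's name." assertion a few lines earlier.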

Again, this works perfectly with version 2.10.0.

fosterseth commented 1 week ago

kc logs -f pod/awx-dev-web-567665cb76-hmc5q -c init-projects -n awx

does that return anything helpful?

yyosha commented 1 week ago

@fosterseth I re-deployed ver. 2.16.1 (this is very much a test env.), hence the different pod name...

kc logs -f pod/awx-dev-web-6b4b544584-mqppn -c init-projects -n awx

Yielded nothing.

But the pod events now look like this:

  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  4m6s                  default-scheduler  Successfully assigned awx/awx-dev-web-6b4b544584-mqppn to ip-10-167-0-76.ec2.internal
  Normal   Pulled     4m6s                  kubelet            Container image "" already present on machine
  Normal   Created    4m6s                  kubelet            Created container init
  Normal   Started    4m5s                  kubelet            Started container init
  Normal   Pulled     4m5s                  kubelet            Container image "" already present on machine
  Normal   Created    4m5s                  kubelet            Created container init-projects
  Normal   Started    4m5s                  kubelet            Started container init-projects
  Normal   Created    4m4s                  kubelet            Created container redis
  Normal   Pulled     4m4s                  kubelet            Container image "" already present on machine
  Normal   Started    4m4s                  kubelet            Started container redis
  Normal   Pulled     4m4s                  kubelet            Container image "" already present on machine
  Normal   Created    4m4s                  kubelet            Created container awx-dev-rsyslog
  Normal   Started    4m3s                  kubelet            Started container awx-dev-rsyslog
  Normal   Created    2m51s (x3 over 4m4s)  kubelet            Created container awx-dev-web
  Normal   Started    2m51s (x3 over 4m4s)  kubelet            Started container awx-dev-web
  Warning  BackOff    2m11s (x3 over 3m4s)  kubelet            Back-off restarting failed container awx-dev-web in pod awx-dev-web-6b4b544584-mqppn_awx(e5540567-38f8-4be9-86b3-8602ce7ff7d5)
  Normal   Pulled     2m (x4 over 4m4s)     kubelet            Container image "" already present on machine

I ran this:

kc logs -f pod/awx-dev-web-6b4b544584-mqppn -c awx-dev-web -n awx

and got a very long log, which I attached here: awx-operator-2.16.1.txt

Reign1 commented 6 days ago

Managed to fix all issues. Current state is:

kubectl get all -n awx | grep awx
pod/awx-migration-24.4.0-rv74w 0/1 Completed 0 3m23s
pod/awx-operator-controller-manager-5b9cb84bd5-g54xx 2/2 Running 0 10m
pod/awx-postgres-15-0 1/1 Running 0 3m54s
pod/awx-task-6f65778bd-wwzld 4/4 Running 0 3m35s
pod/awx-web-988fccf6d-w5pz2 3/3 Running 0 3m36s
service/awx-operator-controller-manager-metrics-service ClusterIP 8443/TCP 10m
service/awx-postgres-15 ClusterIP None 5432/TCP 3m54s
service/awx-service ClusterIP 80/TCP 3m38s
deployment.apps/awx-operator-controller-manager 1/1 1 1 10m
deployment.apps/awx-task 1/1 1 1 3m35s
deployment.apps/awx-web 1/1 1 1 3m36s
replicaset.apps/awx-operator-controller-manager-5b9cb84bd5 1 1 1 10m
replicaset.apps/awx-task-6f65778bd 1 1 1 3m35s
replicaset.apps/awx-web-988fccf6d 1 1 1 3m36s
statefulset.apps/awx-postgres-15 1/1 3m54s
job.batch/awx-migration-24.4.0 Complete 1/1 112s 3m23s

The end of the log is as suggested:

PLAY RECAP *****
localhost : ok=90 changed=0 unreachable=0 failed=0 skipped=82 rescued=0 ignored=1

However, the AWX interface still doesn't show up. Any ideas?

YaronL16 commented 5 days ago

@Reign1 Not sure how you set up access to the application on the specified URL, but in your output I did not see an ingress resource. So look into your service discovery.