jdratlif opened 2 months ago
It's not clear to me how the AWX CRD spec values get translated into Ansible vars, but commit https://github.com/ansible/awx-operator/commit/8ead140541622f67bd2d44a3c76bb05739cdebb6 added `web_manage_replicas` and `task_manage_replicas` with a documented default of true, yet no corresponding defaults were added to defaults/main.yml. `web_replicas` and `task_replicas` are set to empty strings there. Don't we also need `web_manage_replicas` and `task_manage_replicas` set to true in those defaults?
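For illustration, this is the shape I'd expect defaults/main.yml to take, assuming the role defaults live there; the `*_manage_replicas` names come from the commit above, and this is my guess at the fix, not the actual upstream file:

```yaml
# defaults/main.yml (sketch, not the real upstream file)
web_manage_replicas: true    # seemingly missing; the commit documents the default as true
task_manage_replicas: true   # seemingly missing
web_replicas: ''             # these two are present today, as empty strings
task_replicas: ''
```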
Okay, I think I know what is happening.
Another person is using the awx-operator in our cluster. I didn't think this would matter because we're using different namespaces, but CRDs are cluster-scoped, not namespaced, so the CRDs are being overwritten at "random" times, and then I lose the fields from the newer CRD definitions.
```yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Find the latest tag here: https://github.com/ansible/awx-operator/releases
  - github.com/ansible/awx-operator/config/default?ref=2.19.1
  # - awx.yaml

# Set the image tags to match the git version from above
images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.19.1

# Specify a custom namespace in which to install AWX
namespace: sea
```
Running kustomize on this does not fix the CRDs, but `kubectl apply --server-side --force-conflicts -k "github.com/ansible/awx-operator/config/crd?ref=2.19.1"` does, at least until whatever Helm job installs the older awx-operator in the other namespace kicks in. Downgrades work, upgrades don't? Or maybe it's kustomize vs. Helm; I'm not sure. I do know the CRDs are being overwritten, because after I delete my namespace and start over, `postgres_data_volume_init` is a field in the CRD, but if I keep checking, it disappears and `postgres_data_path` takes its place.
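The field check I keep running can be sketched like this; the live version pipes `kubectl get crd awxs.awx.ansible.com -o yaml` into the same grep (that CRD name is what the operator installs, and the schema fragments below are made up for illustration):

```shell
# Live check (needs cluster access):
#   kubectl get crd awxs.awx.ansible.com -o yaml \
#     | grep -oE 'postgres_data_(volume_init|path)'
# Demonstrated here against minimal, made-up fragments of the two
# CRD generations rather than the real schemas.
new_crd='properties:
  postgres_data_volume_init:
    type: boolean'
old_crd='properties:
  postgres_data_path:
    type: string'

# Extract whichever generation marker is present in the schema text.
crd_generation() { printf '%s\n' "$1" | grep -oE 'postgres_data_(volume_init|path)'; }

crd_generation "$new_crd"   # -> postgres_data_volume_init
crd_generation "$old_crd"   # -> postgres_data_path
```

If the output flips from `postgres_data_volume_init` to `postgres_data_path` over time, the CRD has been reverted by the other install.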
Please confirm the following
Bug Summary
After installing a new AWX instance with awx-operator, the operator scales the web and task deployments down to 0 and AWX stops completely. It never scales the deployments back up.
AWX Operator version
2.19.1
AWX version
1.27.12
Kubernetes platform
kubernetes
Kubernetes/Platform version
k3s
Modifications
no
Steps to reproduce
Expected results
I expected awx to be running.
Actual results
It starts up, then gets stopped, and doesn't restart without manual intervention.
Additional information
If I use awx-operator 2.18, I don't have this problem. It seems like the problem was introduced in the 2.19.0 or 2.19.1 release.
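As a stopgap while the CRD keeps getting reverted, I'm considering pinning the replica counts directly in the AWX spec, on the assumption that `web_replicas`/`task_replicas` are honored by both CRD generations; this is untested on my side, not a confirmed fix:

```yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: sea
spec:
  web_replicas: 1
  task_replicas: 1
```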
Operator Logs
I saw this referenced in https://github.com/ansible/awx-operator/issues/1907, but I'm not upgrading from 2.18, and re-applying the CRDs didn't fix things for me.