ansible / awx-operator

An Ansible AWX operator for Kubernetes built with Operator SDK and Ansible. 🤖
https://www.github.com/ansible/awx
Apache License 2.0
1.26k stars, 631 forks

AWX fails to deploy on OpenShift when using project persistence #647

Open gjsmo opened 3 years ago

gjsmo commented 3 years ago
ISSUE TYPE
SUMMARY

When setting spec.projects_persistence: true, the operator fails to deploy AWX on OpenShift.

ENVIRONMENT
STEPS TO REPRODUCE
  1. Install AWX Operator on OpenShift
  2. Apply configuration using spec.projects_persistence: true
EXPECTED RESULTS

AWX deploys and is available.

ACTUAL RESULTS

AWX fails to deploy. The ReplicaSet gives the following error message: "Error creating: pods "awx-65c446586f-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1000}: 1000 is not an allowed group, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]"

ADDITIONAL INFORMATION

awx-config.yaml (with minor redactions):

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  service_type: nodeport
  ingress_type: route
  hostname: awx.apps.dev.example.com
  postgres_image: bitnami/postgresql
  redis_image: bitnami/redis
  postgres_storage_class: ovirt-csi-sc
  ldap_cacert_secret: idm-ca
  bundle_cacert_secret: idm-ca
  projects_persistence: true
  projects_storage_class: ovirt-csi-sc-hdd

---
apiVersion: v1
kind: Secret
metadata:
  name: idm-ca
  namespace: awx
data:
  bundle-ca.crt: |
    [redacted]
  ldap-ca.crt: |
    [redacted]
AWX-OPERATOR LOGS

None at the moment, can provide later if required.

chrismeyersfsu commented 2 years ago

We think this is the problem line: https://github.com/ansible/awx-operator/blob/d1d6785b7dc704fe6e0093eef680d4a849b20f90/roles/installer/templates/deployment.yaml.j2#L316. We don't know what we need to do to resolve it, though.
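
For readers without the template open, the error message in the report points at a pod-level security context roughly like the fragment below. This is a sketch reconstructed from the error text (`.spec.securityContext.fsGroup` with the value 1000), not a copy of deployment.yaml.j2; the surrounding structure is the standard Deployment layout and is assumed.

```yaml
# Sketch only: reconstructed from the error message, not copied from the template.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000   # the value the "restricted" SCC rejects as "not an allowed group"
```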

mrcetinel commented 2 years ago

You need to define an SCC for the application.

gjsmo commented 2 years ago

@mrcetinel As I understand it, OpenShift will create a group and user when necessary. Is there a reason the fsGroup needs to be 1000? There's more than one case I've encountered where simply removing that line fixes the issue. I did try giving both the default and awx users privileged SCC roles but that didn't help either.

As for what can be done to fix it, would adding an "is-openshift" flag to the config be appropriate to disable this? Or possibly a securityGroup override?

mrcetinel commented 2 years ago

@gjsmo One of the easiest ways is to grant the anyuid SCC to the default service account. The privileged SCC will not help you. Also, it is not possible to define a custom service account, which is why the anyuid SCC has to be granted to the default service account.

I think that will solve your problem.

But if you set persistence to true and use NFS, you will hit issue #532. I opened that issue, but no solution has been provided so far, so we are not using persistent volumes anymore.

On the other hand, a persistent volume is not a requirement for AWX. Its only advantage is that it speeds up pulling repositories from the Git server.
$ oc adm policy add-scc-to-user anyuid -z default

gjsmo commented 2 years ago

@mrcetinel That doesn't seem to work unfortunately, maybe I'm missing something. I add the SCC, update the AWX resource, and the operator creates the PVC but still cannot create the pod. The error is the same.

According to the docs on SCCs, privileged is the most relaxed SCC and should allow running with any UID/GID. I'm still curious whether the fsGroup is required at all - OpenShift should provision a UID/GID for the application with no specification needed.
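
As background on why a fixed fsGroup gets rejected: under the restricted SCC, the permitted fsGroup is derived from annotations that OpenShift sets on each project's namespace, so a hard-coded ID such as 1000 normally falls outside the allowed range. The fragment below sketches what `oc get namespace awx -o yaml` might show. The annotation keys are the standard OpenShift ones; the numeric values are illustrative assumptions and will differ per cluster and project.

```yaml
# Sketch only: annotation values are made-up examples in <start>/<length> form.
apiVersion: v1
kind: Namespace
metadata:
  name: awx
  annotations:
    openshift.io/sa.scc.uid-range: 1000650000/10000            # UIDs pods may run as
    openshift.io/sa.scc.supplemental-groups: 1000650000/10000  # groups allowed, including fsGroup
```

If the pod spec omits fsGroup entirely, OpenShift assigns one from this range, which is consistent with the observation that removing the line makes the pod admissible.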

mrcetinel commented 2 years ago

@gjsmo Could you please try the procedure below? It works like a charm on OKD.

$ oc new-project awx
$ oc apply -f https://raw.githubusercontent.com/ansible/awx-operator/0.12.0/deploy/awx-operator.yaml
$ oc create sa awx
$ oc adm policy add-scc-to-user privileged -z awx
$ oc adm policy add-cluster-role-to-user cluster-admin -z awx-operator

chadmf commented 2 years ago

FYI, most people are unable to add SCCs to their clusters due to security restrictions. Can we instead change to use a uid > 1000?

shanemcd commented 2 years ago

@rooftopcellist It doesn't look like the commit that referenced this issue ever got merged into the main fork. When you have a chance, it'd be good to finish this up.

sumanthvm commented 2 years ago

@rooftopcellist Was the fix merged? I am facing the same issue.

zentavr commented 2 years ago

@gjsmo Could you please try the procedure below? It works like a charm on OKD.

$ oc new-project awx
$ oc apply -f https://raw.githubusercontent.com/ansible/awx-operator/0.12.0/deploy/awx-operator.yaml
$ oc create sa awx
$ oc adm policy add-scc-to-user privileged -z awx
$ oc adm policy add-cluster-role-to-user cluster-admin -z awx-operator

I am installing version 0.15.0 of the AWX operator and had the same issue. As a workaround, I did the following:

$ oc get sa
NAME                              SECRETS   AGE
awx                               2         13m
awx-operator-controller-manager   2         4d14h
builder                           2         6d4h
default                           2         6d4h
deployer                          2         6d4h

# We are going to adjust the `awx` service account
$ oc adm policy add-scc-to-user privileged --serviceaccount=awx
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "awx"
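
Where granting the privileged SCC is considered too broad, a narrower alternative might be a custom SCC that stays close to restricted but also permits fsGroup 1000. The following is a hypothetical, untested sketch: the SCC name is invented for illustration, the field values are assumptions, and the `awx` namespace and `awx` service account are taken from this thread. Creating an SCC still requires cluster-admin rights, and this may not be sufficient if the AWX pods request anything else that restricted forbids.

```yaml
# Hypothetical, untested sketch of a narrow SCC; not part of awx-operator.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: awx-fsgroup-1000          # hypothetical name
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsRange            # UID still comes from the namespace's range
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
  ranges:
  - min: 1000
    max: 1000                     # the intended difference: allow the fsGroup the pod requests
supplementalGroups:
  type: RunAsAny
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
users:
- system:serviceaccount:awx:awx   # namespace and service account as used in this thread
```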

mrcetinel commented 2 years ago

FYI, most people are unable to add SCCs to their clusters due to security restrictions. Can we instead change to use a uid > 1000?

I am not sure if it is possible. To be honest, I tried it about 3-4 months ago and it did not work; I need to test it again.

@zentavr Thank you for your feedback