CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

CrunchyData "repo" (pgbackrest) instance not using serviceAccount. (Permissions in the namespace's "default" serviceAccount affect deployment) #3472

Open nnachefski opened 1 year ago

nnachefski commented 1 year ago

I created a database using CrunchyData (named "tracking"), but I also have the "anyuid" policy set for the project's 'default' ServiceAccount. The initContainer ("pgbackrest-log-dir") in the "tracking-repo-host" StatefulSet failed to deploy, citing:

mkdir: cannot create directory ‘/pgbackrest/repo1/log’: Permission denied

# oc get sts tracking-repo-host -n dev -o yaml |grep serviceAccount
<nothing>
# oc get sa |grep tracking
tracking-instance          1         142m
tracking-pgbackrest        1         142m     <----- shouldn't the sts be using this SA and not 'default'?

If I remove the 'anyuid' ClusterRoleBinding from the 'default' serviceAccount and try again, it works fine.

-Nick

cbandy commented 1 year ago

Which version of PGO and OpenShift are you using?

dan1el-k commented 1 year ago

We are experiencing the same on multiple of our OKD clusters (OKD 4.9, 4.10, 4.11) with PGO 5.x.x. But we can narrow the scope of the issue: it only happens when you run multiple services which are using the "default" namespace. For some reason, if PGO is installed in an otherwise empty namespace, the default serviceAccount works.

The (manual) solution for running it alongside other services in the same namespace is to set the serviceAccount in the "repo-host" StatefulSet to the one generated by PGO.

joyartoun commented 1 year ago

Hi!

I'm experiencing the same issue. The postgresCluster CR has no property for setting the serviceAccount for pgbackrest, so I have to assign SCCs to the default serviceAccount. Running OKD 4.11 with operator 5.3.0.

nnachefski commented 1 year ago

This issue is still happening on OKD 4.12 with CrunchyData 5.3.0

The problem manifests itself in the pgbackrest-log-dir initContainer.

Here is the workaround for now (change the sts and serviceAccount names to whatever yours are called):

oc patch sts airflow-repo-host --type=merge -p '{"spec":{"template":{"spec":{"initContainers":[{"name":"pgbackrest-log-dir"}],"serviceAccountName":"airflow-pgbackrest"}}}}'
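
The same workaround can be parameterized by cluster name. This is only a sketch: it assumes the operator names the StatefulSet `<cluster>-repo-host` and the service account `<cluster>-pgbackrest`, as in the `tracking` and `airflow` examples in this thread, and it sets only `serviceAccountName`. The function name `repo_host_patch` is made up for illustration and is not part of PGO.

```shell
#!/bin/sh
# Hypothetical helper (not part of PGO): print a JSON merge patch that sets
# serviceAccountName on the repo-host pod template.
# Assumption: the service account is named "<cluster>-pgbackrest".
repo_host_patch() {
  cluster="$1"
  printf '{"spec":{"template":{"spec":{"serviceAccountName":"%s-pgbackrest"}}}}\n' "$cluster"
}

# Print the patch for a cluster called "airflow".
repo_host_patch airflow

# To apply it (requires a logged-in oc session; left commented out here):
# oc patch sts airflow-repo-host --type=merge -p "$(repo_host_patch airflow)"
```

Printing the patch first makes it easy to review before it is applied to a live StatefulSet.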

douggutaby commented 1 year ago

Thank you @nnachefski, it helped me a lot. The pod works and backups are fine, but it cannot write logs, because the OpenShift UID doesn't have write access to the log dir:

sh-4.4$ ls -la /pgbackrest/repo1/log/
total 0
drwxr-xr-x. 2 26 26 0 Jun 5 10:06 .

I think the UID of the postgres user is 26 in the image, but we use the OpenShift UID here.

I just wanted to highlight this in case someone else finds this issue and the workaround. I hope Crunchy will fix this soon.

loydbanks commented 5 months ago

I have a question related to the use of this service account. The repo-host pod is using the default service account in my case, and I am getting the error:

option 'repo1-s3-key-type' is 'web-id' but 'AWS_ROLE_ARN' and 'AWS_WEB_IDENTITY_TOKEN_FILE' are not set

I believe it has to do with the pod using the default service account, whilst the other pod is using the helix-instance service account (my Postgres cluster is called helix), which does have AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE. Is this pod meant to be using the default service account, or the helix-instance service account like the other pod? I am using AWS EKS and trying to back up to AWS S3. Please help.
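
One way to compare the two pods is to check, from a shell inside each one, whether the two variables named in the error are present. A minimal sketch, assuming a POSIX shell is available in the container; the function name `check_web_identity_env` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether the two variables named in the
# pgBackRest error are set in the current environment.
check_web_identity_env() {
  missing=0
  for name in AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      missing=1
    else
      echo "$name is set"
    fi
  done
  # Returns 0 only when both variables are set.
  return "$missing"
}

check_web_identity_env || echo "web-id authentication would fail in this pod"
```

Running this in both the repo-host pod and an instance pod (for example through `kubectl exec`) shows whether the difference really is in the injected environment.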

jeffgus commented 2 weeks ago

I'm experiencing the same issue. I tested an awscli pod with the service account I created for use with the operator. I've used plenty of IRSA pods in other places, and I know it works.

I can't get it to work with pgbackrest. I can override the repo-host-0 pod with the serviceAccountName, but I still get the error:

option 'repo2-s3-key-type' is 'web-id' but 'AWS_ROLE_ARN' and 'AWS_WEB_IDENTITY_TOKEN_FILE' are not set

If I apply the metadata to all the service accounts so that the database pods are now using a service account with the AWS_ROLE_ARN set, I get this error:

ERROR: [029]: unable to find child 'AssumeRoleWithWebIdentityResult':0

I tried to fiddle with the Trust Relationship configuration (see: #3135), but that doesn't seem to fix it.