CrunchyData / crunchy-containers

Containers for Managing PostgreSQL on Kubernetes by Crunchy Data
https://www.crunchydata.com/
Apache License 2.0

crunchy-postgresql-gis not starting on OpenShift 4.7 (Permission denied) #1398

Closed schmandr closed 2 years ago

schmandr commented 2 years ago

I'm trying to run crunchy-postgresql-gis (crunchydata/crunchy-postgres-gis:centos8-13.3-3.1-4.6.3) on OpenShift 4.7. I'm not using the operator; I'm just trying to run the image for a "one-off DB" that is needed only temporarily.

What is the current behavior? The container fails to start with a Permission denied error when trying to create a directory in /pgdata. This probably has to do with OpenShift running the container under an arbitrary user ID (different from 26, the ID of the postgres user) by default. Log output is:

Fri Nov  5 10:23:36 UTC 2021 INFO: Image mode found: postgres
Fri Nov  5 10:23:36 UTC 2021 INFO: Starting in 'postgres' mode
Fri Nov  5 10:23:36 UTC 2021 INFO: Setting PGROOT to /usr/pgsql-13.
Fri Nov  5 10:23:36 UTC 2021 INFO: PG_CTL_START_TIMEOUT set at: 60
Fri Nov  5 10:23:37 UTC 2021 INFO: PG_CTL_STOP_TIMEOUT set at: 60
Fri Nov  5 10:23:37 UTC 2021 INFO: PG_CTL_PROMOTE_TIMEOUT set at: 60
mkdir: cannot create directory ‘/pgdata/processing-db-25-3mnsm-rp8jk-wg3v7’: Permission denied
chmod: cannot access '/pgdata/processing-db-25-3mnsm-rp8jk-wg3v7': No such file or directory
Fri Nov  5 10:23:37 UTC 2021 INFO: Cleaning up the old postmaster.pid file..
Fri Nov  5 10:23:37 UTC 2021 INFO: User ID is set to uid=1000790000(postgres) gid=0(root) groups=0(root),1000790000.
Fri Nov  5 10:23:37 UTC 2021 INFO: Working on primary..
Fri Nov  5 10:23:37 UTC 2021 INFO: Initializing the primary database..
Fri Nov  5 10:23:37 UTC 2021 INFO: PGDATA is empty. ID is uid=1000790000(postgres) gid=0(root) groups=0(root),1000790000. Creating the PGDATA directory..
mkdir: cannot create directory ‘/pgdata/processing-db-25-3mnsm-rp8jk-wg3v7’: Permission denied
Fri Nov  5 10:23:37 UTC 2021 INFO: Starting initdb..
Fri Nov  5 10:23:37 UTC 2021 INFO: XLOGDIR not found. Using default pg_wal directory..
Fri Nov  5 10:23:37 UTC 2021 INFO: Data checksums enabled.  Setting initdb to use data checksums..
Fri Nov  5 10:23:37 UTC 2021 INFO: Running initdb command: initdb -D /pgdata/processing-db-25-3mnsm-rp8jk-wg3v7  --locale=en_US.UTF-8 --data-checksums > /tmp/initdb.stdout 2> /tmp/initdb.stderr
Fri Nov  5 10:23:37 UTC 2021 ERROR: Initializing the database (initdb): Unable to initialize the database: 
initdb: error: could not create directory "/pgdata/processing-db-25-3mnsm-rp8jk-wg3v7": Permission denied

What is the expected behavior? Container should start successfully.

Other information (e.g. detailed explanation, related issues, etc) I guess it would start successfully if I set (and were allowed to set) the security context constraint (SCC) to nonroot and configured the container with runAsUser: 26. However, I wonder whether there is any other solution.
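For reference, the runAsUser alternative mentioned above might look roughly like the excerpt below. This is a sketch under assumptions, not a tested manifest: it assumes a cluster administrator has already granted an SCC that permits a fixed non-root UID (such as nonroot) to the pod's service account, for example with `oc adm policy add-scc-to-user nonroot -z jenkins`. The service account name "jenkins" and container name "processing-db" are taken from the Pod Template posted later in this thread.

```yaml
# Sketch only: pod spec excerpt, not a complete manifest.
spec:
  # Assumption: this service account has been granted an SCC such as
  # "nonroot"; under the default restricted SCC a fixed UID outside the
  # namespace's range is rejected.
  serviceAccountName: "jenkins"
  containers:
  - name: "processing-db"
    image: "crunchydata/crunchy-postgres-gis:centos8-13.3-3.1-4.6.3"
    securityContext:
      # 26 is the UID of the postgres user, per the issue description.
      runAsUser: 26
```

The drawback of this route is that it needs elevated cluster permissions to grant the SCC, which is why a solution that works under the default SCC is preferable for a temporary database.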

jkatz commented 2 years ago

Without the manifest you are using to try to deploy this, it's very difficult to troubleshoot.

I'm not using the operator, but instead just trying to run the image for a "one-off DB" that is needed temporarily only.

That said, it's still likely simplest to just do this with PGO, which has autoconfiguration for OpenShift. v5 is pretty quick to set up. Here is a manifest for getting started with PG13 / PostGIS 3.1:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:centos8-13.4-3.1-1
  postgresVersion: 13
  postGISVersion: "3.1"
  instances:
    - dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi

Otherwise, you have to configure the manifest to be able to handle a restricted SCC (which PGO handles), or as you mention, set up a different SCC to handle it.

schmandr commented 2 years ago

Well, the Pod is actually started by the Jenkins Kubernetes Plugin. This allows configuring a Pod Template according to which the pod is created when Jenkins needs to start an agent. (The pod contains a Jenkins agent container and the database container.) I don't think that I can configure an operator here.

However, based on your suggestion to configure the manifest to handle a restricted SCC, I went back and forth through the Kubernetes and OpenShift API docs. After a lot of trial and error, I found that I just need to add

securityContext:
  supplementalGroups: [26]

to the Pod Template.

:tada: This works! Thanks a lot!
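A likely explanation for why this works without a custom SCC, offered as an assumption rather than something confirmed in this thread: the default restricted SCC pins the user ID and fsGroup to the namespace's assigned range but leaves supplemental groups unrestricted, so the pod may add GID 26. The container process then belongs to group 26 in addition to the groups shown in the log (gid=0, 1000790000), and /pgdata in the image is presumably group-writable for that GID. The abbreviated excerpt below shows the relevant strategy fields as they typically appear in the restricted SCC; the actual values on a given cluster can be checked with `oc get scc restricted -o yaml`.

```yaml
# Abbreviated sketch of the default "restricted" SCC; unrelated fields omitted.
# Verify against the real object on your cluster before relying on it.
kind: SecurityContextConstraints
metadata:
  name: restricted
runAsUser:
  type: MustRunAsRange   # UID must come from the namespace's range, so runAsUser: 26 is rejected
fsGroup:
  type: MustRunAs        # fsGroup is likewise tied to the namespace's range
supplementalGroups:
  type: RunAsAny         # any supplemental GID may be requested, including 26
```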

I put down my new Pod Template configuration here (slightly simplified) for future reference:

apiVersion: "v1"
kind: "Pod"
metadata:
  labels:
    jenkins: "slave"
  name: "processing-db-37-x2t27-8lt7l-1whv2"
spec:
  securityContext:
    supplementalGroups: [26]
  serviceAccountName: "jenkins"
  containers:
  - name: "processing-db"
    image: "crunchydata/crunchy-postgres-gis:centos8-13.3-3.1-4.6.3"
    env:
    - name: "MODE"
      value: "postgres"
    - name: "PG_DATABASE"
      value: "processing"
    - name: "PG_LOCALE"
      value: "en_US.UTF-8"
    - name: "PG_PRIMARY_PORT"
      value: "5432"
    - name: "PG_MODE"
      value: "primary"
    - name: "PG_USER"
      value: "user"
    - name: "PG_PASSWORD"
      value: "pass"
    - name: "PG_PRIMARY_USER"
      value: "repl"
    - name: "PG_PRIMARY_PASSWORD"
      value: "repl"
    - name: "PG_ROOT_PASSWORD"
      value: "secret"
  - name: "jnlp"
    image: "sogis/gretl:latest"
    args:
    - "********"
    - "processing-db-37-x2t27-8lt7l-1whv2"
    env:
    - name: "JENKINS_SECRET"
      value: "********"
    - name: "JENKINS_TUNNEL"
      value: "jenkins-jnlp.gretl-test.svc.cluster.local:50000"
    - name: "JENKINS_AGENT_NAME"
      value: "processing-db-37-x2t27-8lt7l-1whv2"
    - name: "JENKINS_NAME"
      value: "processing-db-37-x2t27-8lt7l-1whv2"
    - name: "JENKINS_AGENT_WORKDIR"
      value: "/tmp"
    - name: "JENKINS_URL"
      value: "http://jenkins:80/"
    imagePullPolicy: "Always"
    tty: false
  nodeSelector:
    kubernetes.io/os: "linux"
  restartPolicy: "Never"

jkatz commented 2 years ago

@schmandr Cool :+1: Glad it worked!