fabric8io / fabric8

fabric8 is an open source microservices platform based on Docker, Kubernetes and Jenkins
http://fabric8.io/

fabric8 deploy on kubernetes 1.8.6 AWS fails #7027

Open bjwschaap opened 6 years ago

bjwschaap commented 6 years ago

I have created a Kubernetes 1.8.6 cluster on AWS with kops, with RBAC enabled, and I'm trying to install fabric8 on it. I first tried the Helm chart, but it seems to be missing all of the required RBAC settings, so I switched to installing fabric8 with gofabric8 deploy.

Everything seems to work, except that all the DB pods (PostgreSQL) fail with:

/usr/share/container-scripts/postgresql/common.sh: line 127: /var/lib/pgsql/passwd: Permission denied

Deployed with:

gofabric8 deploy -d f8.dev.mydomain.com --exposer LoadBalancer --loadbalancer --namespace fabric8 --tls-acme-email bastiaan@mydomain.com

Version:

gofabric8 version
foo' v3.7.1+ab0f056'
gofabric8, version 0.4.176 (branch: 'unknown', revision: 'homebrew')
  build date:       '20171110-18:14:19'
  go version:       '1.9.2'
  oc version:       'v3.7.1+ab0f056'
  Remote URL:       'https://api.dev.mydomain.com'

  Remote Kubernetes:       'v1.8.6'

Running/starting Pods:

kubectl get po -n fabric8
NAME                                   READY     STATUS             RESTARTS   AGE
auth-65cf86bbb4-hqnnb                  0/1       CrashLoopBackOff   13         30m
auth-db-77cc7868cd-4klf7               0/1       CrashLoopBackOff   10         30m
che-starter-56579b55d-rp86h            1/1       Running            0          30m
configmapcontroller-6c4fc5568f-7d4qv   1/1       Running            0          30m
docker-registry-56f8dc66db-br7bg       1/1       Running            0          30m
exposecontroller-564d8fcd55-4f2sw      1/1       Running            0          30m
fabric8-694998ddb8-kxcww               0/1       CrashLoopBackOff   10         30m
forge-56fbbf9b7b-g6d69                 1/1       Running            0          30m
init-tenant-75f4d89d6-xkf9m            0/1       CrashLoopBackOff   13         30m
init-tenant-db-8b7b5856c-mvv9t         0/1       CrashLoopBackOff   10         30m
keycloak-84c6cc698b-4vn7q              0/1       CrashLoopBackOff   9          30m
keycloak-db-59b8b4f645-cv6lx           0/1       CrashLoopBackOff   10         30m
wit-7bb7888d88-cxcsw                   0/1       CrashLoopBackOff   13         30m
wit-db-6756b4ccf7-7jhbt                0/1       CrashLoopBackOff   10         30m

Log of a failing DB pod:

kubectl logs auth-db-77cc7868cd-4klf7 -n fabric8
/usr/share/container-scripts/postgresql/common.sh: line 127: /var/lib/pgsql/passwd: Permission denied

Any guidance/assistance on how to fix this is much appreciated. I'd be happy to provide any additional information needed.

bjwschaap commented 6 years ago

I think I found a cause. Describing one of the DB pods:

Name:           auth-db-77cc7868cd-4klf7
Namespace:      fabric8
Node:           ip-1.2.3.4.eu-west-1.compute.internal/1.2.3.4
Start Time:     Wed, 28 Feb 2018 10:17:14 +0100
Labels:         app=auth-db
                group=io.fabric8.platform.apps
                pod-template-hash=3377342478
                provider=fabric8
                service=auth-db
                version=4.0.208
Annotations:    fabric8.io/git-branch=release-v4.0.208
                fabric8.io/git-commit=d537a75a59f2305791c3e5adc838cb04f0329b18
                fabric8.io/metrics-path=dashboard/file/kubernetes-pods.json/?var-project=auth-db&var-version=4.0.208
                fabric8.io/scm-con-url=scm:git:git@github.com:fabric8io/fabric8-platform.git/apps/auth-db
                fabric8.io/scm-devcon-url=scm:git:git@github.com:fabric8io/fabric8-platform.git/apps/auth-db
                fabric8.io/scm-tag=app-console-2.0.1
                fabric8.io/scm-url=http://github.com/fabric8io/fabric8-platform/apps/auth-db
                kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"fabric8","name":"auth-db-77cc7868cd","uid":"2a13fb31-1c68-11e8-b7a2-06536a339792"...
                maven.fabric8.io/source-url=jar:file:/home/jenkins/workspace/8io_fabric8-platform_master-4P5FOSFKYBLAPGDO7GHHNEOGKKERYH26KXBFORI5V7MRVJFY3QWA/apps/auth-db/target/auth-db-4.0.208.jar!/META-INF/fabric8/...
Status:         Running
IP:             10.20.30.40
Controlled By:  ReplicaSet/auth-db-77cc7868cd
Containers:
  auth-db:
    Container ID:   docker://a630b9a96404ef1a011eadfc7a91a16bff9399233e58e831be6bffdaa95a070d
    Image:          registry.centos.org/postgresql/postgresql:9.6
    Image ID:       docker-pullable://registry.centos.org/postgresql/postgresql@sha256:cc6a0b71015a25a7aa682d378f845d915f07c021b98b92d351cdca1fe091b0ef
    Port:           5432/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 28 Feb 2018 16:22:06 +0100
      Finished:     Wed, 28 Feb 2018 16:22:06 +0100
    Ready:          False
    Restart Count:  76
    Liveness:       exec [sh -c exec pg_isready --host $POD_IP] delay=60s timeout=5s period=10s #success=1 #failure=6
    Readiness:      exec [sh -c exec pg_isready --host $POD_IP] delay=20s timeout=3s period=5s #success=1 #failure=3
    Environment:
      POSTGRESQL_ADMIN_PASSWORD:  <set to the key 'db.password' in secret 'auth'>  Optional: false
      POD_IP:                      (v1:status.podIP)
    Mounts:
      /var/lib/pgsql from auth-db-postgresql-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p587p (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  auth-db-postgresql-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  auth-db-postgresql-data
    ReadOnly:   false
  default-token-p587p:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p587p
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason      Age                 From                                                  Message
  ----     ------      ----                ----                                                  -------
  Warning  FailedSync  4m (x1688 over 6h)  kubelet, ip-1.2.3.4.eu-west-1.compute.internal  Error syncing pod

It looks like the PVC is mounted at /var/lib/pgsql. From some manual tests with a PVC and an Ubuntu container, volumes always seem to be mounted owned by root in Kubernetes pods/containers. The entrypoint script of the postgresql container tries to create a passwd file in /var/lib/pgsql (see: https://github.com/sclorg/postgresql-container/blob/master/latest/root/usr/share/container-scripts/postgresql/common.sh#L144 ; this function is called from: https://github.com/sclorg/postgresql-container/blob/master/latest/root/usr/bin/run-postgresql#L12), and it does so as user 26 (postgres), which is why it gets "Permission denied" on the root-owned mount.

I also noticed that there is both a RHEL and a CentOS version of the Dockerfile. The RHEL Dockerfile makes sure that the postgres user is part of the root group (see: https://github.com/sclorg/postgresql-container/blob/master/latest/Dockerfile.rhel7#L76); the CentOS version does not.
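To see why uid 26 is refused, here is a minimal Python sketch of the usual POSIX check for creating a file in a directory: the process needs write and search (x) permission, taken from exactly one class (owner, else group, else other), with root bypassing the check. It assumes the mount point is owned by root:root; the modes 0755 and 0775 and the names Dir and can_create_in are hypothetical illustrations, not values read from the cluster, and ACLs and other Linux capabilities are ignored.

```python
from __future__ import annotations

import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Dir:
    """Hypothetical model of a directory's ownership and permission bits."""
    uid: int
    gid: int
    mode: int  # permission bits only, e.g. 0o755


def can_create_in(d: Dir, uid: int, gids: set[int]) -> bool:
    """Simplified POSIX check for creating a file inside directory `d`.

    Creating a file needs write + search (x) permission on the directory.
    Exactly one class is consulted: owner if uid matches, otherwise group if
    any of the process's groups matches, otherwise "other". uid 0 is treated
    as always allowed (approximating CAP_DAC_OVERRIDE); ACLs are ignored.
    """
    if uid == 0:
        return True
    if uid == d.uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif d.gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (d.mode & need) == need


if __name__ == "__main__":
    POSTGRES_UID = 26  # the container's user, per the issue
    root_755 = Dir(uid=0, gid=0, mode=0o755)  # assumed mode of the mounted volume
    root_775 = Dir(uid=0, gid=0, mode=0o775)  # same owner, but group-writable

    print(can_create_in(root_755, POSTGRES_UID, {26}))     # False: "other" has no w
    print(can_create_in(root_775, POSTGRES_UID, {26}))     # False: not in group 0
    print(can_create_in(root_775, POSTGRES_UID, {26, 0}))  # True: group 0 has w+x
    print(can_create_in(root_755, POSTGRES_UID, {26, 0}))  # False: group 0 lacks w
```

Under these assumptions, putting the postgres user in the root group (as the RHEL Dockerfile does) only helps when the mount point is also group-writable; with a root-owned 0755 directory the write would still be refused.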

My conclusion, based on the documentation of the PostgreSQL Docker image, is that the volumes for the DB pods are incorrectly mounted at /var/lib/pgsql. They should be mounted at /var/lib/pgsql/data.
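Applying that conclusion to an already deployed DB pod means changing the volume's mountPath in its Deployment. A minimal Python sketch that builds an RFC 6902 JSON Patch for this, assuming the Deployment has been fetched as a dict (for example with kubectl get deployment auth-db -n fabric8 -o json; the Deployment name auth-db is inferred from the ReplicaSet auth-db-77cc7868cd) and that the volume is named auth-db-postgresql-data as in the describe output. The helper mount_path_patch is hypothetical and is not necessarily what fabric8io/fabric8-platform#1671 does.

```python
from __future__ import annotations

import json
from typing import Any


def mount_path_patch(deployment: dict[str, Any], volume_name: str,
                     old_path: str = "/var/lib/pgsql",
                     new_path: str = "/var/lib/pgsql/data") -> list[dict[str, Any]]:
    """Build JSON Patch (RFC 6902) ops that re-point `volume_name` from old_path to new_path.

    Mounts are located by volume name and current path rather than assuming
    index 0, so other mounts in the pod template are left untouched.
    """
    ops: list[dict[str, Any]] = []
    containers = deployment["spec"]["template"]["spec"]["containers"]
    for ci, container in enumerate(containers):
        for mi, mount in enumerate(container.get("volumeMounts", [])):
            if mount.get("name") == volume_name and mount.get("mountPath") == old_path:
                path = f"/spec/template/spec/containers/{ci}/volumeMounts/{mi}/mountPath"
                ops.append({"op": "replace", "path": path, "value": new_path})
    return ops


if __name__ == "__main__":
    # Hand-written, trimmed stand-in for
    # `kubectl get deployment auth-db -n fabric8 -o json` (assumed shape).
    deployment = {
        "spec": {"template": {"spec": {"containers": [{
            "name": "auth-db",
            "volumeMounts": [
                {"name": "auth-db-postgresql-data", "mountPath": "/var/lib/pgsql"},
            ],
        }]}}}
    }
    print(json.dumps(mount_path_patch(deployment, "auth-db-postgresql-data")))
```

The printed list is in the form accepted by kubectl patch deployment auth-db -n fabric8 --type=json -p '<patch>'. Changing the pod template this way triggers a new rollout of the Deployment.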

I'll try to figure out whether I can make a PR for this.

bjwschaap commented 6 years ago

Fix to make it work: fabric8io/fabric8-platform#1671