bjwschaap opened 6 years ago
I think I found a cause. Describing one of the DB pods:
Name:           auth-db-77cc7868cd-4klf7
Namespace:      fabric8
Node:           ip-1.2.3.4.eu-west-1.compute.internal/1.2.3.4
Start Time:     Wed, 28 Feb 2018 10:17:14 +0100
Labels:         app=auth-db
                group=io.fabric8.platform.apps
                pod-template-hash=3377342478
                provider=fabric8
                service=auth-db
                version=4.0.208
Annotations:    fabric8.io/git-branch=release-v4.0.208
                fabric8.io/git-commit=d537a75a59f2305791c3e5adc838cb04f0329b18
                fabric8.io/metrics-path=dashboard/file/kubernetes-pods.json/?var-project=auth-db&var-version=4.0.208
                fabric8.io/scm-con-url=scm:git:git@github.com:fabric8io/fabric8-platform.git/apps/auth-db
                fabric8.io/scm-devcon-url=scm:git:git@github.com:fabric8io/fabric8-platform.git/apps/auth-db
                fabric8.io/scm-tag=app-console-2.0.1
                fabric8.io/scm-url=http://github.com/fabric8io/fabric8-platform/apps/auth-db
                kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"fabric8","name":"auth-db-77cc7868cd","uid":"2a13fb31-1c68-11e8-b7a2-06536a339792"...
                maven.fabric8.io/source-url=jar:file:/home/jenkins/workspace/8io_fabric8-platform_master-4P5FOSFKYBLAPGDO7GHHNEOGKKERYH26KXBFORI5V7MRVJFY3QWA/apps/auth-db/target/auth-db-4.0.208.jar!/META-INF/fabric8/...
Status:         Running
IP:             10.20.30.40
Controlled By:  ReplicaSet/auth-db-77cc7868cd
Containers:
  auth-db:
    Container ID:   docker://a630b9a96404ef1a011eadfc7a91a16bff9399233e58e831be6bffdaa95a070d
    Image:          registry.centos.org/postgresql/postgresql:9.6
    Image ID:       docker-pullable://registry.centos.org/postgresql/postgresql@sha256:cc6a0b71015a25a7aa682d378f845d915f07c021b98b92d351cdca1fe091b0ef
    Port:           5432/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 28 Feb 2018 16:22:06 +0100
      Finished:     Wed, 28 Feb 2018 16:22:06 +0100
    Ready:          False
    Restart Count:  76
    Liveness:       exec [sh -c exec pg_isready --host $POD_IP] delay=60s timeout=5s period=10s #success=1 #failure=6
    Readiness:      exec [sh -c exec pg_isready --host $POD_IP] delay=20s timeout=3s period=5s #success=1 #failure=3
    Environment:
      POSTGRESQL_ADMIN_PASSWORD:  <set to the key 'db.password' in secret 'auth'>  Optional: false
      POD_IP:                     (v1:status.podIP)
    Mounts:
      /var/lib/pgsql from auth-db-postgresql-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p587p (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  auth-db-postgresql-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  auth-db-postgresql-data
    ReadOnly:   false
  default-token-p587p:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p587p
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason      Age                 From                                            Message
  ----     ------      ----                ----                                            -------
  Warning  FailedSync  4m (x1688 over 6h)  kubelet, ip-1.2.3.4.eu-west-1.compute.internal  Error syncing pod
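(For reference, the crashing container's own output can be pulled from its previous attempt; pod name and namespace taken from the describe output above:

kubectl -n fabric8 logs auth-db-77cc7868cd-4klf7 --previous
)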
Looks like the PVC is mounted at /var/lib/pgsql.
Doing some manual tests with a PVC and an Ubuntu container, it looks like volumes are always mounted root-owned in Kubernetes pods/containers.
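A minimal way to reproduce that observation (the pod name and the test-pvc claim name here are made up for the test; any existing PVC in the namespace will do):

apiVersion: v1
kind: Pod
metadata:
  name: pvc-owner-test
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: ubuntu
      # Print the mount point's numeric owner/group, then exit.
      command: ["sh", "-c", "ls -ldn /data && id"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc   # hypothetical claim name

kubectl logs pvc-owner-test then shows the directory owned by uid 0 / gid 0.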
The entrypoint script in the postgresql container tries to create a passwd file in /var/lib/pgsql (see: https://github.com/sclorg/postgresql-container/blob/master/latest/root/usr/share/container-scripts/postgresql/common.sh#L144 ; this function is called from: https://github.com/sclorg/postgresql-container/blob/master/latest/root/usr/bin/run-postgresql#L12), and does so as user 26 (postgres). Since the volume is mounted root-owned, that write fails and the container exits with an error.
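The failure can be simulated without the postgres image at all; a sketch, reusing the hypothetical test-pvc claim from above and forcing the container to run as uid 26 the way the sclorg image does:

apiVersion: v1
kind: Pod
metadata:
  name: uid26-write-test
spec:
  restartPolicy: Never
  securityContext:
    runAsUser: 26   # the postgres uid in the sclorg images
  containers:
    - name: test
      image: ubuntu
      # Creating a file in the root-owned mount should fail with
      # 'Permission denied', mirroring the entrypoint's passwd write.
      command: ["sh", "-c", "touch /data/passwd"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc   # hypothetical claim name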
I observed that there is both a RHEL and a CentOS version of the Dockerfile. The RHEL version makes sure that the postgres user is part of the root group (see: https://github.com/sclorg/postgresql-container/blob/master/latest/Dockerfile.rhel7#L76). However, the CentOS version does not.
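If the CentOS image were patched the same way, the fix would presumably be a one-liner in its Dockerfile; this is my guess at the shape of the change, not the actual RHEL line:

# Put postgres (uid 26) in the root group so group-accessible mounts work
RUN usermod -a -G root postgres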
My conclusion, looking at the documentation for the postgres Docker image, is that the volumes for the DB pods are incorrectly mounted at /var/lib/pgsql. They should be mounted at /var/lib/pgsql/data.
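Concretely, the change would be in the deployment's volumeMounts; a sketch of the relevant fragment (only the names come from the describe output above, the surrounding structure is assumed):

containers:
  - name: auth-db
    image: registry.centos.org/postgresql/postgresql:9.6
    volumeMounts:
      - name: auth-db-postgresql-data
        mountPath: /var/lib/pgsql/data   # was /var/lib/pgsql

With the mount one level down, the entrypoint can still create its passwd file in /var/lib/pgsql itself, which stays on the container's writable layer.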
I'll try and figure out if I can make a PR for this.
Fix to make it work: fabric8io/fabric8-platform#1671
I have created a Kubernetes 1.8.6 cluster on AWS with kops, and enabled RBAC. I'm trying to install fabric8 on this cluster. I've tried using the Helm chart, but it seems to miss all the required RBAC settings, so I've tried installing fabric8 using fabric8 deploy. This all seems to work okay, except that all DB pods (postgres) fail with:
Deployed with:
Version:
Running/starting Pods:
Log of a failing DB pod:
Any guidance/assistance on how to fix this is much appreciated. I'd be happy to provide any additional information needed.