CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

postgres cluster cannot be created successfully and backrest-shared-repo pod reported the permission for key file is 777 #2140

Closed · szhang1 closed this issue 3 years ago

szhang1 commented 3 years ago

Which example are you working with?

I am following https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart/ and creating the first Postgres cluster, hippo, but the cluster fails to come up.

What is the current behavior?

When I create the first Postgres cluster hippo by following the quickstart, the cluster fails to come up and the hippo-backrest-shared-repo pod keeps restarting.

Current status:

[workstation:~] $ kubectl -n pgo get all
NAME                                              READY   STATUS      RESTARTS   AGE
pod/hippo-backrest-shared-repo-6bcbd5cccd-g9rq6   0/1     Completed   8          23m
pod/pgo-deploy-4qfn5                              0/1     Completed   0          32m
pod/postgres-operator-77dbb576f6-qbrfs            4/4     Running     0          31m

NAME                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/hippo                        ClusterIP   10.31.122.34    <none>        2022/TCP,5432/TCP            23m
service/hippo-backrest-shared-repo   ClusterIP   10.28.153.146   <none>        2022/TCP                     23m
service/postgres-operator            ClusterIP   10.16.22.120    <none>        8443/TCP,4171/TCP,4150/TCP   31m

NAME                                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hippo                        0/0     0            0           23m
deployment.apps/hippo-backrest-shared-repo   0/1     1            0           23m
deployment.apps/postgres-operator            1/1     1            1           31m

NAME                                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/hippo-74d68c58d5                        0         0         0       23m
replicaset.apps/hippo-backrest-shared-repo-6bcbd5cccd   1         1         0       23m
replicaset.apps/postgres-operator-77dbb576f6            1         1         1       31m

NAME                   COMPLETIONS   DURATION   AGE
job.batch/pgo-deploy   1/1           99s        32m

[workstation:~] $ kubectl -n pgo logs pod/hippo-backrest-shared-repo-6bcbd5cccd-g9rq6
Starting the pgBackRest repo
The pgBackRest repo has been started
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0777 for '/sshd/ssh_host_ed25519_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /sshd/ssh_host_ed25519_key
sshd: no hostkeys available -- exiting.
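As an aside, the check that is failing above can be reproduced outside Kubernetes: OpenSSH's sshd refuses to load any private key file whose mode grants group or other access, which is exactly what the 0777 mode on /sshd/ssh_host_ed25519_key triggers. A minimal sketch with a throwaway file:

```shell
# Reproduce the permission states sshd distinguishes: a private key with
# group/other access (0777) is rejected; owner-only access (0600) is accepted.
key=$(mktemp)

chmod 777 "$key"
stat -c '%a' "$key"   # prints 777 -- "too open", sshd would ignore this key

chmod 600 "$key"
stat -c '%a' "$key"   # prints 600 -- acceptable to sshd

rm -f "$key"
```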

What is the expected behavior?

The cluster was supposed to come up. I would guess the key file permissions should have been 0400?

Other information (e.g. detailed explanation, related issues, etc)

Please tell us about your environment:

If possible, please run the following kubectl (or OpenShift oc) commands and provide the results:

kubectl describe pod yourPodName
kubectl describe pvc
kubectl get nodes
kubectl logs yourPodName

Appreciate your help. Thank you very much again!

jkatz commented 3 years ago

Hm... I do see that the defaultMode for the volume on that Secret is too permissive, even though the volume mount is set to readOnly. I'm surprised this hasn't come up before, but it seems to have "just worked" to date.
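One way to confirm what mode was actually set on the Secret volume is a jsonpath query against the repo Deployment. Note that the volume name ("sshd") is an assumption here, and that Kubernetes stores defaultMode as a decimal integer, so octal 0777 appears as 511 and 0600 as 384:

```shell
# Hypothetical check against a live cluster: read the defaultMode of the
# sshd Secret volume on the pgBackRest repo Deployment (decimal output).
kubectl -n pgo get deployment hippo-backrest-shared-repo \
  -o jsonpath='{.spec.template.spec.volumes[?(@.name=="sshd")].secret.defaultMode}'
```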

I will make a patch. Thanks for reporting.

jkatz commented 3 years ago

This is fixed and being backpatched to all supported versions.

My only guess as to why this is occurring is that something in your version or distribution of Kubernetes is not giving precedence to the fact that the volume mount of the Secret uses readOnly: true. You can work around this by removing the defaultMode value from the sshd volumes in any configuration that has them. Those configuration files live in the pgo-config ConfigMap in the pgo namespace; edit them with kubectl edit. Once that is done, restart the postgres-operator Pod, and any newly created cluster will have the correct settings.
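The workaround steps above can be sketched as follows; the operator Pod's label selector is an assumption, not confirmed in this thread, so adjust it to match your install:

```shell
# 1. Remove the defaultMode entries from the sshd volume definitions in the
#    operator's templates, which are stored in the pgo-config ConfigMap.
kubectl -n pgo edit configmap pgo-config

# 2. Restart the operator so it reloads the edited templates; the Deployment
#    recreates the Pod automatically. (Label selector assumed.)
kubectl -n pgo delete pod -l name=postgres-operator

# 3. Delete and recreate the hippo cluster; clusters created after the edit
#    pick up the corrected volume settings.
```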

Thanks again for reporting.