dtomcej / meetup2018

E2E Encryption using Traefik, Boulder, and MiniCA
GNU General Public License v3.0

Boulder Readiness probe fail #1

Open CatCassie opened 6 years ago

CatCassie commented 6 years ago

Hi,

I followed the steps in the provided bootstrap.sh script to create the k8s deployments for boulder, traefik, and whoami. The boulder pod reported: "Readiness probe failed: Get http://10.200.88.30:4001/directory: dial tcp 10.200.88.30:4001: getsockopt: connection refused".

Below are the actions I took to debug the issue:

I exec'd into the boulder pod, ran "curl http://10.200.88.30:4001/directory", and got the response below:

    {
      "B0PymtCHU_I": "https://community.letsencrypt.org/t/adding-random-entries-to-the-directory/33417",
      "keyChange": "http://10.200.88.30:4001/acme/key-change",
      "meta": {
        "caaIdentities": [
          "happy-hacker-ca.invalid"
        ],
        "termsOfService": "https://boulder:4431/terms/v7",
        "website": "https://github.com/letsencrypt/boulder"
      },
      "newAccount": "http://10.200.88.30:4001/acme/new-acct",
      "newNonce": "http://10.200.88.30:4001/acme/new-nonce",
      "newOrder": "http://10.200.88.30:4001/acme/new-order",
      "revokeCert": "http://10.200.88.30:4001/acme/revoke-cert"
    }

Then I exec'd into the boulder-hsm pod and curled the same endpoint, and I got the same response.

I wonder whether the readiness probe is failing because the certificate is invalid. One thing to mention: when I generated the certs, I changed the passphrase to something else.

Can you please help me with this issue?

dtomcej commented 6 years ago

The message "Readiness probe failed: Get http://10.200.88.30:4001/directory: dial tcp 10.200.88.30:4001: getsockopt: connection refused" just means that the pod is not ready yet.

The readiness check does not use an HTTPS endpoint, so the certificate shouldn't matter.

What do your boulder logs look like?

Also, the code has some timeouts set: https://github.com/dtomcej/meetup2018/blob/master/boulder/6.boulder.deployment.yml#L63

Try tweaking them to fit your needs. It may be that the boulder pod takes time to start up. In my dev environment, it takes almost a full minute to start up.
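For reference, a readiness probe with a more generous startup allowance might look like the sketch below. The path and port are taken from the error message above; the timing values are illustrative assumptions, not the values in the linked deployment file, and should be adjusted to how long boulder takes to start in your environment.

```yaml
# Illustrative readinessProbe fragment for the boulder container spec.
# Timing values are assumptions, not copied from 6.boulder.deployment.yml.
readinessProbe:
  httpGet:
    path: /directory        # endpoint the probe requests (from the error message)
    port: 4001              # plain-HTTP port, so certificates are not involved
  initialDelaySeconds: 60   # wait before the first probe; boulder can take about a minute to start
  periodSeconds: 10         # how often to probe after the initial delay
  timeoutSeconds: 5         # how long each probe request may take
  failureThreshold: 6       # consecutive failures tolerated before the pod is marked not ready
```

With these values, the pod has roughly two minutes (60 s initial delay plus 6 failed probes at 10 s intervals) before it is first reported as not ready.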

CatCassie commented 6 years ago

Thank you for your response. We found the issue in the logs of the init container: the service account did not have enough permissions to access the boulder pod. Since we are using a ClusterRoleBinding, we added "- pods" under the resources field. The deployment works now.
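As a rough sketch of the kind of change described above, a ClusterRole rule that grants read access to pods could look like the following. The role name and the list of verbs are hypothetical and are not taken from the repository; only the "pods" entry under resources comes from this thread.

```yaml
# Hypothetical ClusterRole; the name and verbs are assumptions for illustration.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: boulder-init-reader   # hypothetical name
rules:
  - apiGroups: [""]           # "" is the core API group, where pods live
    resources:
      - pods                  # the entry that was added to fix the init container
    verbs: ["get", "list", "watch"]   # assumed read-only verbs
```

The ClusterRoleBinding then ties this role to the service account used by the deployment.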

I have a different question. The readme suggests mapping traefik.rocks to localhost; I assume that is because the demo uses minikube. If I am not using minikube, should I map traefik.rocks to the external IP of the traefik service instead (when using LoadBalancer as the service type for traefik)?

Thank you,

CatCassie commented 6 years ago

@dtomcej I am using LoadBalancer as the service type for traefik, and in my /etc/hosts file I map the external IP of the traefik service to the hostnames of all the applications that use ingress. If I have many applications, though, putting all of them in the hosts file is probably not a good option. Do you have any recommendations?