coreos / vault-operator

Run and manage Vault on Kubernetes simply and securely
https://coreos.com/blog/introducing-vault-operator-project
Apache License 2.0

etcd Cluster fails to start #353

Open loomsen opened 5 years ago

loomsen commented 5 years ago

Hi guys, thank you for your effort. I tried to get started by following the example, but my etcd cluster fails to start.

At first it looks like this:

kubectl -n default get pods
NAME                              READY     STATUS     RESTARTS   AGE
etcd-operator-779446c7d8-t2hm9    3/3       Running    0          28m
example-etcd-f4rhsm64d4           0/1       Init:0/1   0          9s
vault-operator-7dc8b55b4d-mkz5p   1/1       Running    0          28m

Then it errors out, without providing any output:

 nvarz:~/playground/vault-operator (master *%=)$ kubectl -n default get pods -w
NAME                              READY     STATUS    RESTARTS   AGE
etcd-operator-779446c7d8-t2hm9    3/3       Running   0          29m
example-etcd-9x68tmdxl7           0/1       Error     0          40s
example-etcd-f4rhsm64d4           0/1       Running   0          56s
vault-operator-7dc8b55b4d-mkz5p   1/1       Running   0          29m
^C nvarz:~/playground/vault-operator (master *%=)$ kubectl describe -n default example-etcd-9x68tmdxl7
the server doesn't have a resource type "example-etcd-9x68tmdxl7"
 nvarz:~/playground/vault-operator (master *%=)$ kubectl describe -n default example-etcd-f4rhsm64d4
the server doesn't have a resource type "example-etcd-f4rhsm64d4"
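
Note on the two errors above: `kubectl describe` expects a resource type before the name, so the pod names were parsed as resource types. A minimal sketch of a helper that gathers the details, assuming the `default` namespace and the pod names from the listing above (`inspect_pod`, `NS` and `KUBECTL` are hypothetical names introduced here for illustration):

```shell
# Namespace taken from the commands above; override by exporting NS.
NS="${NS:-default}"
# Binary to call; can be pointed at a stub for a dry run (assumption, not part of kubectl).
KUBECTL="${KUBECTL:-kubectl}"

# Describe a pod and print its logs.
# "describe" needs the resource type ("pod") before the pod name.
inspect_pod() {
  "$KUBECTL" -n "$NS" describe pod "$1"
  "$KUBECTL" -n "$NS" logs "$1"
}
```

Usage would be `inspect_pod example-etcd-9x68tmdxl7`. The Events section at the end of the `describe` output is usually the first place to look when a pod ends up in `Error` without printing anything.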

Container logs of example-etcd-f4rhsm64d4:

WARNING: 2018/10/22 17:34:08 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 0.0.0.0:2379: getsockopt: connection refused"; Reconnecting to {0.0.0.0:2379 0  <nil>}
2018-10-22 17:34:08.883317 I | raft: 850a5ffb901b8769 is starting a new election at term 119
2018-10-22 17:34:08.883345 I | raft: 850a5ffb901b8769 became candidate at term 120
2018-10-22 17:34:08.883354 I | raft: 850a5ffb901b8769 received MsgVoteResp from 850a5ffb901b8769 at term 120
2018-10-22 17:34:08.883362 I | raft: 850a5ffb901b8769 [logterm: 2, index: 5] sent MsgVote request to 3922346fe0f3212c at term 120
2018-10-22 17:34:09.883322 I | raft: 850a5ffb901b8769 is starting a new election at term 120
2018-10-22 17:34:09.883353 I | raft: 850a5ffb901b8769 became candidate at term 121
2018-10-22 17:34:09.883361 I | raft: 850a5ffb901b8769 received MsgVoteResp from 850a5ffb901b8769 at term 121
2018-10-22 17:34:09.883369 I | raft: 850a5ffb901b8769 [logterm: 2, index: 5] sent MsgVote request to 3922346fe0f3212c at term 121
2018-10-22 17:34:10.168368 W | rafthttp: health check for peer 3922346fe0f3212c could not connect: dial tcp 10.42.8.6:2380: getsockopt: connection refused
WARNING: 2018/10/22 17:34:10 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 0.0.0.0:2379: getsockopt: connection refused"; Reconnecting to {0.0.0.0:2379 0  <nil>}
2018-10-22 17:34:10.578366 I | etcdserver: skipped leadership transfer for stopping non-leader member
WARNING: 2018/10/22 17:34:10 grpc: addrConn.transportMonitor exits due to: context canceled
2018-10-22 17:34:10.578447 I | rafthttp: stopping peer 3922346fe0f3212c...
2018-10-22 17:34:10.578471 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (writer)
2018-10-22 17:34:10.578485 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (writer)
2018-10-22 17:34:10.578522 I | rafthttp: stopped HTTP pipelining with peer 3922346fe0f3212c
2018-10-22 17:34:10.578533 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (stream MsgApp v2 reader)
2018-10-22 17:34:10.578538 I | rafthttp: stopped streaming with peer 3922346fe0f3212c (stream Message reader)
2018-10-22 17:34:10.578544 I | rafthttp: stopped peer 3922346fe0f3212c
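
The health-check warning in the log above identifies the unreachable peer only by address (10.42.8.6:2380). A rough sketch for mapping that IP back to a pod, assuming the pod still exists and still reports an IP (`pod_for_ip`, `NS` and `KUBECTL` are hypothetical names introduced here for illustration):

```shell
# Namespace taken from the commands above; override by exporting NS.
NS="${NS:-default}"
# Binary to call; can be pointed at a stub for a dry run (assumption, not part of kubectl).
KUBECTL="${KUBECTL:-kubectl}"

# List pods with their IPs (-o wide) and keep the header row
# plus any row that has a column exactly equal to the given IP.
pod_for_ip() {
  "$KUBECTL" -n "$NS" get pods -o wide |
    awk -v ip="$1" '
      NR == 1 { print; next }
      { for (i = 1; i <= NF; i++) if ($i == ip) { print; break } }
    '
}
```

Usage would be `pod_for_ip 10.42.8.6`. If only the header row comes back, no pod currently listed in the namespace reports that IP.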

What I end up with:

kubectl -n default get pods -w
NAME                              READY     STATUS      RESTARTS   AGE
etcd-operator-779446c7d8-t2hm9    3/3       Running     0          33m
example-668f9f8f7d-76mh7          1/2       Running     0          3m
example-668f9f8f7d-7m4vd          1/2       Running     0          3m
example-668f9f8f7d-n7vgr          1/2       Running     0          3m
example-etcd-9x68tmdxl7           0/1       Error       0          4m
example-etcd-f4rhsm64d4           0/1       Completed   0          4m
vault-operator-7dc8b55b4d-mkz5p   1/1       Running     0          33m

Nothing is listed under sealed:

kubectl -n default get vault example -o yaml
apiVersion: vault.security.coreos.com/v1alpha1
kind: VaultService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"vault.security.coreos.com/v1alpha1","kind":"VaultService","metadata":{"annotations":{},"name":"example","namespace":"default"},"spec":{"nodes":3,"version":"0.9.1-0"}}
  creationTimestamp: 2018-10-22T17:31:07Z
  generation: 1
  name: example
  namespace: default
  resourceVersion: "2377045"
  selfLink: /apis/vault.security.coreos.com/v1alpha1/namespaces/default/vaultservices/example
  uid: 42467cbb-d620-11e8-8fcd-0050568b2ddd
spec:
  TLS:
    static:
      clientSecret: example-default-vault-client-tls
      serverSecret: example-default-vault-server-tls
  baseImage: quay.io/coreos/vault
  configMapName: ""
  nodes: 3
  version: 0.9.1-0
status:
  clientPort: 8200
  initialized: false
  phase: Running
  serviceName: example
  vaultStatus:
    active: ""
    sealed: null
    standby: null

ledroide commented 5 years ago

Same issue here.

$ kubectl get all,vault,etcd -l 'app in (etcd,vault)' 
NAME                             READY   STATUS    RESTARTS   AGE
pod/vault-svc-6bc5678fcc-l7t2r   1/2     Running   1          24m
pod/vault-svc-6bc5678fcc-znmwj   1/2     Running   1          24m
pod/vault-svc-etcd-9s974hwgj5    1/1     Running   0          25m
pod/vault-svc-etcd-gvschtkphl    0/1     Error     0          25m
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/vault-svc               ClusterIP   10.233.15.50   <none>        8200/TCP,8201/TCP,9102/TCP   24m
service/vault-svc-etcd          ClusterIP   None           <none>        2379/TCP,2380/TCP            25m
service/vault-svc-etcd-client   ClusterIP   10.233.10.40   <none>        2379/TCP                     25m
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/vault-svc   0/2     2            0           24m
replicaset.apps/vault-svc-6bc5678fcc   2         2         0       24m
etcdcluster.etcd.database.coreos.com/vault-svc-etcd   25m

@loomsen: were you able to solve it?