elotl / kip

Virtual-kubelet provider running pods in cloud instances
Apache License 2.0

GKE: Symlinked kubelet client certs need to be resolved #75

Closed myechuri closed 4 years ago

myechuri commented 4 years ago

The PKI certs on a GKE worker node are symlinks:

madhuri@gke-myechuri-vk-gke-test-default-pool-db7c47b8-fw89 ~ $ ls -ls /var/lib/kubelet/pki
total 8
4 -rw------- 1 root root 1110 May  5 06:52 kubelet-client-2020-05-05-06-52-14.pem
0 lrwxrwxrwx 1 root root   59 May  5 06:52 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2020-05-05-06-52-14.pem
4 -rw------- 1 root root 1252 May  5 06:52 kubelet-server-2020-05-05-06-52-16.pem
0 lrwxrwxrwx 1 root root   59 May  5 06:52 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2020-05-05-06-52-16.pem
madhuri@gke-myechuri-vk-gke-test-default-pool-db7c47b8-fw89 ~ $

Using kubelet-client-current.pem as the cert location did not work. From overlays/gke/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: virtual-kubelet
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - command:
        - /virtual-kubelet
        - --provider
        - kip
        - --provider-config
        - /etc/virtual-kubelet/provider.yaml
        - --network-agent-secret
        - kube-system/vk-network-agent
        - --disable-taint
        - --klog.logtostderr
        - --klog.v=2
        image: elotl/virtual-kubelet:v0.0.2-37-gede5647
        name: virtual-kubelet
        env:
        - name: APISERVER_CERT_LOCATION
          value: /etc/kubelet-pki/kubelet-client-current.pem
        - name: APISERVER_KEY_LOCATION
          value: /etc/kubelet-pki/kubelet-client-current.pem

The above deployment results in vk+kip failing with the error below:

F0506 06:00:54.493578       1 main.go:110] error loading tls certs: open /etc/kubelet-pki/kubelet-client-current.pem: no such file or directory

Workaround: updating deployment.yaml with the following helped me get past the error:

        env:
        - name: APISERVER_CERT_LOCATION
          value: /etc/kubelet-pki/kubelet-client-2020-05-05-06-52-14.pem
        - name: APISERVER_KEY_LOCATION
          value: /etc/kubelet-pki/kubelet-client-2020-05-05-06-52-14.pem
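A likely explanation for the failure (an inference from the listing above, not something confirmed in this thread): the symlink target is the absolute host path under /var/lib/kubelet/pki, while inside the container the directory appears under /etc/kubelet-pki, so the link dangles. The Go sketch below shows one way a loader could resolve such a link by looking for a file with the same base name next to the symlink. The function names `rebaseTarget` and `resolveCertPath` are hypothetical and are not part of kip or node-cli.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rebaseTarget maps a symlink target onto the directory that holds the
// link itself, keeping only the target's file name.
func rebaseTarget(linkPath, target string) string {
	return filepath.Join(filepath.Dir(linkPath), filepath.Base(target))
}

// resolveCertPath returns a path that can be opened for the given cert
// location. Regular files are returned unchanged. For a symlink, the
// target is tried first; if it is missing, a sibling file with the same
// base name as the target is tried instead.
func resolveCertPath(path string) (string, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return "", err
	}
	if info.Mode()&os.ModeSymlink == 0 {
		return path, nil // not a symlink, nothing to resolve
	}
	target, err := os.Readlink(path)
	if err != nil {
		return "", err
	}
	if !filepath.IsAbs(target) {
		// Relative targets are interpreted relative to the link's directory.
		target = filepath.Join(filepath.Dir(path), target)
	}
	if _, err := os.Stat(target); err == nil {
		return target, nil // the target exists as written
	}
	candidate := rebaseTarget(path, target)
	if _, err := os.Stat(candidate); err != nil {
		return "", fmt.Errorf("cannot resolve %s -> %s: %w", path, target, err)
	}
	return candidate, nil
}

func main() {
	// Path taken from the deployment above; outside such a container this
	// simply reports that the file cannot be resolved.
	p, err := resolveCertPath("/etc/kubelet-pki/kubelet-client-current.pem")
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	fmt.Println("using cert:", p)
}
```

Falling back to the base name only helps when the whole PKI directory is mounted, as the workaround above suggests it is. Mounting the directory at the same path it has on the host would presumably also keep the absolute symlink valid, though that was not tried in this thread.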

ldx commented 4 years ago

These certs should be the ones generated for serving the kubelet API, not the kubelet client certs.

See also #70 for how to fix this in the long term. Kip should generate its own certificates rather than share and reuse existing kubelet certs. It should be pretty easy to fix, but the change needs to go in node-cli, since the HTTP server is set up via node-cli.
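To illustrate what "generate its own certificates" could look like, here is a minimal Go sketch that creates a self-signed serving certificate with the standard library. It is not the change made in node-cli or in the commit referenced later in this thread; the function names `newServingCert` and `checkPair`, the node name, the IP address, and the validity period are all assumptions chosen for the example.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"net"
	"time"
)

// newServingCert returns a PEM-encoded self-signed certificate and its
// private key, valid for the given node name and IP address.
func newServingCert(nodeName, ip string, validDays int) ([]byte, []byte, error) {
	addr := net.ParseIP(ip)
	if addr == nil {
		return nil, nil, fmt.Errorf("invalid IP address %q", ip)
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	// Random 128-bit serial number.
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	tmpl := x509.Certificate{
		SerialNumber:          serial,
		Subject:               pkix.Name{CommonName: nodeName},
		NotBefore:             now.Add(-time.Minute), // tolerate small clock skew
		NotAfter:              now.AddDate(0, 0, validDays),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		DNSNames:              []string{nodeName},
		IPAddresses:           []net.IP{addr},
		BasicConstraintsValid: true,
	}
	// Self-signed: the template acts as both subject and issuer.
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

// checkPair verifies that the certificate and key belong together.
func checkPair(certPEM, keyPEM []byte) error {
	_, err := tls.X509KeyPair(certPEM, keyPEM)
	return err
}

func main() {
	certPEM, keyPEM, err := newServingCert("virtual-kubelet", "10.0.0.1", 365)
	if err != nil {
		panic(err)
	}
	if err := checkPair(certPEM, keyPEM); err != nil {
		panic(err)
	}
	fmt.Printf("generated %d-byte cert and %d-byte key\n", len(certPEM), len(keyPEM))
}
```

A self-signed certificate keeps the example short. Clients that verify the serving certificate against the cluster CA would need a certificate signed by that CA instead, which this sketch does not cover.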

myechuri commented 4 years ago

> These certs should be the ones generated for serving the kubelet API, not the kubelet client certs.

Thanks for clarifying, @ldx. By using /etc/kubelet-pki/kubelet-client-2020-05-05-06-52-14.pem, I am essentially sharing one cert between the kubelet on the node where vk runs and the virtual worker exposed by vk, right? Two follow-ups:

1) Is this the best workaround until #70 is fixed?
2) If the answer to the above is yes, we would need to figure out a way to supply the right cert name (like /etc/kubelet-pki/kubelet-client-2020-05-05-06-52-14.pem) in the overlay files for GKE, right?

ldx commented 4 years ago

kubelet-client-2020-05-05-06-52-14.pem is probably the client cert. For serving its API, the kubelet uses the other cert (unless GKE has different naming conventions for the kubelet certs). I think the easiest way would be to fix it as suggested in #70.

ldx commented 4 years ago

This has been fixed via d99177f