giantswarm / prometheus

Kubernetes Setup for Prometheus and Grafana
Apache License 2.0

wait-for-endpoints init container fails to load with k8s 1.6.0 #56

Open · mbukosky opened this issue 7 years ago

mbukosky commented 7 years ago

Hi,

I just updated to k8s 1.6.0 (via kubeadm) and found that the grafana-import-dashboards job is failing to query the Kubernetes API.

I assume this is because of the new RBAC roles added in 1.6, but I am unsure how to fix this issue or hack around it.

I believe the issue is in this block of code:

      annotations:
        pod.beta.kubernetes.io/init-containers: '[
          {
            "name": "wait-for-endpoints",
            "image": "giantswarm/tiny-tools",
            "imagePullPolicy": "IfNotPresent",
            "command": ["fish", "-c", "echo \"waiting for endpoints...\"; while true; set endpoints (curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --header \"Authorization: Bearer \"(cat /var/run/secrets/kubernetes.io/serviceaccount/token) https://kubernetes.default.svc/api/v1/namespaces/monitoring/endpoints/grafana); echo $endpoints | jq \".\"; if test (echo $endpoints | jq -r \".subsets[].addresses | length\") -gt 0; exit 0; end; echo \"waiting...\";sleep 1; end"],
            "args": ["monitoring", "grafana"]
          }
        ]'
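Restated as a minimal Python sketch, the init container polls the Endpoints object for grafana once per second and exits as soon as at least one address is listed under subsets. The fetch callable stands in for the curl request and is hypothetical. Two deliberate deviations from the fish loop: a response that is not JSON, or that has no subsets (for example an API error), is treated as "not ready yet" instead of breaking the check, and an optional timeout is added where the original loops forever.

```python
import json
import time
from typing import Callable, Optional


def ready_address_count(body: str) -> int:
    """Count ready addresses across all subsets of an Endpoints response body.

    Returns 0 for anything that is not an Endpoints-shaped JSON object,
    e.g. a plain-text error or a Status object without "subsets".
    """
    try:
        doc = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return 0
    if not isinstance(doc, dict):
        return 0
    total = 0
    for subset in doc.get("subsets") or []:
        if isinstance(subset, dict):
            total += len(subset.get("addresses") or [])
    return total


def wait_for_endpoints(fetch: Callable[[], str],
                       interval: float = 1.0,
                       timeout: Optional[float] = None,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll fetch() until the service has at least one ready address.

    Returns True once ready, or False if `timeout` seconds pass first.
    timeout=None loops forever, like the original fish script.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        if ready_address_count(fetch()) > 0:
            return True
        if deadline is not None and clock() >= deadline:
            return False
        print("waiting...")
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable without real waiting; in the container they default to time.sleep and time.monotonic.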

Here is some debugging information.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T19:15:41Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Pods

$ kubectl -n monitoring get pods
NAME                                  READY     STATUS     RESTARTS   AGE
grafana-core-2777125642-hzj36         1/1       Running    0          6m
grafana-import-dashboards-r0kh8       0/1       Init:0/1   0          6m
kube-state-metrics-3573491037-sr51m   1/1       Running    0          6m
node-directory-size-metrics-3gnkn     2/2       Running    0          6m
node-directory-size-metrics-qh9zk     2/2       Running    0          6m
prometheus-core-4230560888-jqh5r      1/1       Running    0          6m
prometheus-node-exporter-3d4sm        1/1       Running    0          6m
prometheus-node-exporter-hqzdm        1/1       Running    0          6m

Logs for the init container:

$ kubectl -n monitoring logs grafana-import-dashboards-r0kh8 -c wait-for-endpoints

waiting...
test: Missing argument at index 2
parse error: Invalid numeric literal at line 1, column 5
parse error: Invalid numeric literal at line 1, column 5
waiting...

I am able to reach the endpoints API via the dashboard:

// 20170407140649
// http://localhost:8001/api/v1/namespaces/monitoring/endpoints/grafana

{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "grafana",
    "namespace": "monitoring",
    "selfLink": "/api/v1/namespaces/monitoring/endpoints/grafana",
    "uid": "xxx",
    "resourceVersion": "5366",
    "creationTimestamp": "2017-04-07T17:57:00Z",
    "labels": {
      "app": "grafana",
      "component": "core"
    }
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "xxx",
          "nodeName": "xxx-kube-node-0",
          "targetRef": {
            "kind": "Pod",
            "namespace": "monitoring",
            "name": "grafana-core-2777125642-hzj36",
            "uid": "xxx",
            "resourceVersion": "5363"
          }
        }
      ],
      "ports": [
        {
          "port": 3000,
          "protocol": "TCP"
        }
      ]
    }
  ]
}

mbukosky commented 7 years ago

For reference, I was able to work around the new 1.6 RBAC requirements by giving the default service account in the monitoring namespace "god" mode (cluster-admin):

kubectl create clusterrolebinding add-on-cluster-admin-monitoring --clusterrole=cluster-admin --serviceaccount=monitoring:default

This is not a long-term solution, but it works as a hack for now. Could you please provide a proper read-only RBAC configuration for 1.6?
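As a starting point for a narrower grant than cluster-admin, here is a minimal Python sketch that prints a namespaced Role allowing only get on the grafana Endpoints object, plus a RoleBinding to the default service account in monitoring. It emits JSON, which kubectl apply -f accepts as well as YAML, wrapped in a v1 List. It assumes the rbac.authorization.k8s.io/v1beta1 API version that RBAC uses in 1.6; the object names are hypothetical, and it only covers the wait-for-endpoints check, not whatever other permissions the rest of the stack needs.

```python
import json


def wait_for_endpoints_rbac(namespace: str = "monitoring",
                            service: str = "grafana",
                            service_account: str = "default") -> dict:
    """Build a read-only Role + RoleBinding (as a v1 List) for the endpoint check."""
    api_version = "rbac.authorization.k8s.io/v1beta1"  # RBAC API version in k8s 1.6
    role_name = "read-" + service + "-endpoints"        # hypothetical name
    role = {
        "apiVersion": api_version,
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],           # "" is the core API group, where Endpoints live
            "resources": ["endpoints"],
            "resourceNames": [service],  # restrict to this one Endpoints object
            "verbs": ["get"],            # the init container only issues a GET
        }],
    }
    binding = {
        "apiVersion": api_version,
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # Usage: python rbac.py > rbac.json && kubectl apply -f rbac.json
    print(json.dumps(wait_for_endpoints_rbac(), indent=2))
```

A namespaced Role keeps the grant inside monitoring, unlike the cluster-wide clusterrolebinding above.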

mbukosky commented 7 years ago

FYI, I believe this issue is also related to #48.

dstroot commented 7 years ago

Strange, I am getting:

Error: unknown flag: --clusterrole

liggitt commented 7 years ago

that command is new in kubectl 1.6.0

dstroot commented 7 years ago

Boom - that was it. gcloud components update is your friend. ;)

chapati23 commented 7 years ago

We're also running into this and have been debugging for 2 hours now. To me it seems like it's just an issue with fish?

Because this works:

curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --header \"Authorization: Bearer \"(cat /var/run/secrets/kubernetes.io/serviceaccount/token) https://kubernetes.default.svc/api/v1/namespaces/monitoring/endpoints/grafana

but when I do:

set endpoints (curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --header \"Authorization: Bearer \"(cat /var/run/secrets/kubernetes.io/serviceaccount/token) https://kubernetes.default.svc/api/v1/namespaces/monitoring/endpoints/grafana); echo $endpoints;

then $endpoints is always empty.

I've never used fish before. Any ideas?

rootsongjc commented 7 years ago

@chapati23 Try this command:

curl -sX GET -H "Authorization:bearer `cat /var/run/secrets/kubernetes.io/serviceaccount/token`" -k https://kubernetes.default/api/v1/namespaces/monitoring/endpoints/grafana

Change the command in manifests-all.yaml to:

"command": ["fish", "-c", "echo \"waiting for endpoints...\"; while true; set endpoints (curl -sX GET -H \"Authorization:bearer `cat /var/run/secrets/kubernetes.io/serviceaccount/token`\" -k https://kubernetes.default/api/v1/namespaces/monitoring/endpoints/grafana); echo $endpoints | jq \".\"; if test (echo $endpoints | jq -r \".subsets[]?.addresses // [] | length\") -gt 0; exit 0; end; echo \"waiting...\";sleep 1; end"],

There is no need to pass ca.crt; if you do, it will cause an error.
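The key change in that command is the jq filter. In .subsets[]? the ? suppresses the "cannot iterate" error when subsets is missing or null (as in an API error response), and // [] substitutes an empty array when no subset has addresses, so jq prints 0 instead of nothing and test no longer reports a missing argument. The Python function below is a hypothetical restatement of that filter for an Endpoints-shaped object, written only to make the semantics explicit. Like jq, it yields one length per subset that has addresses, so an object with several such subsets would hand test more than one number. A body that is not JSON at all still makes jq fail.

```python
from typing import List


def jq_subset_address_lengths(doc: dict) -> List[int]:
    """Restate jq -r '.subsets[]?.addresses // [] | length' for an Endpoints-shaped dict."""
    # .subsets[]? : iterate subsets, yielding nothing (instead of an error)
    # when "subsets" is missing or null, e.g. in an API error Status object.
    subsets = doc.get("subsets")
    items = subsets if isinstance(subsets, list) else []
    # .addresses // [] : keep the addresses values that are not null/false;
    # if none remain, fall back to a single empty array.
    outputs = [s.get("addresses") for s in items]
    outputs = [v for v in outputs if v is not None and v is not False] or [[]]
    # | length : one number per remaining array
    return [len(v) for v in outputs]
```

By contrast, the original .subsets[].addresses | length produces no output at all for an object without subsets, which is what leaves test without its first operand.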

liggitt commented 7 years ago

I wouldn't recommend getting in the habit of using -k in actual checked-in manifests. Skipping TLS verification while you're sending a bearer token opens you up to MITM attacks.
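A minimal Python sketch of the alternative: keep the bearer token, but verify the API server's certificate against the service-account CA bundle instead of disabling verification. It assumes the standard in-cluster mount at /var/run/secrets/kubernetes.io/serviceaccount and the https://kubernetes.default.svc address from the original manifest; the function names are illustrative. Verification only succeeds if the server certificate covers the host name you connect to, since ssl.create_default_context checks host names by default.

```python
import ssl
import urllib.request

# Standard in-cluster service-account mount, as used in the original manifest.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
API_SERVER = "https://kubernetes.default.svc"


def build_endpoints_request(token: str, namespace: str, name: str,
                            api_server: str = API_SERVER) -> urllib.request.Request:
    """Build a GET request for one Endpoints object, authenticated by bearer token."""
    url = f"{api_server}/api/v1/namespaces/{namespace}/endpoints/{name}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_endpoints(namespace: str, name: str, sa_dir: str = SA_DIR,
                    timeout: float = 5.0) -> str:
    """GET the Endpoints object, verifying TLS against the cluster CA (no -k)."""
    with open(f"{sa_dir}/token") as f:
        token = f.read().strip()
    # Verify the server certificate and host name against the service-account
    # CA bundle, instead of skipping verification like curl -k does.
    context = ssl.create_default_context(cafile=f"{sa_dir}/ca.crt")
    request = build_endpoints_request(token, namespace, name)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, e.g. an RBAC 403.
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

From inside a pod, fetch_endpoints("monitoring", "grafana") would return the same JSON shown earlier in this thread, provided the service account is allowed to get that object.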