deis / monitor

Monitoring for Deis Workflow
https://deis.com
MIT License

Decipher CPU usage metrics #139

Closed jchauncey closed 7 years ago

jchauncey commented 8 years ago

see https://trello.com/c/6lYjhOK4/97-decipher-difference-between-seconds-and-on-the-cpu-metrics

Currently we receive CPU metrics as usage in seconds, which is not what Kubernetes expects when applying CPU limits on a pod. This issue aims to determine how to translate those values into the units Kubernetes uses (millicores).

jchauncey commented 8 years ago

Interesting discussion on CPU metrics here - https://github.com/kubernetes/kubernetes/issues/24925

jchauncey commented 8 years ago

The metrics we currently collect are decorated Docker metrics. If we want to collect metrics that are more meaningful to Kubernetes, we need to get them from Heapster.

https://github.com/kubernetes/heapster/blob/f212f087673b3df8b73c5180eeb08ab62e6f69e0/metrics/api/v1/model_handlers.go

jchauncey commented 7 years ago

curl http://<heapster-ip>/api/v1/model/namespaces/deis/pods/deis-logger-fluentd-bp2pz/metrics/cpu/usage_rate
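
For reference, Heapster's storage schema documents cpu/usage_rate in millicores. Below is a minimal Go sketch of reading that value from the model API; the response field names (metrics, timestamp, value, latestTimestamp) are assumptions based on Heapster's v1 model types, and <heapster-ip> is a placeholder for the Heapster service address.

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// metricPoint and metricResult approximate the shape of the model API response.
type metricPoint struct {
    Timestamp string `json:"timestamp"`
    Value     uint64 `json:"value"` // cpu/usage_rate is reported in millicores
}

type metricResult struct {
    Metrics         []metricPoint `json:"metrics"`
    LatestTimestamp string        `json:"latestTimestamp"`
}

func main() {
    // <heapster-ip> is a placeholder; substitute the real Heapster address.
    url := "http://<heapster-ip>/api/v1/model/namespaces/deis/pods/deis-logger-fluentd-bp2pz/metrics/cpu/usage_rate"
    resp, err := http.Get(url)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var result metricResult
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        panic(err)
    }
    for _, m := range result.Metrics {
        fmt.Printf("%s %d millicores\n", m.Timestamp, m.Value)
    }
}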

jchauncey commented 7 years ago

So the metrics we collect on CPU usage come from cAdvisor and cannot be easily translated into millicores/nanocores, which is what we would like to have. Instead, we should ask for the data directly from the kubelet's stats endpoint.

The endpoint is served per node: the IP is that of the node, and 10255 is the kubelet's read-only port.

The information from that endpoint looks like this:

{
  "node": {
   "nodeName": "gke-jchauncey-default-pool-625a6421-d11j",
   "systemContainers": [
    {
     "name": "kubelet",
     "startTime": "2016-08-25T18:46:30Z",
     "cpu": {
      "time": "2016-09-13T16:27:42Z",
      "usageNanoCores": 29419252,
      "usageCoreNanoSeconds": 62774632874586
     },
     "memory": {
      "time": "2016-09-13T16:27:42Z",
      "usageBytes": 48091136,
      "workingSetBytes": 47968256,
      "rssBytes": 38694912,
      "pageFaults": 2473770454,
      "majorPageFaults": 10
     },
     "rootfs": {
      "availableBytes": 88619929600,
      "capacityBytes": 105553100800
     },
     "logs": {
      "availableBytes": 88619929600,
      "capacityBytes": 105553100800
     },
     "userDefinedMetrics": null
    },
    {
     "name": "runtime",
     "startTime": "2016-08-25T18:47:12Z",
     "cpu": {
      "time": "2016-09-13T16:27:42Z",
      "usageNanoCores": 9757499,
      "usageCoreNanoSeconds": 21980076117778
     },
     "memory": {
      "time": "2016-09-13T16:27:42Z",
      "usageBytes": 3461898240,
      "workingSetBytes": 1818386432,
      "rssBytes": 35901440,
      "pageFaults": 1117989,
      "majorPageFaults": 2
     },
     "rootfs": {
      "availableBytes": 88619929600,
      "capacityBytes": 105553100800
     },
     "logs": {
      "availableBytes": 88619929600,
      "capacityBytes": 105553100800
     },
     "userDefinedMetrics": null
    },
    {
     "name": "misc",
     "startTime": "2016-08-25T18:46:30Z",
     "cpu": {
      "time": "2016-09-13T16:27:36Z",
      "usageNanoCores": 8070501,
      "usageCoreNanoSeconds": 15748925965152
     },
     "memory": {
      "time": "2016-09-13T16:27:36Z",
      "usageBytes": 2415579136,
      "workingSetBytes": 1095823360,
      "rssBytes": 63426560,
      "pageFaults": 728828122,
      "majorPageFaults": 781
     },
     "rootfs": {
      "availableBytes": 88619929600,
      "capacityBytes": 105553100800
     },
     "logs": {
      "availableBytes": 88619929600,
      "capacityBytes": 105553100800
     },
     "userDefinedMetrics": null
    }
   ],
   "startTime": "2016-08-25T18:46:30Z",
   "cpu": {
    "time": "2016-09-13T16:27:34Z",
    "usageNanoCores": 963229294,
    "usageCoreNanoSeconds": 1026654779272269
   },
   "memory": {
    "time": "2016-09-13T16:27:34Z",
    "availableBytes": 11267112960,
    "usageBytes": 8335286272,
    "workingSetBytes": 4540813312,
    "rssBytes": 36007936,
    "pageFaults": 354054,
    "majorPageFaults": 1189
   },
   "network": {
    "time": "2016-09-13T16:27:34Z",
    "rxBytes": 139336199722,
    "rxErrors": 0,
    "txBytes": 164374896525,
    "txErrors": 0
   },
   "fs": {
    "availableBytes": 88619929600,
    "capacityBytes": 105553100800,
    "usedBytes": 12514336768
   },
   "runtime": {
    "imageFs": {
     "availableBytes": 88619929600,
     "capacityBytes": 105553100800,
     "usedBytes": 4656415667
    }
   }
  },
  "pods": [
   {
    "podRef": {
     "name": "deis-registry-proxy-nzoby",
     "namespace": "deis",
     "uid": "e7f0b947-75df-11e6-9ffa-42010af001a3"
    },
    "startTime": "2016-09-08T16:18:41Z",
    "containers": [
     {
      "name": "deis-registry-proxy",
      "startTime": "2016-09-08T16:18:47Z",
      "cpu": {
       "time": "2016-09-13T16:27:45Z",
       "usageNanoCores": 187097,
       "usageCoreNanoSeconds": 73376535232
      },
      "memory": {
       "time": "2016-09-13T16:27:45Z",
       "availableBytes": 49102848,
       "usageBytes": 3346432,
       "workingSetBytes": 3325952,
       "rssBytes": 3215360,
       "pageFaults": 2651,
       "majorPageFaults": 0
      },
      "rootfs": {
       "availableBytes": 88619929600,
       "capacityBytes": 105553100800,
       "usedBytes": 81920
      },
      "logs": {
       "availableBytes": 88619929600,
       "capacityBytes": 105553100800,
       "usedBytes": 24576
      },
      "userDefinedMetrics": null
     }
    ],
    "network": {
     "time": "2016-09-13T16:27:45Z",
     "rxBytes": 7057,
     "rxErrors": 0,
     "txBytes": 1858,
     "txErrors": 0
    },
    "volume": [
     {
      "availableBytes": 7903948800,
      "capacityBytes": 7903961088,
      "usedBytes": 12288,
      "name": "default-token-jgo47"
     }
    ]
   },
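
To illustrate the translation we are after, here is a minimal Go sketch that fetches the summary and converts the CPU figures to millicores. It assumes the kubelet's read-only summary endpoint (http://<node-ip>:10255/stats/summary, with <node-ip> as a placeholder) and mirrors only the CPU-related fields from the sample above.

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// cpuStats matches the "cpu" objects in the sample output above.
type cpuStats struct {
    Time                 string `json:"time"`
    UsageNanoCores       uint64 `json:"usageNanoCores"`
    UsageCoreNanoSeconds uint64 `json:"usageCoreNanoSeconds"`
}

type containerStats struct {
    Name string    `json:"name"`
    CPU  *cpuStats `json:"cpu"`
}

type podStats struct {
    PodRef struct {
        Name      string `json:"name"`
        Namespace string `json:"namespace"`
    } `json:"podRef"`
    Containers []containerStats `json:"containers"`
}

type summary struct {
    Node struct {
        NodeName string    `json:"nodeName"`
        CPU      *cpuStats `json:"cpu"`
    } `json:"node"`
    Pods []podStats `json:"pods"`
}

func main() {
    // <node-ip> is a placeholder for the node's address.
    resp, err := http.Get("http://<node-ip>:10255/stats/summary")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var s summary
    if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
        panic(err)
    }

    // usageNanoCores is an instantaneous rate in nanocores; dividing by 1e6
    // gives millicores, the unit Kubernetes uses for CPU requests and limits.
    fmt.Printf("node %s: %d millicores\n", s.Node.NodeName, s.Node.CPU.UsageNanoCores/1000000)
    for _, p := range s.Pods {
        for _, c := range p.Containers {
            if c.CPU == nil {
                continue
            }
            fmt.Printf("%s/%s/%s: %d millicores\n",
                p.PodRef.Namespace, p.PodRef.Name, c.Name, c.CPU.UsageNanoCores/1000000)
        }
    }
}

usageCoreNanoSeconds is cumulative, so a rate could also be derived by differencing two samples and dividing by the elapsed time in nanoseconds.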

The problem is that this information cannot be easily translated by Telegraf on its own.

There are two possible solutions:

1) Write a Telegraf plugin that fetches this data from the kubelet, parses it into a better structure, and sends it to InfluxDB (see the sketch after this list). This has the benefit of moving us off the Prometheus endpoint in the Kubernetes API and instead asking each kubelet directly for the information, which will perform better on larger clusters.

2) Write a custom Go binary, run inside the Telegraf pod, that parses this data and publishes it to the NSQ metrics topic. This has similar performance benefits to the plugin above, but it means we have to maintain a custom binary.
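
As a rough sketch of option 1 (not an authoritative implementation), an input plugin built against Telegraf's Input and Accumulator interfaces could look roughly like this. The plugin name, measurement name, and config field are made up for illustration, and the summary structs are trimmed to the CPU fields shown above.

package kubelet_summary

import (
    "encoding/json"
    "net/http"

    "github.com/influxdata/telegraf"
    "github.com/influxdata/telegraf/plugins/inputs"
)

// KubeletSummary scrapes the kubelet summary endpoint and reports
// per-container CPU usage in millicores.
type KubeletSummary struct {
    // URL of the kubelet's read-only stats endpoint, e.g. http://<node-ip>:10255/stats/summary.
    URL string
}

// Trimmed-down view of the summary payload, CPU fields only.
type summary struct {
    Pods []struct {
        PodRef struct {
            Name      string `json:"name"`
            Namespace string `json:"namespace"`
        } `json:"podRef"`
        Containers []struct {
            Name string `json:"name"`
            CPU  *struct {
                UsageNanoCores uint64 `json:"usageNanoCores"`
            } `json:"cpu"`
        } `json:"containers"`
    } `json:"pods"`
}

func (k *KubeletSummary) Description() string {
    return "Read CPU metrics from the kubelet summary endpoint"
}

func (k *KubeletSummary) SampleConfig() string {
    return `
  ## kubelet read-only summary endpoint (placeholder address)
  url = "http://<node-ip>:10255/stats/summary"
`
}

func (k *KubeletSummary) Gather(acc telegraf.Accumulator) error {
    resp, err := http.Get(k.URL)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    var s summary
    if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
        return err
    }

    for _, p := range s.Pods {
        for _, c := range p.Containers {
            if c.CPU == nil {
                continue
            }
            // Convert nanocores to millicores before handing the point to Telegraf.
            acc.AddFields("kubelet_container_cpu",
                map[string]interface{}{"usage_millicores": c.CPU.UsageNanoCores / 1000000},
                map[string]string{
                    "namespace": p.PodRef.Namespace,
                    "pod":       p.PodRef.Name,
                    "container": c.Name,
                })
        }
    }
    return nil
}

func init() {
    inputs.Add("kubelet_summary", func() telegraf.Input { return &KubeletSummary{} })
}

Telegraf would then enable it like any other input via an [[inputs.kubelet_summary]] TOML section (the name again being hypothetical).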

/cc @sstarcher @slack @mboersma @helgi @sgoings

jchauncey commented 7 years ago

So I just had a conversation with @helgi and @mboersma about this issue. We have decided that it is probably best to go with option 1. This gives us a simpler pod architecture while also giving the Telegraf community a useful plugin. I will open an issue in the Telegraf repo to track that part of the work and to open a line of communication with that team.