go-graphite / go-carbon

Golang implementation of Graphite/Carbon server with classic architecture: Agent -> Cache -> Persister
MIT License

Carbon-server going frequent OOM #280

Open logan596 opened 5 years ago

logan596 commented 5 years ago

Hello, we're running a production Graphite cluster with carbon-c-relay as the frontend and go-carbon as the backend. The cluster has 3 go-carbon nodes, each with 8 GB RAM and 4 CPUs. One go-carbon node ran out of memory and the kernel OOM killer stepped in. We presume the carbonserver component is the cause (the log shows both "go-carbon invoked oom-killer" and "carbon-c-relay invoked oom-killer").

Node info:

[US_POD3 root@gmoncache03 log]# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        2.4G        161M         17M        5.1G        4.4G
Swap:          2.0G        835M        1.2G

OOM intervention:

[US_POD3 root@gmoncache03 log]# cat messages |grep -i oom
2019-02-27T03:15:55+00:00 gmoncache03 kernel: control.pl invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
2019-02-27T03:15:55+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T03:15:55+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T03:34:09+00:00 gmoncache03 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T03:34:09+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T03:34:09+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T03:55:38+00:00 gmoncache03 kernel: go-carbon invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T03:55:38+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T03:55:38+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T04:13:09+00:00 gmoncache03 kernel: go-carbon invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T04:13:09+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T04:13:09+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T04:26:31+00:00 gmoncache03 kernel: vmtoolsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T04:26:31+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T04:26:31+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T04:49:33+00:00 gmoncache03 kernel: go-carbon invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T04:49:33+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T04:49:33+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T05:24:26+00:00 gmoncache03 kernel: interfaces.sh invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
2019-02-27T05:24:26+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T05:24:26+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T06:06:29+00:00 gmoncache03 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T06:06:29+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T06:06:29+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T06:45:24+00:00 gmoncache03 kernel: go-carbon invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T06:45:24+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T06:45:24+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T07:23:20+00:00 gmoncache03 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T07:23:20+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T07:23:20+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T07:58:42+00:00 gmoncache03 kernel: carbon-c-relay invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T07:58:42+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T07:58:42+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T08:31:10+00:00 gmoncache03 kernel: go-carbon invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T08:31:10+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T08:31:10+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T08:31:10+00:00 gmoncache03 kernel: syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T08:31:10+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T08:31:10+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T09:14:40+00:00 gmoncache03 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T09:14:40+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T09:14:41+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T09:14:41+00:00 gmoncache03 kernel: carbon-c-relay invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T09:14:41+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T09:14:41+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T09:14:41+00:00 gmoncache03 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T09:14:41+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T09:14:41+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T10:00:00+00:00 gmoncache03 kernel: control.pl invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
2019-02-27T10:00:00+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T10:00:00+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T10:35:37+00:00 gmoncache03 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T10:35:37+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T10:35:37+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T11:06:17+00:00 gmoncache03 kernel: df.sh invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
2019-02-27T11:06:17+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T11:06:17+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T11:34:01+00:00 gmoncache03 kernel: vmtoolsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T11:34:01+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T11:34:01+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
2019-02-27T11:34:01+00:00 gmoncache03 kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2019-02-27T11:34:01+00:00 gmoncache03 kernel: [<ffffffff83fba4e4>] oom_kill_process+0x254/0x3d0
2019-02-27T11:34:01+00:00 gmoncache03 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[US_POD3 root@gmoncache03 log]#

The go-carbon version is 0.12.0; carbon-c-relay is v3.3 (2018-04-13).

Any help or suggestions would be appreciated. Many thanks in advance, Yogesh

deniszh commented 5 years ago

@logan596: try decreasing max-size in the cache section (if it's really go-carbon that consumes the memory, and not e.g. carbon-c-relay, which is also possible). And/or increase the number of servers; your load is probably too high.

azhiltsov commented 5 years ago

You can also disable carbonserver (if it is not being used) or try disabling the trigram index in it. But as a general course of action I would suggest tracking each process's memory consumption before you start turning the knobs. go-carbon is not the only process in your log output:

splunkd invoked oom-killer
vmtoolsd invoked oom-killer
syslog-ng invoked oom-killer
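
To make that point concrete, here is a minimal Python sketch that tallies which process names appear in the "invoked oom-killer" lines of a syslog file. The function name `count_oom_invokers` and the default path `/var/log/messages` are assumptions made for illustration, not part of go-carbon or of the original report. Keep in mind that the invoker is only the process whose allocation triggered the OOM killer; it is not necessarily the process that was killed or the one holding the most memory.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   "... kernel: go-carbon invoked oom-killer: gfp_mask=0x201da, ..."
# Assumption: the process name contains no whitespace.
INVOKER = re.compile(r"kernel: (\S+) invoked oom-killer")


def count_oom_invokers(lines):
    """Count how often each process name appears as the OOM-killer invoker."""
    counts = Counter()
    for line in lines:
        match = INVOKER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical default path; pass the real log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as handle:
        for name, hits in count_oom_invokers(handle).most_common():
            print(f"{hits:4d}  {name}")
```

Applied to the excerpt in the original report, this counts 21 OOM events, with splunkd as the invoker in 7 of them and go-carbon in 5. The `grep -i oom` filter kept only the header of the per-process table the kernel prints with each event (`[ pid ] uid tgid total_vm rss ...`), so the unfiltered log around each timestamp is the place to look for the `rss` values that show which process actually held the memory.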