SuperQ / smokeping_prober

Prometheus-style smokeping
Apache License 2.0

Memory leak? #42

Closed: jinnko closed this issue 3 years ago

jinnko commented 4 years ago

I have several systems running smokeping_prober and I've observed that long-running instances use significantly more memory than freshly started ones. Is this a sign of a memory leak?

As an example:

This is an instance running in an LXC container that has been up for 2 weeks.

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
21755 10002      20   0  176M 53796     0 S  0.0  2.7  2h24:10 /usr/local/bin/smokeping_prober --web.listen-address=10.255.101.2:9374 1.1.1.1 8.8.8.8 1.0.0.1 8.8.4.4

And this is an instance that was restarted 5 minutes ago because, after 2 weeks of uptime, it had reached memory usage similar to the one above.

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
27908 smokeping  20   0  110M 12124  7284 S  0.0  0.6  0:00.98 /usr/local/bin/smokeping_prober --web.listen-address=10.255.101.1:9374 1.1.1.1 8.8.8.8 1.0.0.1 8.8.4.4

Both are the same version:

# /usr/local/bin/smokeping_prober --version
smokeping_prober, version 0.3.0 (branch: HEAD, revision: 594bd985ddfac52c473dcd4d290e9a8798406a10)
  build user:       root@100e204324f4
  build date:       20190625-13:48:41
  go version:       go1.12.6

How can I help find the issue?

SuperQ commented 4 years ago

The ping library used here has some stats-tracking features that we don't use, since that's what Prometheus is for. :grin:

This bloats memory over time, and there's currently no way to turn it off. There's an open issue upstream in the ping library to fix this, but we're also in the process of trying to migrate the library to new ownership so that changes can be made.

So, TL;DR, I'm aware of the problem, but waiting on https://github.com/sparrc/go-ping/issues/90
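
To make the mechanism concrete: every echo reply the library processes has its RTT appended to a slice inside the Pinger so summary statistics can be computed later. The sketch below shows only the shape of that pattern, not go-ping's actual code:

// Sketch of the pattern, not the library's actual code: go-ping appends
// every measured RTT to a slice inside the Pinger, so a pinger that runs
// indefinitely grows without bound.
package main

import (
    "fmt"
    "time"
)

type pinger struct {
    rtts []time.Duration // one entry per reply, never trimmed
}

func (p *pinger) updateStatistics(rtt time.Duration) {
    p.rtts = append(p.rtts, rtt)
}

func main() {
    p := &pinger{}
    // One reply per second for two weeks is roughly 1.2 million entries.
    for i := 0; i < 14*24*3600; i++ {
        p.updateStatistics(10 * time.Millisecond)
    }
    fmt.Printf("%d RTTs retained, about %d KiB for the slice alone\n",
        len(p.rtts), len(p.rtts)*8/1024)
}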

SuperQ commented 2 years ago

@towolf Can you post any debugging info here?

towolf commented 2 years ago

@SuperQ, are you sure this is fixed?

[screenshot: graph of container_memory_working_set_bytes for the smokeping_prober pod]

This is the metric container_memory_working_set_bytes for the smokeping_prober version quay.io/superq/smokeping-prober-linux-amd64:v0.6.0.

It got killed after a little more than 48 hours, pinging about 15 targets:

    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 01 Jun 2022 11:26:48 +0200
      Finished:     Fri, 03 Jun 2022 12:06:08 +0200
    Restart Count:  1
    Limits:
      memory:  64Mi

I will look into giving you more information next week. If you have easy instructions on how to get the needed information from the container running in Kubernetes, that would be appreciated.

towolf commented 2 years ago

Is this right?

$ go tool pprof http://localhost:9374/debug/pprof/heap
Fetching profile over HTTP from http://localhost:9374/debug/pprof/heap
Saved profile in /home/niwolf/pprof/pprof.smokeping_prober.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
File: smokeping_prober
Type: inuse_space
Time: Jun 3, 2022 at 4:13pm (CEST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 50
Showing nodes accounting for 5977.94kB, 100% of 5977.94kB total
      flat  flat%   sum%        cum   cum%
 4050.65kB 67.76% 67.76%  4050.65kB 67.76%  github.com/go-ping/ping.(*Pinger).updateStatistics
  902.59kB 15.10% 82.86%   902.59kB 15.10%  compress/flate.NewWriter
  512.50kB  8.57% 91.43%   512.50kB  8.57%  runtime.allocm
  512.20kB  8.57%   100%   512.20kB  8.57%  runtime.malg
         0     0%   100%   902.59kB 15.10%  bufio.(*Writer).Flush
         0     0%   100%   902.59kB 15.10%  compress/gzip.(*Writer).Write
         0     0%   100%  4050.65kB 67.76%  github.com/go-ping/ping.(*Pinger).processPacket

towolf commented 2 years ago

$ go tool pprof -top -alloc_space http://localhost:9374/debug/pprof/heap
Fetching profile over HTTP from http://localhost:9374/debug/pprof/heap
Saved profile in /home/niwolf/pprof/pprof.smokeping_prober.alloc_objects.alloc_space.inuse_objects.inuse_space.018.pb.gz
File: smokeping_prober
Type: alloc_space
Time: Jun 3, 2022 at 4:29pm (CEST)
Showing nodes accounting for 4226.70MB, 95.28% of 4436.26MB total
Dropped 159 nodes (cum <= 22.18MB)
      flat  flat%   sum%        cum   cum%
 1049.04MB 23.65% 23.65%  1394.56MB 31.44%  golang.org/x/net/internal/socket.(*Conn).recvMsg
  753.04MB 16.97% 40.62%  2353.11MB 53.04%  golang.org/x/net/ipv4.(*payloadHandler).ReadFrom
  507.53MB 11.44% 52.06%  2860.64MB 64.48%  github.com/go-ping/ping.(*Pinger).recvICMP
  391.52MB  8.83% 60.89%   391.52MB  8.83%  golang.org/x/net/icmp.parseEcho
  322.60MB  7.27% 68.16%   411.64MB  9.28%  compress/flate.NewWriter
  203.01MB  4.58% 72.74%   904.76MB 20.39%  github.com/go-ping/ping.(*Pinger).processPacket
  190.51MB  4.29% 77.03%   190.51MB  4.29%  golang.org/x/net/internal/socket.parseInetAddr
  155.01MB  3.49% 80.52%   155.01MB  3.49%  net.(*rawConn).Read
  152.51MB  3.44% 83.96%   544.03MB 12.26%  golang.org/x/net/icmp.ParseMessage
  119.50MB  2.69% 86.66%   119.50MB  2.69%  golang.org/x/net/ipv4.NewControlMessage
      86MB  1.94% 88.59%       86MB  1.94%  golang.org/x/net/internal/socket.ControlMessage.Parse
   73.68MB  1.66% 90.26%    73.68MB  1.66%  compress/flate.(*compressor).initDeflate (inline)
   67.51MB  1.52% 91.78%   145.02MB  3.27%  main.NewSmokepingCollector.func1
   50.01MB  1.13% 92.90%    50.01MB  1.13%  github.com/go-kit/log.(*context).Log
      32MB  0.72% 93.63%    92.50MB  2.09%  github.com/go-ping/ping.(*Pinger).sendICMP
   26.50MB   0.6% 94.22%    26.50MB   0.6%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
   15.50MB  0.35% 94.57%       30MB  0.68%  golang.org/x/net/icmp.(*Message).Marshal
   11.17MB  0.25% 94.82%    57.17MB  1.29%  github.com/prometheus/client_golang/prometheus.(*Registry).Gather
   10.50MB  0.24% 95.06%    44.50MB  1.00%  github.com/prometheus/client_golang/prometheus.processMetric
    8.48MB  0.19% 95.25%    89.04MB  2.01%  compress/flate.(*compressor).init
    1.06MB 0.024% 95.28%    28.48MB  0.64%  runtime/pprof.writeHeapInternal
         0     0% 95.28%   387.83MB  8.74%  bufio.(*Writer).Flush
         0     0% 95.28%   411.64MB  9.28%  compress/gzip.(*Writer).Write
         0     0% 95.28%  2860.64MB 64.48%  github.com/go-ping/ping.(*Pinger).run.func1
         0     0% 95.28%   997.26MB 22.48%  github.com/go-ping/ping.(*Pinger).run.func2

icedream commented 2 years ago

@SuperQ I have taken a look at this and found that the RecordRtts = false setting is only applied to hosts configured via the CLI, not to hosts configured via SafeConfig/dynamic configuration. I have created a PR that adds the setting in that second place as well.
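
For context, the shape of the fix is simply to set the library's RecordRtts field to false on every Pinger the prober constructs, including those built from the reloadable configuration. The sketch below uses hypothetical names (hostConfig, newPinger) rather than smokeping_prober's actual code; only the go-ping calls and fields are real:

// Hypothetical sketch: hostConfig and newPinger are illustrative names, not
// smokeping_prober's actual types. The point is that RecordRtts must be
// disabled on every Pinger, whether it comes from CLI flags or from the
// reloadable SafeConfig file.
package main

import (
    "fmt"
    "time"

    "github.com/go-ping/ping"
)

type hostConfig struct { // stand-in for a dynamic-config host entry
    Host     string
    Interval time.Duration
}

func newPinger(cfg hostConfig) (*ping.Pinger, error) {
    p, err := ping.NewPinger(cfg.Host)
    if err != nil {
        return nil, err
    }
    p.Interval = cfg.Interval
    // Prometheus histograms already track latency, so the library's own
    // RTT history is redundant; disabling it keeps memory flat.
    p.RecordRtts = false
    return p, nil
}

func main() {
    p, err := newPinger(hostConfig{Host: "1.1.1.1", Interval: time.Second})
    if err != nil {
        panic(err)
    }
    fmt.Println("RecordRtts:", p.RecordRtts) // false
}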