Closed: jinnko closed this issue 3 years ago
The ping library used here has some stats-tracking features that we don't use, since that's what Prometheus is for. :grin:
This bloats memory, but there's currently no way to turn it off. There's an open issue upstream in the ping library to fix this, but we're still in the process of migrating the library to new ownership so changes can be made.
So, TL;DR, I'm aware of the problem, but waiting on https://github.com/sparrc/go-ping/issues/90
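To illustrate the problem, here is a minimal sketch using the public go-ping API (the target address and the polling loop are arbitrary placeholders): with a pinger that runs indefinitely, every echo reply appends its RTT to the pinger's own statistics, so memory grows with uptime even though the exporter only needs the Prometheus histograms.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/go-ping/ping"
)

func main() {
	// 192.0.2.1 is a documentation address; substitute a real target.
	pinger, err := ping.NewPinger("192.0.2.1")
	if err != nil {
		log.Fatal(err)
	}
	pinger.Interval = time.Second
	// The default Count lets the pinger run until stopped, which is how the
	// prober uses it. On Linux you may also need pinger.SetPrivileged(true).

	go pinger.Run() // errors from Run are ignored in this sketch

	// Each received reply is appended to the pinger's recorded RTTs,
	// so this slice grows without bound on a long-running instance.
	for {
		time.Sleep(time.Minute)
		fmt.Println("recorded RTTs:", len(pinger.Statistics().Rtts))
	}
}
```

This accumulation is what shows up in the heap profiles further down as `(*Pinger).updateStatistics`.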
@towolf Can you post any debugging info here?
@superQ, are you sure this is fixed?
This is the metric container_memory_working_set_bytes for smokeping_prober version quay.io/superq/smokeping-prober-linux-amd64:v0.6.0.
It got killed after a little more than 48 hours while pinging about 15 targets:
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Wed, 01 Jun 2022 11:26:48 +0200
  Finished:     Fri, 03 Jun 2022 12:06:08 +0200
Restart Count:  1
Limits:
  memory:  64Mi
I will look into giving you more information next week. If you have easy instructions on how to get the needed information from the container running in Kubernetes, that would be appreciated.
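(For reference, one way to reach the prober's pprof endpoint from outside the cluster is a port-forward; the pod name below is a placeholder, and 9374 is the prober's default listen port:)

```
$ kubectl port-forward pod/smokeping-prober-0 9374:9374
$ go tool pprof -top http://localhost:9374/debug/pprof/heap
```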
Is this right?
$ go tool pprof http://localhost:9374/debug/pprof/heap
Fetching profile over HTTP from http://localhost:9374/debug/pprof/heap
Saved profile in /home/niwolf/pprof/pprof.smokeping_prober.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
File: smokeping_prober
Type: inuse_space
Time: Jun 3, 2022 at 4:13pm (CEST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 50
Showing nodes accounting for 5977.94kB, 100% of 5977.94kB total
flat flat% sum% cum cum%
4050.65kB 67.76% 67.76% 4050.65kB 67.76% github.com/go-ping/ping.(*Pinger).updateStatistics
902.59kB 15.10% 82.86% 902.59kB 15.10% compress/flate.NewWriter
512.50kB 8.57% 91.43% 512.50kB 8.57% runtime.allocm
512.20kB 8.57% 100% 512.20kB 8.57% runtime.malg
0 0% 100% 902.59kB 15.10% bufio.(*Writer).Flush
0 0% 100% 902.59kB 15.10% compress/gzip.(*Writer).Write
0 0% 100% 4050.65kB 67.76% github.com/go-ping/ping.(*Pinger).processPacket
$ go tool pprof -top -alloc_space http://localhost:9374/debug/pprof/heap
Fetching profile over HTTP from http://localhost:9374/debug/pprof/heap
Saved profile in /home/niwolf/pprof/pprof.smokeping_prober.alloc_objects.alloc_space.inuse_objects.inuse_space.018.pb.gz
File: smokeping_prober
Type: alloc_space
Time: Jun 3, 2022 at 4:29pm (CEST)
Showing nodes accounting for 4226.70MB, 95.28% of 4436.26MB total
Dropped 159 nodes (cum <= 22.18MB)
flat flat% sum% cum cum%
1049.04MB 23.65% 23.65% 1394.56MB 31.44% golang.org/x/net/internal/socket.(*Conn).recvMsg
753.04MB 16.97% 40.62% 2353.11MB 53.04% golang.org/x/net/ipv4.(*payloadHandler).ReadFrom
507.53MB 11.44% 52.06% 2860.64MB 64.48% github.com/go-ping/ping.(*Pinger).recvICMP
391.52MB 8.83% 60.89% 391.52MB 8.83% golang.org/x/net/icmp.parseEcho
322.60MB 7.27% 68.16% 411.64MB 9.28% compress/flate.NewWriter
203.01MB 4.58% 72.74% 904.76MB 20.39% github.com/go-ping/ping.(*Pinger).processPacket
190.51MB 4.29% 77.03% 190.51MB 4.29% golang.org/x/net/internal/socket.parseInetAddr
155.01MB 3.49% 80.52% 155.01MB 3.49% net.(*rawConn).Read
152.51MB 3.44% 83.96% 544.03MB 12.26% golang.org/x/net/icmp.ParseMessage
119.50MB 2.69% 86.66% 119.50MB 2.69% golang.org/x/net/ipv4.NewControlMessage
86MB 1.94% 88.59% 86MB 1.94% golang.org/x/net/internal/socket.ControlMessage.Parse
73.68MB 1.66% 90.26% 73.68MB 1.66% compress/flate.(*compressor).initDeflate (inline)
67.51MB 1.52% 91.78% 145.02MB 3.27% main.NewSmokepingCollector.func1
50.01MB 1.13% 92.90% 50.01MB 1.13% github.com/go-kit/log.(*context).Log
32MB 0.72% 93.63% 92.50MB 2.09% github.com/go-ping/ping.(*Pinger).sendICMP
26.50MB 0.6% 94.22% 26.50MB 0.6% github.com/prometheus/client_golang/prometheus.(*histogram).Write
15.50MB 0.35% 94.57% 30MB 0.68% golang.org/x/net/icmp.(*Message).Marshal
11.17MB 0.25% 94.82% 57.17MB 1.29% github.com/prometheus/client_golang/prometheus.(*Registry).Gather
10.50MB 0.24% 95.06% 44.50MB 1.00% github.com/prometheus/client_golang/prometheus.processMetric
8.48MB 0.19% 95.25% 89.04MB 2.01% compress/flate.(*compressor).init
1.06MB 0.024% 95.28% 28.48MB 0.64% runtime/pprof.writeHeapInternal
0 0% 95.28% 387.83MB 8.74% bufio.(*Writer).Flush
0 0% 95.28% 411.64MB 9.28% compress/gzip.(*Writer).Write
0 0% 95.28% 2860.64MB 64.48% github.com/go-ping/ping.(*Pinger).run.func1
0 0% 95.28% 997.26MB 22.48% github.com/go-ping/ping.(*Pinger).run.func2
@SuperQ I have taken a look at this and found that the RecordRtts = false setting is only applied to hosts configured via the CLI, not to hosts coming from SafeConfig/dynamic configuration. I have created a PR that adds the setting in the second place as well.
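A minimal sketch of the idea behind that fix, assuming a shared constructor (the package, helper name, and signature here are hypothetical; the point is only that both the CLI path and the config-reload path must disable RTT recording):

```go
package prober

import (
	"time"

	"github.com/go-ping/ping"
)

// newPinger is a hypothetical shared constructor; routing both the CLI path
// and the config-reload path through it keeps RecordRtts disabled everywhere.
func newPinger(host string, interval time.Duration) (*ping.Pinger, error) {
	pinger, err := ping.NewPinger(host)
	if err != nil {
		return nil, err
	}
	pinger.Interval = interval
	// Latency is already tracked by the Prometheus histograms, so the
	// pinger does not need to keep every per-packet RTT in memory.
	pinger.RecordRtts = false
	return pinger, nil
}
```

With both code paths going through the same constructor, the `(*Pinger).updateStatistics` allocations seen in the profiles above should no longer grow with uptime.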
I have several systems running smokeping_prober and I've observed that long-running instances take up significantly more memory. Is this a sign of a memory leak?
As an example:
This is an instance running in an LXC container that has been up for 2 weeks.
And this is an instance that was restarted 5 minutes ago because, after 2 weeks, it had reached memory usage similar to the one above.
Both are the same version:
How can I help find the issue?