anclrii / Storj-Exporter

Prometheus exporter for monitoring Storj storage nodes
GNU General Public License v3.0

High CPU usage on exporter and node #34

Closed: kevinkk525 closed this issue 4 years ago

kevinkk525 commented 4 years ago

With the new update I periodically see high CPU usage on the exporter and the storagenode. As soon as I stop the exporter, the CPU usage on the node goes away.

It looks like something is trying to pull information all the time, even though my prometheus configuration only pulls data every 30 seconds.

It's the same for all 3 nodes running on the host.

kevinkk525 commented 4 years ago

This is strange. I just tried all recent versions and it is the same with every one of them (except <=0.25, which throw errors since the API changed). My first node doesn't throw any errors now, so I removed that error from the first post. However, the weird CPU consumption stays. It causes quite a few reads (which my zfs has in cache) and pushes 1 CPU core to 100% for a short duration.

I thought the exporter only does something when prometheus scrapes it? My scrape interval is 30 seconds.

But the CPU spikes in the node occur every 5 seconds.

anclrii commented 4 years ago

Hey, thanks for reporting, and yes, I noticed the same. It looks like with some recent updates in storagenode itself the API calls got heavier on CPU and take longer to respond. I see the same with raw API calls to storagenode, though I still need to look into it properly to confirm. The exporter logic is pretty simple and not much has changed recently. The most recent update fixes metrics for suspended and disqualified status. Either way I'll need to take a closer look.

anclrii commented 4 years ago

On my node, which holds about 1.7 TB, the exporter takes about 6s to respond. Every time I call it I see storagenode CPU go up to ~100% for 6s, while the exporter process uses little CPU in the process list (top):

# time curl -s http://127.0.0.1:9651/metrics >/dev/null

real    0m5.799s
user    0m0.019s
sys     0m0.031s

The exporter calls the storagenode API once for each satellite in a loop, with a call like 127.0.0.1:14002/api/sno/satellite/<satid>. When I call it manually it takes about 1s to respond for each satellite. Again storagenode spikes CPU to 100% for the duration of the call:

# time curl -s 127.0.0.1:14002/api/sno/satellite/12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB >/dev/null

real    0m0.980s
user    0m0.022s
sys     0m0.009s

My top output at the time of a scrape shows 120% CPU used by storagenode and 2% by the exporter:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6848 root      20   0  742584  41780  14572 S 120.0  1.0   1607:59 ./storagenode run --config-dir 
18228 root      20   0   25756  18816   3896 S   2.3  0.5  13:18.78 python ./storj-exporter.py

Something has changed in the way storagenode responds to API calls, making it heavier on CPU. I bet if we benchmarked the built-in dashboard we would find it got heavier too, unless it's pulling data directly from the db.

Can you try to time the same curl calls on your side please so we can compare?
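
For repeated comparisons, the two curl timings above can be scripted. The following is only a minimal sketch using the Python standard library; the helper names `fetch` and `time_calls` are made up for illustration and are not part of the exporter.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw body (roughly `curl -s URL`)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_calls(
    urls: Iterable[str], get: Callable[[str], bytes] = fetch
) -> Dict[str, float]:
    """Request each URL in turn and return the wall-clock seconds each one took."""
    timings = {}
    for url in urls:
        start = time.perf_counter()
        get(url)  # the body is discarded, like `>/dev/null`
        timings[url] = time.perf_counter() - start
    return timings
```

Calling `time_calls` with `http://127.0.0.1:9651/metrics` and one `http://127.0.0.1:14002/api/sno/satellite/<satid>` URL per satellite gives figures comparable to the `real` lines above. Note that it measures only wall-clock time on the client side, not the CPU used by storagenode.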

"I thought the exporter only does something when prometheus scrapes it?"

Yes, and I'm not sure why you see 5s spikes, would need more info to explain this.

kevinkk525 commented 4 years ago

My node has 6TB

# time curl -s http://127.0.0.1:9651/metrics >/dev/null

real    0m1.492s
user    0m0.005s
sys     0m0.004s

# time curl -s 127.0.0.1:14002/api/sno/satellite/12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB >/dev/null

real    0m0.247s
user    0m0.006s
sys     0m0.006s

My times are a lot lower than yours, probably because my zfs has it all cached in RAM.

So you don't see a spike every 5 seconds? Only every 30 seconds with each scrape?

kevinkk525 commented 4 years ago

The CPU spikes continue even if prometheus is down, so it doesn't have anything to do with the scraping. Without having looked at the code, there must be something polling the storagenode all the time.

kevinkk525 commented 4 years ago

I don't know what happened, but after a server restart the spikes only appear every 30 seconds, when it is being scraped. Not sure why I had 5 second spikes before, even though I restarted the exporter container and the prometheus container.

Edit: Well it didn't last long. The spikes are back every 5 seconds. Restarting the exporters doesn't change anything.

anclrii commented 4 years ago

I don't see 5 second spikes on my system. Check whether you have multiple instances of a prometheus container/process/VM polling the same exporter. Also check with tcpdump (some examples: https://github.com/wuseman/TCPDUMP) which IP is polling the exporter and the storagenode API, and how often.
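
To answer the "which IP and how often" question from a saved capture, one option is to count the connection-opening packets per client. The sketch below assumes tcpdump's default one-line text output for IPv4 TCP packets (`HH:MM:SS.ffffff IP src.port > dst.port: Flags [...]`); the function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Start of a tcpdump text line, e.g.
# 12:49:04.000482 IP host.40156 > 172.17.0.2.9651: Flags [S], seq ...
LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]+)\]"
)


def new_connections(lines, port):
    """Map each client address to the times it opened a connection to `port`."""
    opened = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        # A bare "S" (SYN without the "." ACK marker) is the first packet
        # of a new TCP connection, so it identifies who initiated it.
        if m and m["flags"] == "S" and int(m["dport"]) == port:
            opened[m["src"]].append(datetime.strptime(m["ts"], "%H:%M:%S.%f"))
    return dict(opened)


def intervals(times):
    """Seconds between consecutive connection attempts."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding it the lines of a capture file and port 9651 lists every host that connects to the exporter; passing the resulting timestamps to `intervals` shows whether one of them polls every 5 seconds. The timestamps carry no date, so a capture that runs past midnight would need extra handling.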

kevinkk525 commented 4 years ago

Thanks, I checked that. When I stop prometheus container, no prometheus processes are running. I checked with tcpdump which connections are still active on the storj-exporter container and even after restarting that container, every 5 seconds there is communication between the exporter and the storagenode container. It looks like it is initiated by a request from the host to the container on the prometheus port 9651, but I can't explain why or where it should come from. There is no prometheus process running and the host doesn't run anything, I have everything inside docker.

12:49:04.000482 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [S], seq 2926605992, win 64240, options [mss 1460,sackOK,TS val 2276951173 ecr 0,nop,wscale 7], length 0
12:49:04.000505 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [S.], seq 1246677957, ack 2926605993, win 65160, options [mss 1460,sackOK,TS val 121417556 ecr 2276951173,nop,wscale 7], length 0
12:49:04.000516 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 1, win 502, options [nop,nop,TS val 2276951173 ecr 121417556], length 0
12:49:04.001463 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [P.], seq 1:152, ack 1, win 502, options [nop,nop,TS val 2276951174 ecr 121417556], length 151
12:49:04.001486 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [.], ack 152, win 508, options [nop,nop,TS val 121417557 ecr 2276951174], length 0
12:49:04.005873 IP 172.17.0.2.41572 > 172.17.0.9.14002: Flags [S], seq 1918455195, win 64240, options [mss 1460,sackOK,TS val 3159389919 ecr 0,nop,wscale 7], length 0
12:49:04.005952 IP 172.17.0.9.14002 > 172.17.0.2.41572: Flags [S.], seq 1536840778, ack 1918455196, win 65160, options [mss 1460,sackOK,TS val 2158069781 ecr 3159389919,nop,wscale 7], length 0
12:49:04.005968 IP 172.17.0.2.41572 > 172.17.0.9.14002: Flags [.], ack 1, win 502, options [nop,nop,TS val 3159389919 ecr 2158069781], length 0
12:49:04.006033 IP 172.17.0.2.41572 > 172.17.0.9.14002: Flags [P.], seq 1:157, ack 1, win 502, options [nop,nop,TS val 3159389919 ecr 2158069781], length 156
12:49:04.006075 IP 172.17.0.9.14002 > 172.17.0.2.41572: Flags [.], ack 157, win 508, options [nop,nop,TS val 2158069781 ecr 3159389919], length 0
12:49:04.007058 IP 172.17.0.9.14002 > 172.17.0.2.41572: Flags [P.], seq 1:1347, ack 157, win 508, options [nop,nop,TS val 2158069782 ecr 3159389919], length 1346
12:49:04.007080 IP 172.17.0.2.41572 > 172.17.0.9.14002: Flags [.], ack 1347, win 501, options [nop,nop,TS val 3159389920 ecr 2158069782], length 0
12:49:04.008188 IP 172.17.0.2.41572 > 172.17.0.9.14002: Flags [F.], seq 157, ack 1347, win 501, options [nop,nop,TS val 3159389921 ecr 2158069782], length 0
12:49:04.008307 IP 172.17.0.9.14002 > 172.17.0.2.41572: Flags [F.], seq 1347, ack 158, win 508, options [nop,nop,TS val 2158069783 ecr 3159389921], length 0
12:49:04.008325 IP 172.17.0.2.41572 > 172.17.0.9.14002: Flags [.], ack 1348, win 501, options [nop,nop,TS val 3159389921 ecr 2158069783], length 0
12:49:04.010586 IP 172.17.0.2.41580 > 172.17.0.9.14002: Flags [S], seq 2335370326, win 64240, options [mss 1460,sackOK,TS val 3159389923 ecr 0,nop,wscale 7], length 0
12:49:04.010642 IP 172.17.0.9.14002 > 172.17.0.2.41580: Flags [S.], seq 609193428, ack 2335370327, win 65160, options [mss 1460,sackOK,TS val 2158069785 ecr 3159389923,nop,wscale 7], length 0
12:49:04.010653 IP 172.17.0.2.41580 > 172.17.0.9.14002: Flags [.], ack 1, win 502, options [nop,nop,TS val 3159389923 ecr 2158069785], length 0
12:49:04.010706 IP 172.17.0.2.41580 > 172.17.0.9.14002: Flags [P.], seq 1:217, ack 1, win 502, options [nop,nop,TS val 3159389923 ecr 2158069785], length 216
12:49:04.010719 IP 172.17.0.9.14002 > 172.17.0.2.41580: Flags [.], ack 217, win 508, options [nop,nop,TS val 2158069785 ecr 3159389923], length 0
12:49:04.223102 IP 172.17.0.9.14002 > 172.17.0.2.41580: Flags [P.], seq 1:745, ack 217, win 508, options [nop,nop,TS val 2158069998 ecr 3159389923], length 744
12:49:04.223133 IP 172.17.0.2.41580 > 172.17.0.9.14002: Flags [.], ack 745, win 501, options [nop,nop,TS val 3159390136 ecr 2158069998], length 0
12:49:04.224157 IP 172.17.0.2.41580 > 172.17.0.9.14002: Flags [F.], seq 217, ack 745, win 501, options [nop,nop,TS val 3159390137 ecr 2158069998], length 0
12:49:04.224240 IP 172.17.0.9.14002 > 172.17.0.2.41580: Flags [F.], seq 745, ack 218, win 508, options [nop,nop,TS val 2158069999 ecr 3159390137], length 0
12:49:04.224251 IP 172.17.0.2.41580 > 172.17.0.9.14002: Flags [.], ack 746, win 501, options [nop,nop,TS val 3159390137 ecr 2158069999], length 0
12:49:04.227006 IP 172.17.0.2.41592 > 172.17.0.9.14002: Flags [S], seq 1323662292, win 64240, options [mss 1460,sackOK,TS val 3159390140 ecr 0,nop,wscale 7], length 0
12:49:04.227057 IP 172.17.0.9.14002 > 172.17.0.2.41592: Flags [S.], seq 1012405627, ack 1323662293, win 65160, options [mss 1460,sackOK,TS val 2158070002 ecr 3159390140,nop,wscale 7], length 0
12:49:04.227072 IP 172.17.0.2.41592 > 172.17.0.9.14002: Flags [.], ack 1, win 502, options [nop,nop,TS val 3159390140 ecr 2158070002], length 0
12:49:04.227123 IP 172.17.0.2.41592 > 172.17.0.9.14002: Flags [P.], seq 1:217, ack 1, win 502, options [nop,nop,TS val 3159390140 ecr 2158070002], length 216
12:49:04.227150 IP 172.17.0.9.14002 > 172.17.0.2.41592: Flags [.], ack 217, win 508, options [nop,nop,TS val 2158070002 ecr 3159390140], length 0
12:49:04.450357 IP 172.17.0.9.14002 > 172.17.0.2.41592: Flags [P.], seq 1:4097, ack 217, win 508, options [nop,nop,TS val 2158070225 ecr 3159390140], length 4096
12:49:04.450381 IP 172.17.0.2.41592 > 172.17.0.9.14002: Flags [.], ack 4097, win 491, options [nop,nop,TS val 3159390363 ecr 2158070225], length 0
12:49:04.450402 IP 172.17.0.9.14002 > 172.17.0.2.41592: Flags [P.], seq 4097:5741, ack 217, win 508, options [nop,nop,TS val 2158070225 ecr 3159390363], length 1644
12:49:04.450404 IP 172.17.0.2.41592 > 172.17.0.9.14002: Flags [.], ack 5741, win 501, options [nop,nop,TS val 3159390363 ecr 2158070225], length 0
12:49:04.451229 IP 172.17.0.2.41592 > 172.17.0.9.14002: Flags [F.], seq 217, ack 5741, win 501, options [nop,nop,TS val 3159390364 ecr 2158070225], length 0
12:49:04.451330 IP 172.17.0.9.14002 > 172.17.0.2.41592: Flags [F.], seq 5741, ack 218, win 508, options [nop,nop,TS val 2158070226 ecr 3159390364], length 0
12:49:04.451350 IP 172.17.0.2.41592 > 172.17.0.9.14002: Flags [.], ack 5742, win 501, options [nop,nop,TS val 3159390364 ecr 2158070226], length 0
12:49:04.453541 IP 172.17.0.2.41602 > 172.17.0.9.14002: Flags [S], seq 1898623425, win 64240, options [mss 1460,sackOK,TS val 3159390366 ecr 0,nop,wscale 7], length 0
12:49:04.453588 IP 172.17.0.9.14002 > 172.17.0.2.41602: Flags [S.], seq 1579252275, ack 1898623426, win 65160, options [mss 1460,sackOK,TS val 2158070228 ecr 3159390366,nop,wscale 7], length 0
12:49:04.453604 IP 172.17.0.2.41602 > 172.17.0.9.14002: Flags [.], ack 1, win 502, options [nop,nop,TS val 3159390366 ecr 2158070228], length 0
12:49:04.453651 IP 172.17.0.2.41602 > 172.17.0.9.14002: Flags [P.], seq 1:218, ack 1, win 502, options [nop,nop,TS val 3159390366 ecr 2158070228], length 217
12:49:04.453667 IP 172.17.0.9.14002 > 172.17.0.2.41602: Flags [.], ack 218, win 508, options [nop,nop,TS val 2158070228 ecr 3159390366], length 0
12:49:04.667711 IP 172.17.0.9.14002 > 172.17.0.2.41602: Flags [P.], seq 1:4097, ack 218, win 508, options [nop,nop,TS val 2158070442 ecr 3159390366], length 4096
12:49:04.667733 IP 172.17.0.2.41602 > 172.17.0.9.14002: Flags [.], ack 4097, win 491, options [nop,nop,TS val 3159390580 ecr 2158070442], length 0
12:49:04.667753 IP 172.17.0.9.14002 > 172.17.0.2.41602: Flags [P.], seq 4097:5806, ack 218, win 508, options [nop,nop,TS val 2158070442 ecr 3159390580], length 1709
12:49:04.667756 IP 172.17.0.2.41602 > 172.17.0.9.14002: Flags [.], ack 5806, win 501, options [nop,nop,TS val 3159390580 ecr 2158070442], length 0
12:49:04.668489 IP 172.17.0.2.41602 > 172.17.0.9.14002: Flags [F.], seq 218, ack 5806, win 501, options [nop,nop,TS val 3159390581 ecr 2158070442], length 0
12:49:04.668555 IP 172.17.0.9.14002 > 172.17.0.2.41602: Flags [F.], seq 5806, ack 219, win 508, options [nop,nop,TS val 2158070443 ecr 3159390581], length 0
12:49:04.668563 IP 172.17.0.2.41602 > 172.17.0.9.14002: Flags [.], ack 5807, win 501, options [nop,nop,TS val 3159390581 ecr 2158070443], length 0
12:49:04.671618 IP 172.17.0.2.41608 > 172.17.0.9.14002: Flags [S], seq 2082227832, win 64240, options [mss 1460,sackOK,TS val 3159390584 ecr 0,nop,wscale 7], length 0
12:49:04.671681 IP 172.17.0.9.14002 > 172.17.0.2.41608: Flags [S.], seq 3098501057, ack 2082227833, win 65160, options [mss 1460,sackOK,TS val 2158070446 ecr 3159390584,nop,wscale 7], length 0
12:49:04.671709 IP 172.17.0.2.41608 > 172.17.0.9.14002: Flags [.], ack 1, win 502, options [nop,nop,TS val 3159390584 ecr 2158070446], length 0
12:49:04.671842 IP 172.17.0.2.41608 > 172.17.0.9.14002: Flags [P.], seq 1:218, ack 1, win 502, options [nop,nop,TS val 3159390585 ecr 2158070446], length 217
12:49:04.671873 IP 172.17.0.9.14002 > 172.17.0.2.41608: Flags [.], ack 218, win 508, options [nop,nop,TS val 2158070447 ecr 3159390585], length 0
12:49:04.883893 IP 172.17.0.9.14002 > 172.17.0.2.41608: Flags [P.], seq 1:4097, ack 218, win 508, options [nop,nop,TS val 2158070659 ecr 3159390585], length 4096
12:49:04.883915 IP 172.17.0.2.41608 > 172.17.0.9.14002: Flags [.], ack 4097, win 491, options [nop,nop,TS val 3159390797 ecr 2158070659], length 0
12:49:04.883935 IP 172.17.0.9.14002 > 172.17.0.2.41608: Flags [P.], seq 4097:5795, ack 218, win 508, options [nop,nop,TS val 2158070659 ecr 3159390797], length 1698
12:49:04.883938 IP 172.17.0.2.41608 > 172.17.0.9.14002: Flags [.], ack 5795, win 501, options [nop,nop,TS val 3159390797 ecr 2158070659], length 0
12:49:04.884812 IP 172.17.0.2.41608 > 172.17.0.9.14002: Flags [F.], seq 218, ack 5795, win 501, options [nop,nop,TS val 3159390798 ecr 2158070659], length 0
12:49:04.884895 IP 172.17.0.9.14002 > 172.17.0.2.41608: Flags [F.], seq 5795, ack 219, win 508, options [nop,nop,TS val 2158070660 ecr 3159390798], length 0
12:49:04.884911 IP 172.17.0.2.41608 > 172.17.0.9.14002: Flags [.], ack 5796, win 501, options [nop,nop,TS val 3159390798 ecr 2158070660], length 0
12:49:04.886666 IP 172.17.0.2.41610 > 172.17.0.9.14002: Flags [S], seq 1714591972, win 64240, options [mss 1460,sackOK,TS val 3159390799 ecr 0,nop,wscale 7], length 0
12:49:04.886704 IP 172.17.0.9.14002 > 172.17.0.2.41610: Flags [S.], seq 643961372, ack 1714591973, win 65160, options [mss 1460,sackOK,TS val 2158070661 ecr 3159390799,nop,wscale 7], length 0
12:49:04.886713 IP 172.17.0.2.41610 > 172.17.0.9.14002: Flags [.], ack 1, win 502, options [nop,nop,TS val 3159390799 ecr 2158070661], length 0
12:49:04.886743 IP 172.17.0.2.41610 > 172.17.0.9.14002: Flags [P.], seq 1:218, ack 1, win 502, options [nop,nop,TS val 3159390799 ecr 2158070661], length 217
12:49:04.886752 IP 172.17.0.9.14002 > 172.17.0.2.41610: Flags [.], ack 218, win 508, options [nop,nop,TS val 2158070661 ecr 3159390799], length 0
12:49:05.099395 IP 172.17.0.9.14002 > 172.17.0.2.41610: Flags [P.], seq 1:4097, ack 218, win 508, options [nop,nop,TS val 2158070874 ecr 3159390799], length 4096
12:49:05.099416 IP 172.17.0.2.41610 > 172.17.0.9.14002: Flags [.], ack 4097, win 491, options [nop,nop,TS val 3159391012 ecr 2158070874], length 0
12:49:05.099435 IP 172.17.0.9.14002 > 172.17.0.2.41610: Flags [P.], seq 4097:5760, ack 218, win 508, options [nop,nop,TS val 2158070874 ecr 3159391012], length 1663
12:49:05.099437 IP 172.17.0.2.41610 > 172.17.0.9.14002: Flags [.], ack 5760, win 501, options [nop,nop,TS val 3159391012 ecr 2158070874], length 0
12:49:05.100596 IP 172.17.0.2.41610 > 172.17.0.9.14002: Flags [F.], seq 218, ack 5760, win 501, options [nop,nop,TS val 3159391013 ecr 2158070874], length 0
12:49:05.100692 IP 172.17.0.9.14002 > 172.17.0.2.41610: Flags [F.], seq 5760, ack 219, win 508, options [nop,nop,TS val 2158070875 ecr 3159391013], length 0
12:49:05.100711 IP 172.17.0.2.41610 > 172.17.0.9.14002: Flags [.], ack 5761, win 501, options [nop,nop,TS val 3159391013 ecr 2158070875], length 0
12:49:05.104689 IP 172.17.0.2.41612 > 172.17.0.9.14002: Flags [S], seq 3785177347, win 64240, options [mss 1460,sackOK,TS val 3159391017 ecr 0,nop,wscale 7], length 0
12:49:05.104748 IP 172.17.0.9.14002 > 172.17.0.2.41612: Flags [S.], seq 1257321149, ack 3785177348, win 65160, options [mss 1460,sackOK,TS val 2158070879 ecr 3159391017,nop,wscale 7], length 0
12:49:05.104768 IP 172.17.0.2.41612 > 172.17.0.9.14002: Flags [.], ack 1, win 502, options [nop,nop,TS val 3159391017 ecr 2158070879], length 0
12:49:05.104833 IP 172.17.0.2.41612 > 172.17.0.9.14002: Flags [P.], seq 1:218, ack 1, win 502, options [nop,nop,TS val 3159391018 ecr 2158070879], length 217
12:49:05.104860 IP 172.17.0.9.14002 > 172.17.0.2.41612: Flags [.], ack 218, win 508, options [nop,nop,TS val 2158070880 ecr 3159391018], length 0
12:49:05.353226 IP 172.17.0.9.14002 > 172.17.0.2.41612: Flags [P.], seq 1:4097, ack 218, win 508, options [nop,nop,TS val 2158071128 ecr 3159391018], length 4096
12:49:05.353252 IP 172.17.0.2.41612 > 172.17.0.9.14002: Flags [.], ack 4097, win 491, options [nop,nop,TS val 3159391266 ecr 2158071128], length 0
12:49:05.353959 IP 172.17.0.9.14002 > 172.17.0.2.41612: Flags [P.], seq 4097:5681, ack 218, win 508, options [nop,nop,TS val 2158071129 ecr 3159391266], length 1584
12:49:05.353980 IP 172.17.0.2.41612 > 172.17.0.9.14002: Flags [.], ack 5681, win 501, options [nop,nop,TS val 3159391267 ecr 2158071129], length 0
12:49:05.354106 IP 172.17.0.2.41612 > 172.17.0.9.14002: Flags [F.], seq 218, ack 5681, win 501, options [nop,nop,TS val 3159391267 ecr 2158071129], length 0
12:49:05.354246 IP 172.17.0.9.14002 > 172.17.0.2.41612: Flags [F.], seq 5681, ack 219, win 508, options [nop,nop,TS val 2158071129 ecr 3159391267], length 0
12:49:05.354265 IP 172.17.0.2.41612 > 172.17.0.9.14002: Flags [.], ack 5682, win 501, options [nop,nop,TS val 3159391267 ecr 2158071129], length 0
12:49:05.356507 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 1:18, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276951174], length 17
12:49:05.356536 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 18, win 502, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356576 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 18:55, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 37
12:49:05.356585 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 55, win 502, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356602 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 55:93, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 38
12:49:05.356607 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 93, win 502, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356626 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 93:174, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 81
12:49:05.356631 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 174, win 502, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356651 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 174:7414, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 7240
12:49:05.356659 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 7414, win 479, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356662 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 7414:14654, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 7240
12:49:05.356668 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 14654:24790, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 10136
12:49:05.356672 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 14654, win 446, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356681 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [P.], seq 24790:30665, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 5875
12:49:05.356700 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 24790, win 367, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356713 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [.], ack 30665, win 322, options [nop,nop,TS val 2276952529 ecr 121418912], length 0
12:49:05.356734 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [F.], seq 30665, ack 152, win 508, options [nop,nop,TS val 121418912 ecr 2276952529], length 0
12:49:05.356889 IP droidserver.fritz.box.40156 > 172.17.0.2.9651: Flags [F.], seq 152, ack 30666, win 501, options [nop,nop,TS val 2276952530 ecr 121418912], length 0
12:49:05.356902 IP 172.17.0.2.9651 > droidserver.fritz.box.40156: Flags [.], ack 153, win 508, options [nop,nop,TS val 121418913 ecr 2276952530], length 0

anclrii commented 4 years ago

Maybe try restarting the exporter with a wrong port mapping (-p 9999:9651) to block any connections targeting 9651, just to test the assumption that something other than the known prometheus instance is polling the exporter. If something else is polling it, this should stop the 5s spikes.

Any chance you have some other monitoring/health-checks that monitor the exporter itself and connect to the port to check if it's alive?

anclrii commented 4 years ago

I guess it would be good to add an endpoint for such healthchecks that wouldn't trigger the exporter to collect metrics, since currently I think any connection to the exporter port will trigger it. Will add to issues.
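
As an illustration of that idea, here is a minimal standard-library sketch in which only `/metrics` runs the expensive collection and a separate health path answers immediately. This is not the exporter's actual code: the `/healthz` path, `make_handler`, `serve` and the `collect` callable are all assumptions made for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(collect):
    """Build a handler class; `collect` is a callable returning the metrics text."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/metrics":
                # Only this path polls the node (the expensive part).
                status, body = 200, collect().encode()
            elif self.path == "/healthz":  # hypothetical health-check path
                status, body = 200, b"ok\n"
            else:
                status, body = 404, b"not found\n"
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the example quiet

    return Handler


def serve(collect, host="127.0.0.1", port=9651):
    """Start the server in a background thread and return it."""
    server = ThreadingHTTPServer((host, port), make_handler(collect))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With a layout like this, a monitoring tool pointed at `/healthz` (or at any unknown path) would get a cheap answer without causing calls to the storagenode API. A tool that requests `/metrics` itself would still trigger a full collection.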

kevinkk525 commented 4 years ago

Thank you, the monitoring was the right clue. Stopping netdata stopped the requests to the exporter. So it is netdata that is polling the port for some reason, and the exporter polls the node every time. So you're correct in your assumption that any connection will trigger the data collection.

anclrii commented 4 years ago

Huh, this was a long shot, glad it helped.