cloudflare / goflow

The high-scalability sFlow/NetFlow/IPFIX collector used internally at Cloudflare.
BSD 3-Clause "New" or "Revised" License
852 stars, 171 forks

Memory leak #85

Closed: swb-ops closed 3 years ago

swb-ops commented 4 years ago

Hello! I am trying to use goflow to collect sFlow statistics and write them to Logstash (ELK stack) through a Kafka broker.

The pipeline is goflow -> kafka -> logstash-input-kafka (protobuf codec) -> elastic

The statistics are accurate, but the goflow collector leaks memory quickly. My VM has 32 GB of RAM, which lasts about 2 hours. The memory is freed when the goflow service is restarted.

Does anyone know what could be the reason?

lspgn commented 4 years ago

Hi @swb-ops, could you indicate the version you are using? Is there slowness on the Kafka side? Does the memory leak happen with -kafka=false?

swb-ops commented 4 years ago

Hi @lspgn, thanks for the answer. I use GoFlow v3.4.2.

Yes, the memory leak happens when -kafka=false.

# systemctl status goflow
● goflow.service - GoFlow
   Loaded: loaded (/lib/systemd/system/goflow.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-07-21 20:41:30 MSK; 14min ago
 Main PID: 30059 (goflow)
    Tasks: 14 (limit: 4915)
   CGroup: /system.slice/goflow.service
           └─30059 /usr/bin/goflow -sflow.addr 192.168.226.22 -sflow.port 9953 -kafka=false

When I started the service:

# free -m
              total        used        free      shared  buff/cache   available
Mem:          32167        1441       26275          32        4450       30136

After 15 minutes of work:

# free -m
              total        used        free      shared  buff/cache   available
Mem:          32167        4158       22761          32        5247       27409
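
As a rough cross-check of the two `free -m` readings above, the growth rate and the time until the remaining memory runs out can be estimated with simple arithmetic. The sketch below assumes that all of the increase in the "used" column comes from goflow and that the growth is linear; neither assumption is confirmed in this thread, and the function names are made up for illustration.

```go
package main

import "fmt"

// growthPerMinute returns the average increase in used memory, in MiB per
// minute, between two `free -m` readings taken `minutes` apart.
func growthPerMinute(usedStartMiB, usedEndMiB, minutes float64) float64 {
	return (usedEndMiB - usedStartMiB) / minutes
}

// minutesUntilExhausted estimates how long availableMiB lasts if memory
// keeps growing at ratePerMinute (assumes linear growth).
func minutesUntilExhausted(availableMiB, ratePerMinute float64) float64 {
	return availableMiB / ratePerMinute
}

func main() {
	// Values copied from the two readings above: used 1441 -> 4158 MiB
	// over 15 minutes, with 27409 MiB still available at the second reading.
	rate := growthPerMinute(1441, 4158, 15)
	left := minutesUntilExhausted(27409, rate)
	fmt.Printf("growth: %.0f MiB/min, remaining: %.1f hours\n", rate, left/60)
}
```

With these numbers the rate comes out at roughly 180 MiB per minute, which is in the same range as the "about 2 hours" reported in the first post.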

lspgn commented 4 years ago

How many flows per second are you processing?

swb-ops commented 4 years ago

We are currently processing about 3000 flows per second.

swb-ops commented 4 years ago

I tried versions 3.4.2, 3.4.0 and 3.1.0, sent both NetFlow and sFlow to the collector, and migrated goflow to another VM. The memory leak occurred in every case.

swb-ops commented 4 years ago

Hi @lspgn, do you have any other ideas about what could be causing the memory leak?

lspgn commented 4 years ago

Are you only sending sFlow to this port? Could you make a packet capture of the UDP traffic?

swb-ops commented 4 years ago

Yes, we are sending only sFlow to this port. We are currently processing sFlow from several devices; it is about 300 flows per second, and 32 GB of RAM lasts about 12 hours. The packet capture is attached: dump_sflow.zip

lspgn commented 4 years ago

I checked the file, @swb-ops. I will try to replay it later this week. Wireshark raises an alert on frame 7 (Gryphon protocol), but that is unlikely to be the issue. It could be the IPv4 data section along with the raw packet header. Out of curiosity, what is the sFlow agent?

swb-ops commented 4 years ago

Hello, we are receiving sFlow from Juniper and Huawei devices.

lspgn commented 3 years ago

Sorry for the delay, following up again: which software version are the Juniper devices running? I am trying to correlate this with another issue.

lspgn commented 3 years ago

I also tried to reproduce this by replaying your file in a loop, but I am not seeing an increase in RAM usage :( If the issue is still ongoing, would you be able to run a custom version of GoFlow with pprof and collect performance data from it?

swb-ops commented 3 years ago

Hi, thanks for the answer. The Juniper devices run versions 18.1R3-S6.1 and 18.4R2-S3. Yes, we are ready to try a custom version of GoFlow.

swb-ops commented 3 years ago

Hello @lspgn, can we hope that you will provide us with a custom version of GoFlow with pprof?

lspgn commented 3 years ago

Hello, Yes, I will craft a custom version with pprof but I have not had time yet. My apologies.

Thank you

swb-ops commented 3 years ago

Ok, we will wait. Thank you.

swb-ops commented 3 years ago

Hello @lspgn, we were using Nginx as a load balancer in front of goflow, and it was changing the source ports. I configured Nginx as a transparent proxy, and now goflow works very well. Thank you.