cloudflare / goflow

The high-scalability sFlow/NetFlow/IPFIX collector used internally at Cloudflare.
BSD 3-Clause "New" or "Revised" License

Performance impact question #80

Closed · jcdaniel14 closed this issue 4 years ago

jcdaniel14 commented 4 years ago

Hi there folks, here's the picture: I'm receiving flows from different routers, and for some reason I also receive a "copy" of the flows sent to another collector (I'm just configured as another export destination on IOS-XR), so I get data from ports that I don't really need, and I can't change the network configuration.

In order to save disk space, I decided to hardcode these ports into goflow (specifically in the SendKafkaFlowMessage func) so that when a flow comes from port "X" it is not sent to Kafka; the message is simply ignored.
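Roughly, the change looks like the sketch below (simplified, not the exact diff; the stand-in FlowMsg type, the InIf field name and the ignoredIfaces values are just illustrative). In the real code the check sits at the top of goflow's SendKafkaFlowMessage, before the message is serialized and produced to Kafka.

```go
package main

import "fmt"

// FlowMsg stands in for goflow's flow message type; only the field the
// filter looks at is shown, and InIf is an assumption about the field name.
type FlowMsg struct {
	InIf uint32
}

// ignoredIfaces holds the hardcoded router ports whose flows we drop.
var ignoredIfaces = map[uint32]bool{
	15: true,
	23: true,
}

// shouldPublish is the check added at the top of SendKafkaFlowMessage:
// returning false skips serialization and the Kafka produce entirely.
func shouldPublish(msg *FlowMsg) bool {
	return !ignoredIfaces[msg.InIf]
}

func main() {
	for _, m := range []*FlowMsg{{InIf: 15}, {InIf: 7}} {
		fmt.Printf("iface %d publish=%v\n", m.InIf, shouldPublish(m))
	}
}
```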

And it does the job, but I'm not quite sure whether I may still be overloading the server. In networking, discarding a packet before processing it does not represent a performance issue for the router; would that be the case here too? Perhaps the impact is negligible and I should not worry at all?

My NetFlow traffic is considerable and continuously growing, so I can't tell whether the server is overloaded because of traffic peaks or because I introduced a little more processing overhead.

I would appreciate some insight into this. Thank you.

lspgn commented 4 years ago

Hi @jcdaniel14, what metrics are you looking at that make you say your server is overloaded? Most of GoFlow's processing is done during decoding; filtering on the interface should be negligible.

jcdaniel14 commented 4 years ago

Hi, I have a bare-metal server with 128GB RAM and a 24-core AMD Opteron 6174. CPU utilization seems to be around 50%, with spikes up to 70%. The same server also runs Kafka and the ELK stack, so it is difficult to tell whether I'm adding pressure with the lines of code I put inside goflow's methods.

lspgn commented 4 years ago

Do you have per-process monitoring? Something like prometheus-node-exporter and process-node-exporter/cadvisor? For just goflow, this should be able to handle thousands of flows per second.

If you want to dive into the effect of adding this function, I would suggest using pprof.
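As a minimal sketch of wiring that up (assuming you add it to your own build; the listen address is arbitrary), net/http/pprof registers the profiling handlers over HTTP:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose the profiling endpoints next to the collector.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start goflow's listeners as usual ...
	select {}
}
```

A 30-second CPU profile can then be pulled with go tool pprof http://localhost:6060/debug/pprof/profile, and allocation data from the /debug/pprof/allocs endpoint.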

mirsblog commented 4 years ago

@lspgn FWIW, I ran some tests on a VM with 4 vCPUs at 2.3GHz and 32GB RAM and compared it with nfacct. The maximum rate at which I could decode IPFIX and publish to Kafka without dropping packets was 15000 packets/second. Anything more than that caused significant packet loss. nfacct, in comparison, could easily scale up to 60000 packets/second.

I ran the tests with -kafka=false and it did not seem to have any effect. I also increased the number of workers, but I did not find any marked difference in packet drops between 1 and 100 workers.

Test setup:
- Host type: VM
- CPU: 4 vCPU Intel Xeon E312xx
- Memory: 32GB

I used an IPFIX PCAP with tcpreplay to send packets from one host to another:

$ sudo tcpreplay -i ens3 -K --loop=50000 -p 15000 ipfix.pcap

I monitored packet drops at /proc/net/udp6.
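For reference, the counter in question is the per-socket "drops" field, the last column of /proc/net/udp6. A small helper along these lines (just a sketch of where the number comes from, not what I actually ran) sums it across sockets:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

// totalUDP6Drops sums the per-socket "drops" counter, the last column of
// /proc/net/udp6, which is the number watched during the tcpreplay runs.
func totalUDP6Drops() (uint64, error) {
	f, err := os.Open("/proc/net/udp6")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	var total uint64
	sc := bufio.NewScanner(f)
	sc.Scan() // skip the header line
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		d, err := strconv.ParseUint(fields[len(fields)-1], 10, 64)
		if err != nil {
			continue // not a socket line
		}
		total += d
	}
	return total, sc.Err()
}

func main() {
	drops, err := totalUDP6Drops()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("udp6 drops:", drops)
}
```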

lspgn commented 4 years ago

@mirsblog interesting, thanks for the insights. Does it use all the processors?

mirsblog commented 4 years ago

> @mirsblog interesting, thanks for the insights. Does it use all the processors?

I assume so given runtime.GOMAXPROCS(runtime.NumCPU()) is set in goflow.go. Would that be a correct assumption?

lspgn commented 4 years ago

Yes it should use all processors, was just curious if the load distribution would be the same when looking at htop. Did you compile GoFlow or did you get a specific binary?

mirsblog commented 4 years ago

> Yes it should use all processors, was just curious if the load distribution would be the same when looking at htop. Did you compile GoFlow or did you get a specific binary?

GoFlow: v3.4.2
GoLang: 1.14
Built the Alpine image using the Dockerfile found in v3.4.2 and ran it using the instructions from the README.

Edit: tested just now and checked in htop to confirm that the CPU load distribution is even with workers=4.

lspgn commented 4 years ago

The first thing I can think of that would affect NetFlow decoding performance is the shared template cache. Protobuf encoding (plus memory allocations) may also be the reason. I would need to compile a specific version that bypasses this, and also profile it with pprof.
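If someone wants to test the protobuf-encoding hypothesis in isolation, a micro-benchmark along these lines would show allocations per encoded message. The import path and the FlowMessage field names below are assumptions about the generated goflow package and may need adjusting:

```go
package flowbench

import (
	"testing"

	"github.com/golang/protobuf/proto"
	flowmessage "github.com/cloudflare/goflow/v3/pb"
)

// BenchmarkFlowMessageMarshal measures the cost of encoding a single flow
// message to protobuf; b.ReportAllocs shows allocations per operation.
func BenchmarkFlowMessageMarshal(b *testing.B) {
	msg := &flowmessage.FlowMessage{
		SamplingRate: 1024,
		Bytes:        1500,
		Packets:      1,
	}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := proto.Marshal(msg); err != nil {
			b.Fatal(err)
		}
	}
}
```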

mirsblog commented 4 years ago

Ok. I will consider that and start another thread when I have more to share.

jcdaniel14 commented 4 years ago

> Do you have per-process monitoring? Something like prometheus-node-exporter and process-node-exporter/cadvisor? For just goflow, this should be able to handle thousands of flows per second.
>
> If you want to dive into the effect of adding this function, I would suggest using pprof.

I don't really have per-process monitoring but will dive into it. The server is processing 25k flows/sec at the moment according to Logstash, and it hasn't been noticeably affected by the changes I made. I was just worried that, since it handles flows, Kafka, Elasticsearch and Logstash at the same time, I could add some stress by adapting the code the way I did. Thanks for clarifying and for the good-practices advice.