cloudflare / goflow

The high-scalability sFlow/NetFlow/IPFIX collector used internally at Cloudflare.
BSD 3-Clause "New" or "Revised" License

Changed RouterAddr on UDP Load Balancer #39

Closed MIKNOTAURO closed 4 years ago

MIKNOTAURO commented 5 years ago

Hi guys, I'm collecting data from one router with goflow and everything works well. Now I'm trying to collect data from 500 routers. My first question is: given the network throughput, should I put a UDP load balancer in front of goflow, or can goflow handle this traffic on its own? And my second question:

               +------------+
               |            |
               |            |
               |            |  NGINX UDP Load Balancer
               |            |
               |            |
               +------+-----+
                      |
                      |
                      |
      +---------------+--------------+
      |                              |
+-----+-----+                 +------+------+
|           |                 |             |
|           |                 |             |
|           |  Collector 1    |             |  Collector N
|           |                 |             |
+-----------+                 +-------------+

I'm trying to achieve this architecture and at first sight it works, but the problem is that I'm getting the load balancer IP as the RouterAddr, because it seems that goflow takes the source address of the UDP datagram as the RouterAddr. Now I'm trying to pass the original client IP to the goflow instances with this (important parts here):

    user root;

    stream {
        upstream collectors {
            server x.x.x.x:xxxx;
            server x.x.x.x:xxxx;
        }

        server {
            listen 5000 udp;
            proxy_bind $remote_addr transparent;
            proxy_pass collectors;
        }
    }

but it did not work... Even if this (UDP load balancer) could work, I'm not sure that I can get the RouterAddr from goflow :/

Any advice, help, or another approach on this would be appreciated.

I'm using goflow with docker and this image id -> 4f84dcd62e08

debugloop commented 4 years ago

Hi, I've some hints on my usage, no concrete answers, sorry :)

So far, a single goflow instance has handled every flow volume I've directed at it, which is not saying much, since that's been 1:32 sampled NFv9 from a regional research and education provider with a dozen ASR9k routers. Still, I am using https://github.com/sleinen/samplicator (developed at another research network), which is a kind of UDP multiplexer that supports source address spoofing for the datagrams. You'll need to set cap_net_raw+eip on its binary to allow that, and maybe increase the host's receive buffer sysctl net.core.rmem_max. This would however not help you if you try to do some kind of round-robin multiplexing of a single router's Netflow, as the samplicator only supports directing spoofed UDP by its source address. I'm using it with a single goflow instance anyway; the multiplexing is for other Netflow collectors.

If you really have just the one router and flows from a single source address, goflow will probably be fine. If you're still concerned about goflow's performance, this router most likely has different flow exporting interfaces, or even just different exporter maps on a single interface. These will use (speaking for Cisco here, but it'll be similar for other vendors) distinct source ports while having the same source address, making some kind of round-robin routing thingie possible using iptables. I've never done that before, just saying you should be able to use the source port and save yourself the nginx (which would need to spoof its datagrams, dunno how to do that).

Edit: Just for completeness' sake: You could of course configure the routers to use different targets for the flows, but I figure you've opened the issue because that's not possible in your setting?

lspgn commented 4 years ago

Responding a bit late:

You may put a load balancer in front, but NetFlow (v9)/IPFIX are stateful and rely on the source IP of the UDP packet. Ideally, for that case, you want the samples from the same router to arrive on the same collector. This leads to hashed load balancing, which may not always be perfectly equal.
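To illustrate what hashing on the source IP means, here is a minimal Python sketch. It is not how any particular load balancer implements it; the function name `pick_collector`, the collector names, and the router addresses are all made up for the example. The point is only that the same router always lands on the same collector, while the split across collectors is not guaranteed to be even.

```python
import hashlib
import ipaddress
from collections import Counter


def pick_collector(router_ip: str, collectors: list) -> str:
    """Deterministically map a router's source IP to one collector."""
    packed = ipaddress.ip_address(router_ip).packed
    digest = hashlib.sha256(packed).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo
    # the number of collectors to get a stable index.
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]


# Hypothetical collectors and routers, for illustration only.
collectors = ["collector-1:2055", "collector-2:2055"]
routers = [f"10.0.0.{i}" for i in range(1, 11)]

assignment = {r: pick_collector(r, collectors) for r in routers}
# How many routers each collector got; the counts need not be equal.
load = Counter(assignment.values())
```

Note that with a plain modulo, adding or removing a collector reshuffles most routers, so each collector would have to wait for templates again after such a change.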

What I'd suggest would be: benchmarking the decoding times GoFlow is taking using the metric flow_summary_decoding_time_us (Prometheus endpoint is /metrics). The decoding can be parallelized on multiple cores using a thread pool (you can pass the -workers N CLI argument to use N threads). Let's say the decoding of one sample takes 30 µs; this means you can decode around 33k samples per second with one worker. If you have 8 cores and configure 8 workers, that's around 250k samples per second.

I have not tried @debugloop's suggestions, but you can tune your OS receive buffers to handle sudden increases in flows without dropping them, or dedicate more CPU to those tasks.
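The arithmetic behind those numbers, as a small Python sketch. The helper `samples_per_second` is hypothetical, and it assumes decoding scales linearly with the number of workers, which is an upper bound rather than a measurement.

```python
def samples_per_second(decode_time_us: float, workers: int = 1) -> float:
    """Upper bound on decoded samples per second, assuming linear scaling."""
    return workers * 1_000_000 / decode_time_us


# 30 microseconds per sample is the example figure from the comment above.
one_worker = samples_per_second(30)        # roughly 33k samples/s
eight_workers = samples_per_second(30, 8)  # roughly 267k samples/s at best
```

The linear bound for 8 workers is slightly above the "around 250k" quoted above, which leaves some room for overhead. Replace 30 with the value you actually observe in flow_summary_decoding_time_us.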

Potential solutions or suggestions:

Let me know if that helps. The solutions listed above may not necessarily apply to your case @MIKNOTAURO, and we would need more information about your setup and its limitations to answer more accurately.

debugloop commented 4 years ago

I actually have a service address configured on my goflow instance, but I've been reluctant to add a second instance. Will goflow keep flow records for which no template was received yet long enough? Or will some flows drop in the event of a change of routing from a router to another goflow instance?

Still, I'll spin up some more goflow instances at some point for redundancy, if not with equal cost then with different metrics as hot standby. ECMP will be hard in our case anyways, as we're BGP-only.

I talked about this with a colleague from the IP team just now, and he's had this idea of running goflow on the router, as our chassis apparently support spawning containers. He wasn't being serious tho...

Also I'd like to contest the "snowflakes" argument :smile: The router configurations will all be the same, pointing their exporter map to either a samplicator service which multiplexes/spoofs UDP or a host running iptables. The goflow configurations may also be equal on any hosts receiving Netflow from either multiplexing variant. Granted, iptables will be kind of fiddly, and the samplicator is my first choice only because we require the raw Netflow for other collectors.

lspgn commented 4 years ago

Interesting use-case. What do you mean by BGP only?

At the moment, it's just dropping the samples as it generates an error (template not found). It would be possible with some extensive modifications (send the flow to be resampled later on, or trigger the processing once a template is received).

I thought about that in the past to avoid the 10 minutes of cold start. There's an http endpoint for /templates to access the definition of samples. I thought about some kind of synchronization as well from a static JSON dump of the templates but never had enough incentive to code it.

I spoke too fast by saying "snowflake configurations"; I was thinking of my own use-case, where we try to keep the configurations the same everywhere. Nothing that's impossible to manage with good automation :) .

MIKNOTAURO commented 4 years ago

Thanks for your time and answers guys.

For now it is not exactly a problem. It works just fine (several routers send flows to one collector, which decodes them and saves them to a Kafka cluster), but thinking about high availability, I believe we need a UDP load balancer in front of the collectors, or something like it, for failover.

My problem starts here. We put a UDP LB (nginx) in front of the goflow instances (2 right now) as I mentioned above, but when we parse the flows from Kafka, what we get as the "SamplerAddress" is the load balancer IP.

My original problem: we need to attach client data to each flow in order to provide a network traffic report per client. We have a registry of local IPs, but since we have many routers, the local IP alone is not enough to match the flow data (SrcAddr) with a client, so we need to know which router (SamplerAddress) the flow comes from (we also have a registry of routers).
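A minimal Python sketch of that matching, to make the dependency on the router address concrete. The registry contents, the client names, and the function `client_for_flow` are invented for the example, and addresses are shown as text for readability.

```python
from typing import Optional

# Hypothetical registry: (router address, local IP) -> client.
# The same local IP can exist behind several routers, so the router
# address has to be part of the key.
CLIENTS = {
    ("10.0.0.1", "192.168.88.10"): "client-a",
    ("10.0.0.2", "192.168.88.10"): "client-b",
}


def client_for_flow(sampler_address: str, src_addr: str) -> Optional[str]:
    """Return the client owning src_addr behind the given router, if known."""
    return CLIENTS.get((sampler_address, src_addr))


# Direct export: the router address identifies the client.
direct = client_for_flow("10.0.0.2", "192.168.88.10")

# Behind a non-transparent load balancer every flow carries the balancer's
# address (here a made-up 10.0.0.254), so the lookup finds nothing.
via_lb = client_for_flow("10.0.0.254", "192.168.88.10")
```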

I cannot use sFlow because our routers (Mikrotik) do not have this option for exporting flows (just IPFIX and NetFlow 5/9).

I also want to say that our application is based on Python, so we built pb/flow.proto for this language, and when we parse the flow some fields do not exist as the README file says... For example, SamplerAddress is RouterAddr (I think).

Now I'm trying with this version: goflow-v3.2.0-linux-x86_64

I didn't know about ECMP (but I'm reading up on it). Meanwhile, if you have any advice or recommendations, they would be appreciated.

MIKNOTAURO commented 4 years ago

This is what we are trying to achieve

[Image: Diagrama3]

lspgn commented 4 years ago

That is unfortunately a problem with NetFlow/IPFIX, due to the agent address being the source IP of the packet. In order to do load balancing, IP-level balancing (ECMP) will work, as will techniques that preserve the source IP. I searched quickly and found an article on nginx: https://www.nginx.com/blog/ip-transparency-direct-server-return-nginx-plus-transparent-proxy/

SamplerAddress is the source address of the NetFlow/IPFIX packet and Agent IP in an sFlow packet. From v2 to v3, some field names changed.
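When reading these messages from Python, the address fields may need converting before they can be compared with a router registry. The sketch below assumes the address arrives as raw bytes (4 bytes for IPv4, 16 for IPv6) in the generated Python message; that is an assumption worth checking against your own generated module, and `address_to_text` is a hypothetical helper name.

```python
import ipaddress


def address_to_text(raw: bytes) -> str:
    """Convert a packed 4-byte or 16-byte address into its text form."""
    # ipaddress.ip_address accepts packed bytes and picks IPv4 or IPv6
    # based on the length; other lengths raise ValueError.
    return str(ipaddress.ip_address(raw))


ipv4_text = address_to_text(bytes([192, 0, 2, 1]))
ipv6_text = address_to_text(bytes(15) + b"\x01")
```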

debugloop commented 4 years ago

> What do you mean by BGP only?

We do not run OSPF in our DC, and as I understand it, having equal cost with BGP is kind of involved.

@MIKNOTAURO funny how originally, traffic accounting was also the primary use case for the project I'm working on. I think your options are:

  1. UDP multiplexing: either using nginx with IP transparency (direct server return does not matter, Netflow is not a conversation) or samplicator with spoofing
  2. IP routing: using ECMP or some more basic form of service IP routing (for instance tie break by hop distance between equal announcements)

As for the IP address matching, I am using this module https://github.com/bwNetFlow/ip_prefix_trie within an enrichment tool which takes the flow ingress topic in Kafka and adds some more data while copying to a more advanced topic. I won't recommend that you use my stuff just yet, but the algorithm for fast IP matching might be useful to you. I also have some version in Python, I think, if that would help you more, but I'll have to open source that first. A colleague ported my code to Python some time ago, as the ipaddress module is very slow too.
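For readers unfamiliar with the idea, here is a minimal Python sketch of longest-prefix matching with a binary trie. It is not the code from the linked module or its Python port; the class `PrefixTrie`, the prefixes, and the customer labels are invented for the example, and the trie does no path compression.

```python
import ipaddress
from typing import Optional


class PrefixTrie:
    """Plain binary trie; use one instance per address family."""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, prefix: str, value: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk one level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (net.max_prefixlen - 1 - i)) & 1
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, address: str) -> Optional[str]:
        addr = ipaddress.ip_address(address)
        bits = int(addr)
        node = self.root
        best = node.get("value")  # a default route, if one was inserted
        for i in range(addr.max_prefixlen):
            bit = (bits >> (addr.max_prefixlen - 1 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            # Remember the most specific prefix seen so far.
            best = node.get("value", best)
        return best


trie = PrefixTrie()
trie.insert("10.0.0.0/8", "customer-a")
trie.insert("10.1.0.0/16", "customer-b")
```

A lookup costs at most one step per address bit, regardless of how many prefixes are stored, which is why this beats checking an address against every subnet in turn.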

lspgn commented 4 years ago

@debugloop that's a cool module! I will take a look. Thanks for sharing. Internally, we also use the Kentik Patricia/prefix-trie library: https://github.com/kentik/patricia for everything that maps IP ranges to ASNs, countries, plans... But for routers, a simple hash map is good enough. Worth pointing out: ClickHouse can do this through dictionaries.

debugloop commented 4 years ago

Interesting, I hadn't had kentik/patricia on my radar yet. Too bad it wasn't around when I got started with my own trie, but I'll look into it now, I guess. The main differences would be that I've not considered GC at all, and that I've not tried to make the trie skip sparse nodes.

We're actually using it to tag flows with customer IDs only, which is about the same trick as tagging ASN except our customers are largely not an AS in their own right. For countries we're using some Maxmind dataset and the default API it provides, which is working ok.

I've been wanting to look into Clickhouse too since I've read about it in goflow's Readme.

MIKNOTAURO commented 4 years ago

@debugloop That would be really appreciated! And sorry if I'm asking silly questions, I'm new in this field, but actually my pipeline works (Routers + Goflow + Kafka + Faust/python + InfluxDB/elasticsearch).

I have more questions about these types of architectures, but since they are not related to goflow, if you can help me, I think it would be better to use something like Slack. What do you think, guys? @lspgn

debugloop commented 4 years ago

@MIKNOTAURO I had to butcher a coworker's larger script, but here it is. It's quite simple really, but much more efficient than iterating over all your subnets as ipaddress objects and checking membership against them.

I don't have Slack, but you can mail me using my GH profile's email or find me on IRC, same nick as here, for instance on freenode.