cloudflare / goflow

The high-scalability sFlow/NetFlow/IPFIX collector used internally at Cloudflare.
BSD 3-Clause "New" or "Revised" License
881 stars 176 forks

Cardinality explosion in flow_traffic metrics #94

Open slrtbtfs opened 3 years ago

slrtbtfs commented 3 years ago

Goflow's Prometheus metrics flow_traffic_(bytes|packets|summary_size_bytes) include labels for remote_ip and remote_port.

Some services that send flows (e.g. pmacctd) use a new outgoing port each time they send flows to goflow, and every new port creates several new time series. For a single pmacctd client with almost no network traffic, this adds up to thousands of time series each day, which, as profiling reveals, consume several hundred MB of heap memory.

Eventually this causes goflow to run out of memory.

One solution would be to omit the remote_port label from these metrics.

SuperQ commented 3 years ago

Randomizing on every flow seems a bit broken, IMO. But I agree, the remote_port, which should be an ephemeral port, doesn't seem like a good identifier.