CESNET / ipfixcol2

High-performance NetFlow v5/v9 and IPFIX collector (RFC7011)

What is the Maximum Ingestion rate that is acceptable by IPFIXCol2? #95

Open SankarSadasivam opened 5 months ago

SankarSadasivam commented 5 months ago

Hi Team, we are planning to use IPFIXCol2 as a collector for our NetFlow collection as a replacement for our current vendor tool. The maximum ingestion rate that the vendor tool can accept is 8 million flows/min. I don't see any note about the maximum ingestion rate that IPFIXCol2 can process per minute. Can you please share any figures or guidance on the load it can accept?

Kind Regards, Sanky.

Lukas955 commented 5 months ago

Hello,

there's no easy answer to this question. A lot depends on the actual structure of the data sent by your probes and what you plan to do with the flow records (e.g. save them in a binary file or forward them in JSON format to other systems).

E.g. typical use cases: (a) If you want to convert records to JSON format and forward them to another server, you can expect a throughput of around 200-400k flow records/s. (b) If you want to store the data in binary form (e.g. an FDS file), you can expect a throughput of around 1 million flow records/s.

However, the numbers also depend on the performance of your hardware and a number of other factors.
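For context, these two cases roughly correspond to different output plugins in the startup configuration. A minimal sketch of the binary-storage case, assuming the FDS output plugin with illustrative values (storage path, compression, window size):

<output>
    <name>FDS output</name>
    <plugin>fds</plugin>
    <params>
        <storagePath>/var/flows/fds/</storagePath>   <!-- illustrative storage path -->
        <compression>lz4</compression>               <!-- compress stored records -->
        <dumpInterval>
            <timeWindow>300</timeWindow>             <!-- start a new file every 5 minutes -->
        </dumpInterval>
    </params>
</output>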

Lukas

Lukas955 commented 5 months ago

Just note that my values are in units per second (not per minute). In other words, I would say that our collector is considerably faster.

Lukas

SankarSadasivam commented 5 months ago

Thanks Lukas for your response. Most of the flow records sent by our sources are based on Template ID 258, and we want to collect, enrich and push them onto a datastore in JSON format, so to answer your question, it will be a JSON forward. Even if I consider 400K records per second, that will still be 2.4 million per minute, which will not reach the level we have, which is 8 million per minute.

Lukas955 commented 5 months ago

I think you've miscalculated a bit. 400k x 60s = 24 million flows/min.

As for Template ID 258, unfortunately the template specification is dynamic and always depends on your specific probes. In other words, it is not possible to deduce anything from the number 258 alone.

Lukas

SankarSadasivam commented 5 months ago

Ah, I'm bad at maths :) yep, 24 million. Thanks for correcting. I don't think the template spec is dynamic, and this site will give you a view of what Template 258 should generally look like.

https://docs.fortinet.com/document/fortigate/7.2.8/administration-guide/448589/netflow-templates#258

Lukas955 commented 5 months ago

The structure of the referenced Template is relatively simple and the required performance targets should be achievable.


By the way, the definition of templates always depends on the specific implementation of the probe. For example, I usually work with probes that can send completely different flow fields under the same Template ID after a restart. In fact, the NetFlow/IPFIX protocol does not say that the fields must always be the same under one Template ID. That's why I mentioned that templates are dynamic.

However, if I understand correctly, in your case the flow data sources are always multiple devices from the same manufacturer, which guarantees that the same Template ID always carries the same fields. OK, fair enough.

Lukas

SankarSadasivam commented 5 months ago

Thanks Lukas for the explanation. We don't use probes to export flows; the routers themselves export the flows with a standard configuration across the board. As you said, NetFlow v9/IPFIX doesn't require the same fields under one template and I understand it is dynamic, but it has a standard set of attributes as per the RFC, which I believe is supported by this collector. https://www.ietf.org/rfc/rfc3954.txt

We are predominantly Cisco based, but there are other vendors too. The cases you mentioned also occur: I have seen flows that don't correspond to the template and don't get parsed.

Sanky

Lukas955 commented 5 months ago

All fields mentioned in the Template should be supported. The exception is the FLOW_FLAGS field (65), which is not clearly defined in the NetFlow/IPFIX standard. However, the JSON plugin will also convert it if the option to skip unknown fields is not active. The name associated with it will probably be something like en4294967294id65 or similar.
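For illustration, the relevant JSON plugin parameter (other parameters omitted); keeping it set to false means fields without a known definition are still converted under a generic name:

<output>
    <name>JSON output</name>
    <plugin>json</plugin>
    <params>
        ...
        <!-- false = also convert unknown fields (e.g. FLOW_FLAGS as "en4294967294id65") -->
        <ignoreUnknown>false</ignoreUnknown>
        ...
    </params>
</output>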

Let me know if you need help with anything.

Lukas

SankarSadasivam commented 5 months ago

Many thanks Lukas. Let me try it out. We were trying another open-source NetFlow collector in parallel and got the issue below for some of the sources:

2024-04-08 15:59:07 +0000 [warn]: #0 No matching template for host="x.x.x.x" source_id=256 flowset_id=256

Sanky

Lukas955 commented 5 months ago

I guess you have run into a typical NetFlow/IPFIX protocol problem.

Namely, if you use the UDP transport protocol for the transfer, the collector may not be able to parse the incoming records for some time after it starts. The probes (in your case, routers) periodically resend, within the data stream, the template definitions needed to interpret the flow records (look for something like "template refresh/timeout" in the router configuration). Since the collector usually starts later than the probe/router, it misses these definitions and has to wait until they are resent. Depending on the configuration of the probe/router, this can take a few seconds or even a few minutes, during which the corresponding uninterpretable records are ignored.
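For reference, a minimal sketch of the UDP input plugin configuration (the values are only illustrative). Note that the template refresh interval itself is configured on the router; the lifetimes below only determine how long the collector considers templates it has already received to be valid:

<input>
    <name>UDP input</name>
    <plugin>udp</plugin>
    <params>
        <localPort>4739</localPort>                               <!-- port where flows are received -->
        <localIPAddress></localIPAddress>                         <!-- empty = listen on all local addresses -->
        <templateLifeTime>1800</templateLifeTime>                 <!-- seconds a received Template stays valid without refresh -->
        <optionsTemplateLifeTime>1800</optionsTemplateLifeTime>   <!-- the same for Options Templates -->
    </params>
</input>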

Lukas

SankarSadasivam commented 5 months ago

Thanks Lukas, I've read about a similar pattern somewhere. Thanks for the input. I will try it out and get back to you if I face any issues.

SankarSadasivam commented 5 months ago

One more point I missed asking you, Lukas. With the JSON output plugin, is there a way for us to push the data onto standard document stores like an Elastic/OpenSearch cluster, similar to using additional properties for the Kafka push?

Lukas955 commented 5 months ago

Hi,

I don't have first-hand experience with your target storage, but I think it should be possible to use the send or possibly the server output. These JSON outputs send converted flow records via TCP or UDP (one record per line). It should be enough for the datastore side to accept generic JSON over TCP/UDP.
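For illustration, a minimal sketch of the send output inside the JSON plugin configuration, assuming some receiver (e.g. Logstash or Fluentd with a TCP input) is listening on the given address and port (both are placeholders):

<output>
    <name>JSON output</name>
    <plugin>json</plugin>
    <params>
        ...
        <outputs>
            <send>
                <name>Send to external receiver</name>
                <ip>127.0.0.1</ip>            <!-- address of the receiving service (placeholder) -->
                <port>4444</port>             <!-- port where it listens (placeholder) -->
                <protocol>tcp</protocol>      <!-- records are sent one JSON object per line -->
            </send>
        </outputs>
    </params>
</output>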

However, I believe it should be possible to eventually use Kafka as a message broker to transfer (and possibly enrich) records.

Lukas

sedmicha commented 5 months ago

I'm not very familiar with these data stores either, but based on my quick search it should be possible to use the JSON output in "send" mode together with Logstash, using its TCP input plugin and the OpenSearch/Elasticsearch output plugin.

SankarSadasivam commented 5 months ago

Thanks Lukas, Sedmicha. The options you guys suggested are the fallback options:

  1. Push it via a TCP/UDP connection and then use a Logstash or Fluentd collector/aggregator to push it onto our own datastore.
  2. Push it onto a Kafka topic and again use plugins to push the data onto our own datastores.

The reason I raised the question is: do we have additional-properties support for "send" mode, or plans for future output plugins that push data onto document or time-series datastores, since flow data corresponds to a specific time snapshot?

SankarSadasivam commented 5 months ago

Hi, similar to the "No matching template" issue, there is one more issue regarding flow sequence numbers. I understand this is again due to the way UDP works, but is it right to assume that, or is there any way to solve it?

WARNING: UDP collector (parser): [10.37.208.64:50452, ODID: 256] Unexpected Sequence number (expected: 278756670, got: 278756667)

Lukas955 commented 5 months ago

Hi,

yes, this is also a typical "feature" of UDP transport. The only solution is to switch to TCP, which, as I understand, is probably not possible due to lack of support on the routers.

In the example you provided, it looks like a simple reordering of UDP packets (the "expected" sequence number is greater than the sequence number of the "received/got" packet). However, this message (if "expected" < "got") may also indicate that other packets were lost somewhere during transmission.

Lukas

SankarSadasivam commented 5 months ago

Thanks Lukas. I thought the same, but wanted to verify it once with the experts.

sedmicha commented 5 months ago

> Thanks Lukas, Sedmicha. The options you guys suggested are the fallback options:
>
>   1. Push it via a TCP/UDP connection and then use a Logstash or Fluentd collector/aggregator to push it onto our own datastore.
>   2. Push it onto a Kafka topic and again use plugins to push the data onto our own datastores.
>
> The reason I raised the question is: do we have additional-properties support for "send" mode, or plans for future output plugins that push data onto document or time-series datastores, since flow data corresponds to a specific time snapshot?

At the moment, you'll probably have to use one of the options you listed. To push the data directly to the OpenSearch store, the JSON output plugin would have to support sending data to a specific HTTP endpoint. This is not currently supported, but I could imagine the ability to do so being added in the future.

SankarSadasivam commented 5 months ago

Thanks Sedmicha. Plugins to push data onto specific datastores would be really helpful, mainly document DBs like OpenSearch and Elasticsearch and time-series DBs like Prometheus, InfluxDB and VictoriaMetrics.

If I had to add items to the backlog, the storage aspects mentioned above would be my highest-priority items.

SankarSadasivam commented 5 months ago

Hi, we were able to consume the flows and push them to the datastores as well. One thing I noticed is that we don't get the actual host which generates the flows. Is there any setting on the UDP input plugin with which we can get the exporting host/device as an attribute of the flow records?

Lukas955 commented 5 months ago

Hi,

information about the source (e.g. IP address, ODID, ...) is added to each record if the detailedInfo option of the JSON plugin is enabled.

Try to add it to your startup configuration file:

<output>
    <name>JSON output</name>
    <plugin>json</plugin>
    <params>
        ...
        <detailedInfo>true</detailedInfo>   <!-- set this option to true -->
        ...
    </params>
</output>
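With this enabled, each exported JSON record should additionally carry source-related fields (exporter IP address, ODID, export time, ...), roughly along these lines. The exact field names below are only a sketch and may differ in your version, so please check the JSON plugin documentation:

{
    "ipfix:srcAddr": "10.37.208.64",
    "ipfix:odid": 256,
    "ipfix:exportTime": "2024-04-08T15:59:07Z",
    ...
}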

Lukas

SankarSadasivam commented 5 months ago

That works, Lukas. Thanks for your quick help.