CESNET / ipfixcol

IPFIXcol is an implementation of an IPFIX (RFC 7011) collector.

Question - How to get the ipfix filewriter to rotate #162

Closed jepries closed 6 years ago

jepries commented 8 years ago

I did some capacity tests writing out received packets via JSON and via the IPFIX file format. The native IPFIX format is way better: I don't start losing data until more than 15K packets per second, whereas with JSON output I start losing data at 4K packets per second.

No worries, I can store to IPFIX and post-process the IPFIX file to JSON later. The question is: how do I get it to rotate the IPFIX files (or do I need to write my own plugin to do that)?

Thanks guys, this code is awesome, really making it work for me.

jepries commented 8 years ago

I'll answer my own question :) I wrote an IPFIX output module called ripfix (for "rotating ipfix"), shamelessly combining the JSON code and the IPFIX storage code. It works, but the rotated IPFIX files written later don't contain the templates, so when I pass those files through ipfixcol again (ipfix input, JSON output), the JSON plugin doesn't have the templates to parse the data and I lose records in the JSON output.

New question: Is there a way I can seed the new IPFIX file with all known templates before I hand off writing new packets to it? I looked through template_manager and it wasn't obvious to me.

I'm not sure how else to increase the speed; I guess I have to look at FastBit, but then I'd still have to convert to JSON.

jepries commented 8 years ago

Taking the resulting rotated IPFIX files and netcat'ing them through a TCP ipfixcol-to-JSON listener converts them to JSON offline, but that falls behind, since it is still single-threaded and I can't split the load to catch up with the incoming files. The IPFIX files aren't self-contained; they need the known templates in them. Also, when running the IPFIX files through, I'm getting tons of "invalid sequence number" warnings; it looks like sequence numbers aren't getting tracked per exporter. Example:

DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [261] Processing data
WARNING: preprocessor: [259] Sequence number error; expected 499086, got 13430
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Sequence number error; expected 13432, got 499086
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [261] Processing data
WARNING: preprocessor: [259] Sequence number error; expected 499208, got 13432 
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Sequence number error; expected 13434, got 499208

I guess my only hope, if I want to keep JSON output, is to try to multithread the conversion code in the JSON module; I've hit a dead end with the IPFIX files.

I could probably reconstitute the binary form of a template from the ipfix_template structure, but that seems like a lot of work. It's still a decent option, though, because running IPFIX files through ipfixcol and storing as JSON works; I just need to run more than one instance to keep up with the data. If you have ideas on how I could write the known templates into each file (or just the associated templates, keeping track of which ones I've written for each IPFIX file), that would be great. Otherwise, I'm going to multithread the JSON processing from ipfix_message to JSON string.

jepries commented 8 years ago

I've tried FastBit storage as well; I start losing data at 10K packets/s, which is better than JSON. I'll try to multithread the JSON conversion and see how it goes. The complexity of FastBit may not be worth it, though it is usable: with the PfR product there are many different and varied template types, so querying them back out of a database could be a complex undertaking. I still like storing the IPFIX file format in a rotating file; I just have to figure out how to deal with the templates. That gives the fastest storage of data to disk without losing packets and an easy conversion to JSON (the best plugin ever), it just needs to be faster :( ...

thorgrin commented 8 years ago

The main problem with JSON is that it does a heavy binary-to-string conversion. I've tried a few tricks to speed things up, but the difference is quite small; the only thing that helped a bit is now committed to devel. That said, you can try to disable some conversions if you do not need them. The timestamp conversion, for one, takes quite a lot of time.
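
For reference, these are the conversion-related options of the JSON fileWriter as they appear in the collector configs posted later in this thread; the set of accepted values may differ between versions, so treat the comments below as a rough guide rather than a definitive list.

             <fileWriter>
                <fileFormat>json</fileFormat>
                <!-- "epoc" presumably keeps timestamps as numeric epoch values rather than
                     formatted date strings; timestamp formatting is the conversion singled
                     out above as expensive -->
                <timestamp>epoc</timestamp>
                <!-- "formated" converts numeric protocol/flag values into readable strings;
                     skipping these conversions (if the plugin version offers a raw mode)
                     avoids additional string-building work -->
                <protocol>formated</protocol>
                <tcpFlags>formated</tcpFlags>
                <!-- per-record metadata adds further conversion overhead -->
                <metadata>no</metadata>
             </fileWriter>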

I've also found that writing each record separately is not as good as buffering larger chunks, but on the other hand, I'm not that comfortable sending 30k packets for larger messages either. Anyway, the difference is negligible when writing to file.

I do not know what your ultimate goal is. Is it storing the rotated JSON files? If so, do you need all the traffic for a single time window in a single file? If not, you can distribute the load to multiple collectors using the forwarding plugin.

As for the sequence numbers, we really did not expect a single source to send multiple ODIDs. Each ODID has its own sequence numbers; however, the collector stores a single sequence number per exporter, which causes the warnings that you observe. We'll keep that in mind while designing the new version of the collector.

jepries commented 8 years ago

Thanks for the thoughts, Petr. I thought about the forwarder and multiple plugins, which led me to PEN; I did some brief testing last night, and I think PEN in front of multiple collectors could spread the load a bit. For JSON, has anyone that you know of tried multithreading the binary-to-string conversion method inside Storage.cpp? I figure that is where the CPU time goes, and spreading the load there could process more data and prevent loss.

thorgrin commented 8 years ago

I do not think that PEN would solve the problem with templates, but the forwarding plugin handles that. You can set up a single forwarding collector in front of any number of JSON-converting/writing collectors. The only problem is that your data would be distributed across several files. If that is not an issue, I'd go with this approach, since all you have to do is create a few config files to achieve it.
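
As a condensed illustration of that setup (taken from the working forwarder configuration jepries posts later in this thread; the IPs and ports here are placeholders), the forwarding side is just an exporting process using the forwarding fileFormat with one destination per downstream JSON-writing collector:

    <exportingProcess>
            <name>forwarder</name>
            <singleManager>yes</singleManager>
            <destination>
                    <name>JSON collectors</name>
                    <fileWriter>
                            <fileFormat>forwarding</fileFormat>
                            <protocol>UDP</protocol>
                            <!-- RoundRobin alternates between destinations; the "All" setting
                                 tried later in this thread presumably sends to every destination -->
                            <distribution>RoundRobin</distribution>
                            <!-- one <destination> per downstream collector running the JSON plugin -->
                            <destination>
                                    <ip>192.0.2.10</ip>
                                    <port>51000</port>
                            </destination>
                            <destination>
                                    <ip>192.0.2.10</ip>
                                    <port>51001</port>
                            </destination>
                    </fileWriter>
            </destination>
    </exportingProcess>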

As for making the JSON plugin multithreaded, that brings several nasty issues, such as ordering of the data, passing the data between threads, and so on. There is still a lot to improve in the current code before turning to multithreading. Please try the current version; it should be significantly faster than the previous one.

jepries commented 8 years ago

That's good advice, thanks Petr. I don't want to go down the path of multithreading the JSON code just to hit another dead end :). I'm definitely going to use the new JSON plugin in devel; it will probably be tomorrow when I load test it again. I have a pcap with 27,000+ packets that I replay in, and then I count the results. When I start getting fewer results than I get at 1,000 packets per second, I know I'm losing data :). I'll report what my new capacity is.

If you have the time, please take a look at PEN again and let me know what you think. You can turn off "round robin" load balancing and it will stick an incoming exporter to a specific destination forever. I very briefly tested this on OS X last night and it appears to do just that: once PEN sees a particular exporter and chooses a destination, everything coming from that exporter goes to that destination, even after restarts! It even allows for Direct to Client communication (which we don't need, of course), which I think passes the packet through unmolested. I still have to test this, as there is a requirement to use the odip plugin and store the exporter's IP in the output.

As for the forwarding plugin, I'm starting to see how that would work: forward everything, then filter at the individual receivers. The only issue is that I'd be processing templates multiple times and chewing up CPU; on the other hand, I'll be forwarding IPFIX after the conversion from NFv9, so it may be fine. Not wanting to burn the extra CPU is why I've looked at PEN. I'll report back whether PEN works; if not, I'll report back how forwarding works. Lots to do :).

jepries commented 8 years ago

Thanks for the changes, Petr. I see about 1,000 more packets per second of throughput before I start losing packets: before, I would lose packets at 4K pps; now I start losing packets at 5K. Tomorrow I'll put PEN in front and see whether it slows me down at all. My remaining concern is that if one router sends more than 4K pps (I can't imagine it will, but we'll see), I'll still have a problem.

thorgrin commented 8 years ago

OK. Let me know whether PEN works for you; I guess that it might help other people to scale as well. By the way, how many packets per second (and flows per second) do you observe in your network? 5,000 packets/s should suffice for a pretty large network (with appropriate timeouts).

What is your target throughput anyway? There is still the possibility to optimize the code some more, but that would require greater changes. I need to know whether it makes sense to dive into it, since it will not scale as well as load distribution across multiple collectors anyway.

jepries commented 8 years ago

I've been told that 8K pps is what to expect, but I won't know for sure until I see it. I'm actually working on two projects: one for PfR, which has a wide variety of templates (you have a pcap of that one, and ipfixcol is perfect for it), and another for traditional flow data with very short-duration flows. Both estimate 8K pps. For a third project, I eventually want to replace a SiLK installation that currently collects 1.2B flows per day, or about 14K per second (that's the maximum; I know it should be less). The network for the traditional flows is very large indeed :)

jepries commented 8 years ago

so far, no luck with PEN.... https://github.com/UlricE/pen/issues/26

By all accounts, what I'm doing is not difficult; PEN should handle it.

Next up: ipfixcol in front of a pool of ipfixcols.

jepries commented 8 years ago

The forwarder isn't formatting templates in a way that the downstream ipfixcols can parse.

UPDATE 4: I turned off the odip plugin at the forwarder; this did not fix the template forwarding. Whenever a template is forwarded, the receiving ipfixcol throws the "Packet header is incomplete" error, and the receivers still complain about not getting any templates. I thought the bug might be introduced by odip, but it isn't. Once the forwarding of templates is resolved, I think this is my most workable solution. PEN didn't pan out: it doesn't treat UDP the way I think it should, and transparent forwarding needs too many iptables rules, making that solution less desirable than your forwarding module :)

UPDATE: debug output on one of the collectors (the one at 51001) shows that it isn't getting templates and, once in a while, errors:

DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
WARNING: UDP input: Packet header is incomplete; skipping message...
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found

So no data gets written as the templates never get sent....

UPDATE 2: Watching one of the pair of JSON writers (even before templates are received, round robin _is_ working properly: each writer gets every other message), the first couple of messages look like they are corrupted. Here is the log right around the time when the templates for source IDs 259 and 261 come in at the forwarder:

WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: ipfix_message: [261] Processing data
WARNING: preprocessor: [261] Data template with ID 256 not found
DEBUG: ipfix_message: [259] Processing data
INFO: preprocessor: [259] New template ID 256
DEBUG: ipfix_message: [261] Processing data
INFO: preprocessor: [261] New template ID 256
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data

Here is what that same log looks like at one of the json writers:

DEBUG: ipfix_message: [259] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [261] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
WARNING: UDP input: Packet header is incomplete; skipping message...
WARNING: UDP input: Packet header is incomplete; skipping message...
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data 
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!

Somehow the template isn't getting processed. I think the "Packet header is incomplete" warning is our clue: the template isn't being sent correctly. Also, before that point no templates came in, so I don't think the message was getting past the preprocessor.

UPDATE 3: By the way, for this testing I'm replaying the same pcap file I sent you, which has about 27K packets in it.

ORIGINAL MESSAGE:

@thorgrin Petr, I think I'm really close! I have three ipfixcols running; their configs are below. I can't seem to get the forwarding quite right. Basically:

ipfixcol listening on 4739, round-robin forwarding to 51000 and 51001
ipfixcol listening on 51000 with JSON output to the 51000 directory
ipfixcol listening on 51001 with JSON output to the 51001 directory

The ipfixcols on 51000 and 51001 create their JSON files, but the files never get populated with any data :(

[user@enmv-dev002-1 51000]$ ls -lt
total 0
-rw-r--r--. 1 v795363 enmv-dev 0 Mar 15 14:24 json.201603151824
-rw-r--r--. 1 v795363 enmv-dev 0 Mar 15 14:23 json.201603151823

Same goes for 51001.

What am I doing wrong? I'm sooooooo close!

Here is the forwarder config:

<?xml version="1.0" encoding="UTF-8"?>
<ipfix xmlns="urn:ietf:params:xml:ns:yang:ietf-ipfix-psamp">

    <!--## Every collecting process will be started as new process -->
    <collectingProcess>
            <!--## Arbitrary collecting process name  -->
            <name>UDP collector</name>
            <!--## Type of the collector. Supported types are defined in internalcfg.xml -->
            <udpCollector>
                    <!--## Arbitrary udp collector name -->
                    <name>Listening port 4739</name>
                    <!--## Local listening port -->
                    <localPort>4739</localPort>
                    <!--## Template lifetime in seconds -->
                    <templateLifeTime>1800</templateLifeTime>
                    <!--## Options template lifetime in seconds -->
                    <optionsTemplateLifeTime>1800</optionsTemplateLifeTime>
                    <!--## Template lifetime in packets (for how many packets is template valid) -->
                    <!-- <templateLifePacket>5</templateLifePacket>  -->
                    <!--## Options template lifetime in packets -->
                    <!-- <optionsTemplateLifePacket>100</optionsTemplateLifePacket>  -->
                    <!--## Local address to listen on. If empty, bind to all interfaces -->
                    <localIPAddress></localIPAddress>
            </udpCollector>
            <!--## Name of the exporting process. Must match exporting process name -->
            <exportingProcess>forwarder</exportingProcess>
            <!--## File for exporting status information to (combined with -S) -->
            <statisticsFile>/home/v795363/log/ipfixcol_stat.log</statisticsFile>
    </collectingProcess>

    <!--## Exporting process configuration -->
    <exportingProcess>
            <!--## Name of the exporting process, must match <exportingProcess> element
                       in <collectingProcess> -->
            <name>forwarder</name>
            <singleManager>yes</singleManager>
            <destination>
                    <name>All ODIID</name>
                    <fileWriter>
                            <defaultPort>49999</defaultPort>
                            <protocol>UDP</protocol>
                            <fileFormat>forwarding</fileFormat>
                            <distribution>RoundRobin</distribution>
                            <templateLifetime>1800</templateLifetime>
                            <optionsTemplateLifetime>1800</optionsTemplateLifetime>
                            <destination>
                                    <ip>10.73.198.150</ip>
                                    <port>51000</port>
                            </destination>
                            <destination>
                                    <ip>10.73.198.150</ip>
                                    <port>51001</port>
                            </destination>
                    </fileWriter>
            </destination>
    </exportingProcess>

    <!-- List of active Intermediate Plugins -->
    <intermediatePlugins>
            <odip>
            </odip>
    </intermediatePlugins>
</ipfix>

Here is one of the two listening collectors; the only differences between the two are the listening port and the destination directory:

<?xml version="1.0" encoding="UTF-8"?>
<ipfix xmlns="urn:ietf:params:xml:ns:yang:ietf-ipfix-psamp">

    <!--## Every collecting process will be started as new process -->
    <collectingProcess>
            <!--## Arbitrary collecting process name  -->
            <name>UDP collector</name>
            <!--## Type of the collector. Supported types are defined in internalcfg.xml -->
            <udpCollector>
                    <!--## Arbitrary udp collector name -->
                    <name>Listening port 51000</name>
                    <!--## Local listening port -->
                    <localPort>51000</localPort>
                    <!--## Template lifetime in seconds -->
                    <templateLifeTime>1800</templateLifeTime>
                    <!--## Options template lifetime in seconds -->
                    <optionsTemplateLifeTime>1800</optionsTemplateLifeTime>
                    <!--## Template lifetime in packets (for how many packets is template valid) -->
                    <!-- <templateLifePacket>5</templateLifePacket>  -->
                    <!--## Options template lifetime in packets -->
                    <!-- <optionsTemplateLifePacket>100</optionsTemplateLifePacket>  -->
                    <!--## Local address to listen on. If empty, bind to all interfaces -->
                    <localIPAddress></localIPAddress>
            </udpCollector>
            <!--## Name of the exporting process. Must match exporting process name -->
            <!-- exportingProcess>File writer UDP Rotate</exportingProcess -->
            <!-- exportingProcess>File writer UDP Rotate</exportingProcess -->
            <exportingProcess>JSON writer UDP</exportingProcess>
            <!--## File for exporting status information to (combined with -S) -->
            <statisticsFile>/home/v795363/log/ipfixcol_stat.log</statisticsFile>
    </collectingProcess>

    <exportingProcess>
      <name>JSON writer UDP</name>
      <singleManager>yes</singleManager>
      <destination>
          <name>all ODID</name>
          <fileWriter>
             <fileFormat>json</fileFormat>
             <metadata>no</metadata>
             <tcpFlags>formated</tcpFlags>
             <timestamp>epoc</timestamp>
             <protocol>formated</protocol>
             <ignoreUnknown>no</ignoreUnknown>
             <nonPrintableChar>yes</nonPrintableChar>
             <output>
               <type>file</type>
               <path>/home/v795363/flows/51000/</path>
               <prefix>json.</prefix>
               <dumpInterval>
                 <timeWindow>60</timeWindow>
                 <timeAlignment>yes</timeAlignment>
               </dumpInterval>
             </output>
           </fileWriter>
         </destination>
    </exportingProcess>

    <!-- List of active Intermediate Plugins -->
    <intermediatePlugins>
            <!-- Dummy Intermediate Plugin - does nothing -->
            <dummy_ip>
            </dummy_ip>
    </intermediatePlugins>
</ipfix>
jepries commented 8 years ago

@thorgrin Hi Petr, the post above is really long, so I thought I would summarize here.

The forwarder module looks like the best choice; however, it has a bug forwarding templates. Downstream ipfixcols cannot parse the templates the forwarder sends; they throw "Packet header is incomplete" errors.

I am using the odip plugin at the forwarder; turning it off has no impact, and the error is still thrown.

Very nice module though, I think it is going to be perfect once template forwarding is fixed.

Should I open a separate issue for this? I would be happy to.

Thanks for the suggestion.

thorgrin commented 8 years ago

@jepries I'll look into this as soon as I can. The forwarder should always send all templates to all devices, so there is definitely a problem there. I'll keep you posted.

Could you try all of the settings for the forwarder in the meantime? It would help to narrow the bug down even more. Thanks ;-)

jepries commented 8 years ago

@thorgrin Absolutely!

It didn't seem to make a difference, but I found another hint: the first template received by the forwarder seems to be transmitted OK and is received by both downstream ipfixcols, in both All and RoundRobin modes.

Here the forwarder sent a template:

DEBUG: ipfix_message: [9] Processing data
INFO: preprocessor: [9] New template ID 256
DEBUG: forwarding storage: Adding template ID 256

And here both receivers accepted it:

*** From Collector 51000 ***
DEBUG: ipfix_message: [9] Processing data
INFO: preprocessor: [9] New template ID 256

*** From Collector 51001 ***
DEBUG: ipfix_message: [9] Processing data
INFO: preprocessor: [9] New template ID 256

Then later in the pcap, other templates came in, here is the forwarder receiving them:

DEBUG: ipfix_message: [259] Processing data
INFO: preprocessor: [259] New template ID 256
DEBUG: ipfix_message: [261] Processing data
INFO: preprocessor: [261] New template ID 256
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data

The warnings about templates not found go away on the forwarder.

Here is the receiving ipfixcol's debug, both receivers are the same:

DEBUG: ipfix_message: [259] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [261] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
WARNING: UDP input: Packet header is incomplete; skipping message...
WARNING: UDP input: Packet header is incomplete; skipping message...
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found

At least the first template sent works. I wondered whether the behavior was the same in round robin, so I checked: the above is the same with both "All" and "RoundRobin"; the first template appears to have come in OK, and the rest failed.

Thanks for looking at this.

jepries commented 8 years ago

@thorgrin Just a side note: I'm collecting large pcaps for both the PfR data and this normal NFv9 flow data. I'm going to collect 100K packets from each of the different feeds; I've found tcpreplay to be invaluable in troubleshooting :). I'll send them to you when they're complete, probably this afternoon; I'm at 89K packets at the moment.

Thanks for the great work on this, it is a pretty cool piece of code.

thorgrin commented 8 years ago

This took some digging into, but I've probably found the problem. Everything is fine (except those damn sequence numbers) with the sample you provided earlier. Please give the latest devel branch a try.

jepries commented 8 years ago

@thorgrin I WILL!!!!! Thanks Petr! I'll try it out as soon as I can; it may be a day or two. Thanks for fixing this!

jepries commented 8 years ago

@thorgrin Better, but not quite there yet. The template appears to be accepted, but not recognized by the forwardees. Same setup as above: a forwarder accepting and sending to two receivers.

Here is the log of the forwarder; you can see the template coming in, and after that there are no more "template not found" warnings:

DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: ipfix_message: [261] Processing data   
WARNING: preprocessor: [261] Data template with ID 256 not found
DEBUG: ipfix_message: [259] Processing data
INFO: preprocessor: [259] New template ID 256
DEBUG: ipfix_message: [261] Processing data
INFO: preprocessor: [261] New template ID 256
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data
DEBUG: ipfix_message: [259] Processing data

Both receivers behave the same; here is the same section in time where the receivers receive the templates. You can see that the "template not found" warnings are issued only after the template has been received:

DEBUG: ipfix_message: [259] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [261] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data                       <<-- TEMPLATE
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [261] Processing data                        <<-- TEMPLATE
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data 
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
WARNING: preprocessor: [259] Data template with ID 256 not found

No more invalid headers anyway....

One more hint: earlier in the capture, [9] sends in a template and it gets recognized:

DEBUG: ipfix_message: [261] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [9] Processing data
INFO: preprocessor: [9] New template ID 256
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data

It even says it recognizes it as a template, but when the other templates get forwarded, no joy. Also, no data is stored in JSON either.

jepries commented 8 years ago

@thorgrin Another hint: it could be that in the flow I'm testing, I have the same exporter with multiple Source IDs (the last 4 bytes in the header). I have another capture, for PfR data, attached here that seems to work much better. I think the templates for different Source IDs aren't being handled somehow in the forwarder, but they are handled nicely when printing to JSON directly.

Here is a PfR capture that works better. I had to change all the source IPs to get through some firewalls or something, so it looks like all the sources are the same, but it is really different routers sending the packets. It's all lab data, nothing real here. You may have better luck with it, so here is the original capture:

Note: I haven't compared the total record count from a straight-up receiver versus a forwarder with round-robin receivers; I'll do that tomorrow. But I at least get records through. All the Source IDs are 0 in this capture, I believe, unlike the other one where I have half a dozen Source IDs all from the same IP address.

PfR_capture.zip

thorgrin commented 8 years ago

@jepries Should the capture that you sent work or not? It seems to be OK, apart from the sequence numbers.

When the forwarder does not have data for a template, it can send an empty IPFIX packet (header only, no data), since it removes the data sets without templates. That might be the reason why you see the "DEBUG: ipfix_message: [259] Processing data" message without an error. Could you provide a short trace showing the behaviour that causes the missing-template problems?

jepries commented 8 years ago

@thorgrin Sorry, I know it's confusing; I'm working on two different projects at the same time. The capture file I provided above, labelled PfR_capture, seems to work OK. The other capture file I'm working with, pretty much straight-up NFv9 flow data, doesn't work. The one I'm having trouble with (I can send it to you privately; I don't want to post it here) has what appear to be 3 or 4 unique Source IDs (9, 259, 261, etc.); each has its own templates with the same template ID (all Source IDs seem to use template ID 256). All of the Source IDs come from the same exporter IP address. This may be what is confusing the template forwarder; I'm not sure.

The trace in my post above is what I was seeing.... The forwarder would send empty headers to the two receivers until a template came in. Confirmed. Then, early in the replay, Template 256 for Source ID 9 came in at the forwarder, and it was sent nicely to the receivers:

LOG TRACE RECEIVED AT THE RECEIVER, SENT BY THE FORWARDER

DEBUG: ipfix_message: [9] Processing data
INFO: preprocessor: [9] New template ID 256

Later, template 256 for Source ID's 259 and 261 come in. They are received by the forwarder:

LOG TRACE RECEIVED AT THE FORWARDER, SENT BY tcpreplay

DEBUG: ipfix_message: [259] Processing data
INFO: preprocessor: [259] New template ID 256
DEBUG: ipfix_message: [261] Processing data
INFO: preprocessor: [261] New template ID 256

These next lines are the corresponding log messages at the receiver. It seems that the receiver did not get template 256 for Source ID 259 or Source ID 261; afterwards, the forwarder starts sending the data sets, causing the receiver to warn:

BOTH RECEIVERS LOG TRACE when TEMPLATE RECEIVED BY FORWARDER

DEBUG: ipfix_message: [259] Processing data                       <<-- TEMPLATE
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [261] Processing data                        <<-- TEMPLATE
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data 
WARNING: preprocessor: [259] Data template with ID 256 not found <<-- Forwarder now sending Data Sets
DEBUG: dummy Intermediate Process: got IPFIX message!
DEBUG: ipfix_message: [259] Processing data

Above, I marked the lines as TEMPLATE; those are the lines that pop up at the receiver at precisely the moment the forwarder receives the templates for Source IDs 259 and 261.

What I think is happening is that the forwarder has already sent template 256 for Source ID 9, and the forwarder code does not think it needs to send the new template 256 for Source IDs 259 and 261 to the receivers. You can see that right after those templates are received, data sets are sent to the receivers, which start complaining that template 256 isn't found; the receivers never received template 256 for Source ID 259, so they cannot parse the data and warn it away.

I'll gladly send you the capture file that reproduces the issue above, but I don't want to post it here. Send me a private note with where you would like me to send it and I'll get it to you; the capture file is 1.4 MB zipped.

thorgrin commented 8 years ago

@jepries Sorry for another delay; we've had the Easter holiday here and I was not on the Internet until today. Your assessment of the situation is spot on. The collector itself uses the exporter's source IP address and port together with the ODID to identify a source; template numbers then only need to be unique within each source. However, the forwarding plugin uses only template numbers to identify individual templates, which causes the problems you observe.

I've just tried using two different exporters with different ODIDs, and the behaviour is the same. We really need to distinguish the templates in the forwarding plugin the same way as in the collector itself. I'll see what I can do about this. Thanks for your patience and analysis; you did some great work there!

thorgrin commented 8 years ago

@jepries OK, I've just pushed an update that should take care of templates per ODID. I guess that you could also have disabled the singleManager option and let the forwarder behave like multiple sources, but this is a better solution.

However, we still do not fully support the same ODID on multiple exporters. In particular, since the forwarder looks like a single exporter, there would be no way for the forwardee to distinguish the sources. We will handle this in the future, but it means a lot of changes in all the places that handle ODIDs and input sources.

jepries commented 8 years ago

@thorgrin Thanks Petr! To keep the source IP of the original exporter, I turned on odip in the forwarder and didn't configure it in the receiving collectors; that way I still get the original exporter's address :).

Thanks again!!!! I'll test this out just as soon as I can; I have a user who wants the features of this awesome work you guys have created here!

jepries commented 8 years ago

@thorgrin Petr, I re-ran the test above and didn't see a difference. If you want the file I'm using, I can send it to you privately; I think you may already have it, though. I'm pretty sure I got all my configs right, and the behavior in the logs above seems to be the same. I have a forwarder sending to two receivers; singleManager is turned on for all three, and odip is turned on in the forwarder. I double-checked that I'm using the newly compiled .so files.

The new code still seems to work on my other PfR exports; it is just that one tcpdump with multiple ODIDs from the same exporter that is the problem. This may be an artifact of how the lab routed this data to me, and may not reflect real traffic.

Following up on my post below: forwarding to two receivers seems to have less capacity than processing straight to JSON. In the post below, I was able to take one ipfixcol directly to JSON up to 18K pps before drops; when running through the forwarder, I had to drop back to 14K pps to get the same record counts. Anyway, I think we can put this to bed; 18K pps is pretty darn good :)

BTW, that was for the larger PfR flows; for some reason ipfixcol handled those better than straight-up standard NetFlow. I'm not sure why, maybe because the standard NetFlow packs more records into one packet than the PfR exports do. I got 14K pps out of the standard NetFlow directly to JSON, still awesome!

Thanks again!

jepries commented 8 years ago

@thorgrin By the way, amazing work on increasing the efficiency of the JSON writing. I'm running a 127,083-record capture file through a standard JSON file writer; I measured the throughput as shown below and saved my JSON files. AMAZING! You've increased the throughput immensely; I may not have to deal with forwarding and such! Here are my results. I started slow and worked up; when I hit a limit, I reduced overhead and kept going.

Start:

1000 pps with -v3 and in foreground = 127,083 records
2000 pps with -v3 and in foreground = 127,083 records
3000 pps with -v3 and in foreground = 127,083 records
4000 pps with -v3 and in foreground = 127,038 records <==Limit with DEBUG and Foreground
4000 pps in foreground = 127,083 records
5000 pps in foreground = 127,083 records
6000 pps in foreground = 127,083 records
7000 pps in foreground = 127,083 records
8000 pps in foreground = 127,083 records
9000 pps in foreground = 127,083 records
10000 pps in foreground = 127,083 records
11000 pps in foreground = 127,083 records
12000 pps in foreground = 127,083 records
13000 pps in foreground = 127,083 records
14000 pps in foreground = 127,083 records
15000 pps in foreground = 127,083 records
16000 pps in foreground = 127,075 records <== LIMIT OF FOREGROUND
16000 pps -d (background) = 127,083 records
17000 pps -d (background) = 127,083 records
18000 pps -d (background) = 127,034 records <== LIMIT of JSON WRITER (matches IPFIX)

Amazing work sir! :+1:

thorgrin commented 8 years ago

@jepries I have no idea why it would not work for you. Please send me the data privately and I'll have a look for myself; I cannot find the data now, although I'm quite sure that I had it in the past.

As for the speedup that you observed, I must say that I'm impressed; I did not manage to get near such throughput in my tests. How many records/s are we talking about? I usually process packets of up to 3000 B so that I can get a lot of application data into the flow records; smaller records and different packet sizes will affect the throughput significantly. Did you change your hardware or anything else since your previous throughput tests?

I'm also surprised that the forwarder is slower than the JSON plugin. It does an awful lot of copying of the data to construct correct packets and to drop data without templates, but I did not think it would be slower than JSON. The forwarding plugin was one of the fastest in the past.

I'm glad that it works for you right now; however, I'd still like to fix the forwarding use case as an alternative. I can work on optimizing the forwarder afterwards.

jepries commented 8 years ago

@thorgrin Hi Petr, sorry it has taken so long to respond. I will send you the captures; I have some hardware issues to deal with before I can do that, but I have a couple of very large captures I've been working with for the testing.

The testing setup was two 4-CPU, 8 GB memory VMs running RHEL7. They had adjacent IPs; I do not know whether they were on the same host or not, just VMs in the cloud. The first test used PfR-like records; the packets did not appear to have many records encapsulated, so I observed maybe 1-2 records per packet. The capture reported about 120K packets, so the number of records written through ipfixcol and the number of packets were about the same.

There is another packet capture I have for regular flows, the one giving me so many problems with the forwarder. It contains 3 to 5 records per packet. tcpdump said it captured about 98K packets, but when it is run through ipfixcol I get upwards of 586K JSON records. On this file I did notice a slowdown; I think my limit was around 8-9K pps or so (the data is at work, I'm going by memory). I know your mileage may vary with different flows, hardware setups, etc. :)

Anyway, I'll get you these two files just as soon as I can extract them from the lab.

thorgrin commented 6 years ago

Better late than never: the IPFIX storage plugin now supports file rotation. It is in devel right now, but we will make a new release in the next few days.
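
For anyone landing here later: a hypothetical sketch of what a rotating IPFIX fileWriter configuration might look like, modeled on the JSON writer's dumpInterval block shown earlier in this thread. The element names accepted by the IPFIX storage plugin may well differ, so check the plugin documentation for the release you are running.

    <exportingProcess>
            <name>IPFIX writer</name>
            <destination>
                    <name>rotated raw IPFIX files</name>
                    <fileWriter>
                            <fileFormat>ipfix</fileFormat>
                            <!-- hypothetical output/rotation block, mirroring the JSON writer
                                 config earlier in this thread; element names are assumptions,
                                 not confirmed against the plugin -->
                            <output>
                                    <type>file</type>
                                    <path>/data/ipfix/</path>
                                    <prefix>ipfix.</prefix>
                                    <dumpInterval>
                                            <timeWindow>300</timeWindow>
                                            <timeAlignment>yes</timeAlignment>
                                    </dumpInterval>
                            </output>
                    </fileWriter>
            </destination>
    </exportingProcess>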