Upgrade flow parsing - Githubissues

aouinizied commented 4 years ago

Is it possible to upgrade the flow parser to the latest nfstream version?

Pros:

SPLT analysis is implemented in native code within nfstream, so no plugin is needed.
Parallelized across the available machine cores (up to 10x speed gain).
Several issues fixed.

I'm keen to see how much such an upgrade speeds up your overall workflow.

BR, Zied

RadionBik commented 4 years ago

Hi!

I looked into the new code and couldn't recognize it; it looks like you have done tons of low-level optimizations :)

In the new API, I haven't found support for some features I need, e.g. TCP flags. I saw the accounting_mode parameter, but it is also not clear whether I can extract packet size and transport payload at the same time.

Do you think it is possible to extend the interface to allow access to arbitrary NFPacket attributes? For example, accounting_mode could be replaced with something like extracted_packet_attributes, an iterable of NFPacket attribute names. That would allow accounting for several raw features at the same time, and the end user would have every option they might need.

What do you think? BR, Radion
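
The proposal above can be illustrated with a minimal sketch. It does not use nfstream at all: `Packet`, its field names, and `extract_packet_attributes` are hypothetical stand-ins chosen for illustration, not the real NFPacket class or an existing nfstream parameter.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical per-packet record; NOT the real NFPacket."""
    ip_size: int
    payload_size: int
    syn: bool
    ack: bool


def extract_packet_attributes(packets, attribute_names):
    """Return one list per requested attribute name, in packet order."""
    return {
        name: [getattr(packet, name) for packet in packets]
        for name in attribute_names
    }


# Three made-up packets of a single flow.
packets = [
    Packet(ip_size=60, payload_size=0, syn=True, ack=False),
    Packet(ip_size=60, payload_size=0, syn=True, ack=True),
    Packet(ip_size=1500, payload_size=1448, syn=False, ack=True),
]

# Request several raw features at once by name.
features = extract_packet_attributes(packets, ("ip_size", "payload_size"))
print(features)
```

Selecting attributes by name keeps the caller in control of which raw features are collected, at the cost of an `AttributeError` when a name is misspelled.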

aouinizied commented 4 years ago

Accounting mode affects only the computed statistics (SPLT and post-mortem). A Plugin can still access the same NFPacket attributes, which are not affected by accounting mode. https://www.nfstream.org/docs/api#nfpacket-object

TCP flags are still there. I also added a delta_time feature to facilitate time-based Plugins.

So you have all NFPacket attributes. :)

aouinizied commented 4 years ago

@RadionBik That's why I think you can upgrade smoothly without even the need to have a Plugin.

RadionBik commented 4 years ago

I started adapting and noticed that one of the tests fails: the new version exports one of the flows twice, although the active and passive timeouts remain the same. Is there something I haven't considered, or can this behaviour not be controlled?

aouinizied commented 4 years ago

@RadionBik what do you mean by "this behavior"?

The default values of the active and idle timeouts changed, so you can set them to the values you used previously.
Which pcap file are you using for the failed test? What do you mean by exported twice? Is it the same flow with the same counters, or is it sliced based on the idle timeout?

RadionBik commented 4 years ago

Below are the raw sequences with the same 5-tuple ('TCP 213.180.204.179:443 192.168.0.105:55194').

Exported by the latest version:
exported with v5.2.0:

As you can see, the latest NFstream exported 2 flows with the same identifier. This is not expected, since the timeouts are set as before. I tested on the following .pcap: https://github.com/RadionBik/ML-based-network-traffic-classifier/blob/master/pcap_files/example.pcap

I have now checked the IAT values: one of them is 60002 ms, which is just over the 60 sec timeout I set. It looks like the previous version did not export the flow when it should have :)
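
To make the observation above concrete, here is a minimal sketch of idle-timeout slicing. It is not nfstream's implementation: the function name, the strict `>` comparison, and the timestamps are assumptions, chosen only so that one inter-arrival gap equals the 60002 ms mentioned above.

```python
def split_by_idle_timeout(timestamps_ms, idle_timeout_ms):
    """Slice packet timestamps into flows whenever a gap exceeds the timeout.

    Assumption: a flow expires when the gap is strictly greater than the
    idle timeout; a real flow meter may use a different comparison.
    """
    if not timestamps_ms:
        return []
    flows = [[timestamps_ms[0]]]
    for previous, current in zip(timestamps_ms, timestamps_ms[1:]):
        if current - previous > idle_timeout_ms:
            flows.append([current])     # gap too long: start a new flow
        else:
            flows[-1].append(current)   # same flow continues
    return flows


# Made-up timestamps; the gap between 120 and 60122 is 60002 ms.
timestamps = [0, 120, 60122, 60300]

sliced = split_by_idle_timeout(timestamps, idle_timeout_ms=60_000)
unsliced = split_by_idle_timeout(timestamps, idle_timeout_ms=61_000)
print(len(sliced), len(unsliced))
```

With a 60 sec timeout the same 5-tuple is exported as two flows, while a slightly longer timeout keeps it as one.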

RadionBik commented 4 years ago

In the end, I updated the flow parsing module to the latest NFstream. The pcap parsing tests on a small pcap now run twice as fast :) I haven't tested it on big ones, though.

I pushed the commit to the branch I will merge soon: https://github.com/RadionBik/ML-based-network-traffic-classifier/pull/13

Thank you!

aouinizied commented 4 years ago

@RadionBik Yes, this is a bug in the previous version that I fixed too. With n_meters=0, nfstream automatically scales to the available cores on the machine. Did you use the native implementation of SPLT with splt_analysis=N?

RadionBik commented 4 years ago

@aouinizied yes, I used the native implementation, complemented with a plugin for cases where some extended features are needed, see https://github.com/RadionBik/ML-based-network-traffic-classifier/pull/13/commits/7cb137234b0f2059c1b14e4e6252f6942afc78af

aouinizied commented 4 years ago

@RadionBik There are two points:

I hesitated to combine ps and directions in one vector; however, I ended up splitting them into two separate vectors for a simple reason. Imagine that at some point you have a packet_size of 0 (payload size, for example). If you multiply it by -1 you still get 0, which raises the question of how to determine the direction in that case.
You use dpkt for TCP flag extraction, but the flags are already extracted as NFPacket attributes.

Maybe in the future I will add TCP window extraction in C. Keep in mind that using a Plugin triggers some mechanisms that are not triggered without Plugins, and thus implies some speed overhead.

Zied
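
The first point above can be shown with a short sketch. The encoding (0 for one direction, 1 for the other, with the size negated for direction 1) and the sample values are assumptions made for illustration, not the representation used by either project.

```python
def encode_signed(sizes, directions):
    """Fold direction into the sign: negate sizes of direction-1 packets."""
    return [-size if direction == 1 else size
            for size, direction in zip(sizes, directions)]


def decode_directions(signed_sizes):
    """Recover direction from the sign alone (negative means direction 1)."""
    return [1 if value < 0 else 0 for value in signed_sizes]


# Made-up payload sizes; the second packet has zero payload in direction 1.
sizes = [52, 0, 1448, 0]
directions = [0, 1, 1, 0]

signed = encode_signed(sizes, directions)
recovered = decode_directions(signed)

print(signed)                    # the zero-size packet carries no sign
print(recovered == directions)   # direction of that packet is lost
```

Keeping sizes and directions in two separate vectors avoids this loss, because a zero-size packet still has an explicit direction entry.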

RadionBik commented 4 years ago

@aouinizied

You are right. My assumption was that I would only need the IP packet length. If payload size is used instead, this data model will certainly break. There must be a better approach than .csv files, something that supports data typing and arbitrarily sized arrays (e.g. protobuf dumps), but that is left for future work.
There used to be a tcpflags attribute in NFPacket that served the purpose well. Now it is gone, and since I have been using dpkt for TCP window extraction, the integer TCP flag field can be obtained at no additional cost.

RadionBik / ML-based-network-traffic-classifier

Upgrade flow parsing #14