Ostrale / Blossom-Blosoft

Let's fight against digital pollution. It is free software that tracks your computer habits and data consumption.
CeCILL Free Software License Agreement v2.1

Enhance Detection Performance for Protocol Identification #9

Closed Ostrale closed 11 months ago

Ostrale commented 11 months ago

Enhance Detection Performance for Protocol Identification

Description:

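The current protocol detection in the Blosoft codebase needs better accuracy and performance. On the test capture below, 3792 packets across 48 flows are classified as Unknown, and HTTP and ICMPv6 are not reported at all.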

Current Test Data:

 ---------------------- Protocols ----------------------
Name                Packets        Bytes          Flows
DNS                 108            6018           25
Microsoft365        47             17971          4
Unknown             3792           4498306        48
YouTube             562            92357          17
QUIC                742            819145         5
TLS                 70             10710          8
Skype_Teams         36             17236          7
WSD                 14             9716           2
 ------------------- End of Protocols -------------------

Desired Output:

 ---------------------- Detected Protocols ----------------------
Name                Packets        Bytes          Flows
DNS                 47             3481           6
HTTP                2              161            1
TLS                 81             12537          10
ICMPv6              2              164            1
YouTube             5132           5483771        17
Skype_Teams         36             17740          4
WSD                 14             9912           2
Microsoft365        59             19441          3
 ------------------- End of Detected Protocols -------------------

Tasks:

Optimize Protocol Detection Algorithm: Review and enhance the algorithm responsible for protocol detection to improve accuracy and efficiency.

Ostrale commented 11 months ago

Hi there! I've been looking into the protocol detection algorithm, and I've identified a potential issue. When analyzing the file t2.pcap, which contains only a single packet, here's what I'm supposed to obtain:

Original Packet: 08 00 27 55 0c 22 9c eb e8 a5 e6 7d 08 00 45 00 00 34 eb 65 40 00 80 06 00 00 c0 a8 0a 7a c0 a8 0a 01 ee 02 00 35 3d be 40 bb 00 00 00 00 80 02 fa f0 95 f2 00 00 02 04 05 b4 01 03 03 08 01 01 04 02

However, with my current code and the libtins library, here's what I'm getting:

Modified Packet: 45 00 00 34 eb 65 40 00 80 06 79 92 c0 a8 0a 7a c0 a8 0a 01 ee 02 00 35 3d be 40 bb 00 00 00 00 80 02 fa f0 71 a2 00 00 02 04 05 b4 01 03 03 08 01 01 04 02

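Four bytes differ between the two dumps. The original also starts with a 14-byte Ethernet header that the libtins output omits, so both packets are aligned at the IP header below, with the differences in bold.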

I believe this could be impacting the accuracy of the protocol detection. I'll continue investigating and work on optimizing the algorithm to address this issue.

45 00 00 34 eb 65 40 00 80 06 **00** **00** c0 a8 0a 7a c0 a8 0a 01 ee 02 00 35 3d be 40 bb 00 00 00 00 80 02 fa f0 **95** **f2** 00 00 02 04 05 b4 01 03 03 08 01 01 04 02
45 00 00 34 eb 65 40 00 80 06 **79** **92** c0 a8 0a 7a c0 a8 0a 01 ee 02 00 35 3d be 40 bb 00 00 00 00 80 02 fa f0 **71** **a2** 00 00 02 04 05 b4 01 03 03 08 01 01 04 02 
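The comparison above can also be done mechanically. The following is a minimal sketch, not code from Blosoft: `differing_offsets` and `kEthernetHeaderLen` are hypothetical names, and it assumes the captured frame is plain Ethernet II without a VLAN tag, so that the IP header starts 14 bytes in.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: Ethernet II framing without a VLAN tag (14-byte header).
constexpr std::size_t kEthernetHeaderLen = 14;

// Returns the offsets, counted from the start of the IP header, at which the
// captured frame and the re-serialized IP packet differ.
// `frame` starts at the Ethernet header; `ip_packet` starts at the IP header.
std::vector<std::size_t> differing_offsets(const std::vector<std::uint8_t>& frame,
                                           const std::vector<std::uint8_t>& ip_packet) {
    std::vector<std::size_t> diffs;
    if (frame.size() < kEthernetHeaderLen) {
        return diffs;  // too short to contain an Ethernet header
    }
    // Compare only the bytes both buffers have in common.
    const std::size_t n = std::min(frame.size() - kEthernetHeaderLen, ip_packet.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (frame[i + kEthernetHeaderLen] != ip_packet[i]) {
            diffs.push_back(i);
        }
    }
    return diffs;
}
```

For the two dumps in this comment, the function returns offsets 10, 11, 36 and 37, which are the four bytes shown in bold.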

Edit:

The first differing pair is ip.checksum and the second is tcp.checksum.

This probably has no impact on detection; both versions of the packet should be acceptable inputs.
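To check that the values in the libtins output are simply the correct checksums, they can be recomputed by hand. The sketch below implements the standard Internet checksum (RFC 1071) for the IPv4 header and for TCP over its pseudo-header. The function names are hypothetical and not part of libtins or Blosoft, and the code assumes the buffer holds exactly one well-formed IPv4/TCP packet starting at the IP header (no bounds checks).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds `len` bytes, read as big-endian 16-bit words, to a running sum.
std::uint32_t ones_complement_add(const std::uint8_t* data, std::size_t len,
                                  std::uint32_t sum = 0) {
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];
    }
    if (len % 2 != 0) {
        sum += static_cast<std::uint32_t>(data[len - 1]) << 8;  // pad odd byte
    }
    return sum;
}

// Folds the carries back into 16 bits and returns the one's complement.
std::uint16_t fold_and_complement(std::uint32_t sum) {
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFF);
}

// IPv4 header checksum, computed with the checksum field (bytes 10-11) zeroed.
std::uint16_t ipv4_header_checksum(std::vector<std::uint8_t> pkt) {
    const std::size_t ihl = (pkt[0] & 0x0F) * 4;  // header length in bytes
    pkt[10] = 0;
    pkt[11] = 0;
    return fold_and_complement(ones_complement_add(pkt.data(), ihl));
}

// TCP checksum over the pseudo-header (addresses, protocol, TCP length)
// followed by the TCP segment, with the checksum field zeroed.
std::uint16_t tcp_checksum(std::vector<std::uint8_t> pkt) {
    const std::size_t ihl = (pkt[0] & 0x0F) * 4;
    const std::size_t tcp_len = pkt.size() - ihl;
    pkt[ihl + 16] = 0;
    pkt[ihl + 17] = 0;
    std::uint32_t sum = ones_complement_add(&pkt[12], 8);  // src + dst address
    sum += 6;                                              // protocol = TCP
    sum += static_cast<std::uint32_t>(tcp_len);
    sum = ones_complement_add(&pkt[ihl], tcp_len, sum);
    return fold_and_complement(sum);
}
```

Applied to the packet above, these functions give 0x7992 for the IP header and 0x71a2 for TCP, matching the libtins output. The captured values (0x0000 and 0x95f2) are therefore the ones that are not valid checksums. 0x95f2 is exactly the folded pseudo-header sum, which is consistent with checksum offloading on the capturing host, although the thread does not confirm how t2.pcap was recorded.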

Ostrale commented 11 months ago

I've been working on the flow_heal_check function, and it seems to prematurely terminate flows that should still be active. By making some adjustments (removing the premature termination of flows), here's the improvement I observed:

---------------------- Detected Protocols ----------------------
Name                Packets        Bytes          Flows
Microsoft365        59             18561          3
DNS                 47             2751           6
Unknown             13             778            7
TLS                 70             10710          4
Skype_Teams         36             17236          4
YouTube             5132           5411707        17
WSD                 14             9716           2
------------------- End of Detected Protocols -------------------

However, there are still some points to address:

| Name | Packets (desired) | Packets (actual) | Flows (desired) | Flows (actual) | Missing packets | Missing flows |
|------|-------------------|------------------|-----------------|----------------|-----------------|---------------|
| HTTP | 2                 | 0                | 1               | 0              | 2               | 1             |
| TLS  | 81                | 70               | 10              | 4              | 11              | 6             |
No HTTP packets were detected, and there are some missing TLS packets. Additionally, the byte calculation seems to be incorrect.
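One way to narrow down the byte discrepancy is to look at protocols whose packet count already matches the desired output and divide the missing bytes by the number of packets. This is a small sketch with a hypothetical helper name, using only the figures from the tables in this thread.

```cpp
#include <cstdint>

// Average number of bytes per packet missing from the measured total, for a
// protocol whose packet count already matches the reference output.
double missing_bytes_per_packet(std::int64_t packets,
                                std::int64_t measured_bytes,
                                std::int64_t reference_bytes) {
    return static_cast<double>(reference_bytes - measured_bytes) /
           static_cast<double>(packets);
}
```

For WSD (14 packets, 9716 vs 9912 bytes) and Skype_Teams (36 packets, 17236 vs 17740 bytes) the result is exactly 14 bytes per packet; for YouTube, Microsoft365 and DNS it is between roughly 14 and 15.5. That is consistent with the 14-byte Ethernet header not being counted, possibly with a few additional bytes on some packets, but this is an inference from the numbers and not something the thread states.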

Improvements are needed, including a review of the conditions for flow termination.

Ostrale commented 11 months ago

After adding a call to the ndpi_detection_giveup function, we get reliable results.
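The idea behind a "give up" step is that a flow which ends while still classified as Unknown gets one last best-effort guess before it is reported. The sketch below illustrates only that pattern with a self-contained mock: `Flow`, `guess_by_port` and `finalize_flow` are hypothetical names, and the port-based guess is a simplified stand-in. It does not call nDPI, because the parameter list of `ndpi_detection_giveup` differs between nDPI releases and should be checked against the installed headers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified flow record; not Blosoft's or nDPI's real type.
struct Flow {
    std::uint16_t dst_port = 0;
    std::string protocol = "Unknown";  // result of normal per-packet detection
};

// Hypothetical stand-in for a last-resort guess based on well-known ports.
std::string guess_by_port(std::uint16_t port) {
    switch (port) {
        case 53:  return "DNS";
        case 80:  return "HTTP";
        case 443: return "TLS";
        default:  return "Unknown";
    }
}

// Called once, when a flow is about to be reported or discarded.
// Flows that were already identified keep their result.
void finalize_flow(Flow& flow) {
    if (flow.protocol == "Unknown") {
        flow.protocol = guess_by_port(flow.dst_port);
    }
}
```

A short flow, such as the two-packet HTTP flow in the desired output, may end before normal detection has seen enough data, which is the kind of case where a final guess changes the result.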

In addition, the size in bytes of the packets has been corrected.

ICMPv6 detection still remains to be added.

I'm closing this issue.
