I have one more question. For the same pcap file, CICFlowMeter produces two results for Src IP/Dst IP, whereas NTLFlowLyzer produces only one. Why does this happen? Screenshots of my tests and the test files are attached below: test_result.zip
Hi @Yesisyes, Thank you for reaching out and for your engagement with NTLFlowLyzer.
UDP Protocol Analysis
Regarding UDP protocol analysis: UDP is currently not supported in NTLFlowLyzer, although it is on our roadmap. We initially included UDP analysis similar to CICFlowMeter's. However, after further investigation, we realized that defining UDP flows the same way as TCP flows is not ideal, because UDP is connectionless: there is no handshake and no FIN/RST teardown to mark where a flow starts and ends. This led us to pause UDP analysis for now. Rest assured, we are actively working on comprehensive UDP support for future versions.
Differences in Flow Output Between NTLFlowLyzer and CICFlowMeter
The difference between the NTLFlowLyzer and CICFlowMeter outputs arises from how each tool handles flow termination. In NTLFlowLyzer, a flow is closed when any of the following conditions is met:
- Two FIN Flags: Detection of two FIN flags, one in each direction.
- One RST Flag: Detection of an RST flag.
- Flow Timeout: Reaching the maximum flow timeout.
- Inactivity Timeout: Flow inactivity for a specified period.
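The four close conditions above can be sketched in Python as follows. This is a minimal illustration, not NTLFlowLyzer's actual code: the class name Flow, the direction labels "fwd"/"bwd", and the parameters flow_timeout and activity_timeout are hypothetical, and real timeout values come from the tool's configuration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flow:
    """Hypothetical flow record used only to illustrate the four close conditions."""
    start_ts: float
    last_ts: float
    fin_directions: Set[str] = field(default_factory=set)  # "fwd" and/or "bwd"
    rst_seen: bool = False

    def is_expired(self, ts: float, flow_timeout: float, activity_timeout: float) -> bool:
        """Timeout checks, evaluated with the next packet's timestamp before it is added."""
        too_long = ts - self.start_ts > flow_timeout   # maximum flow duration reached
        idle = ts - self.last_ts > activity_timeout    # flow was inactive for too long
        return too_long or idle

    def add_packet(self, ts: float, direction: str, flags: Set[str]) -> bool:
        """Record a packet; return True if FIN/RST means the flow should now be closed."""
        self.last_ts = ts
        if "FIN" in flags:
            self.fin_directions.add(direction)
        if "RST" in flags:
            self.rst_seen = True
        # Two FINs only count when they come from different directions.
        return len(self.fin_directions) == 2 or self.rst_seen
```

Storing FIN directions in a set means a retransmitted FIN from the same side does not close the flow early.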
As shown in your provided CSV files and in the following screenshots, the first flow is closed upon receiving the second FIN flag (one in each direction), and the second flow is closed upon detecting an RST flag.
CICFlowMeter also closes flows upon detecting two FIN flags or one RST flag, as its source code shows.
However, the CSV file generated by CICFlowMeter shows no RST flag and only one FIN flag, so other underlying issues may be causing the discrepancy you observed.
Our analysis of your attached pcap file shows three packets with the FIN flag set and one with the RST flag set. The CICFlowMeter output shows only one FIN flag, which suggests a problem in its packet or flag reading process, whereas NTLFlowLyzer correctly detects all three FIN flags. Here are the relevant screenshots:
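To double-check flag counts independently of either tool, a small reader can walk the pcap and count packets whose TCP flags byte has the FIN (0x01) or RST (0x04) bit set. The sketch below uses only the Python standard library and assumes a classic (non-pcapng) pcap with untagged Ethernet frames and IPv4; the function name count_tcp_flags is hypothetical and is not part of NTLFlowLyzer or CICFlowMeter.

```python
import struct
from typing import BinaryIO, Dict

TH_FIN = 0x01  # FIN bit in the TCP flags byte
TH_RST = 0x04  # RST bit in the TCP flags byte

# Classic pcap magic numbers as stored on disk (microsecond and nanosecond variants).
LITTLE_ENDIAN_MAGIC = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGIC = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def count_tcp_flags(f: BinaryIO) -> Dict[str, int]:
    """Hypothetical helper: count TCP packets carrying FIN and RST in an Ethernet/IPv4 pcap."""
    header = f.read(24)  # pcap global header
    if len(header) < 24:
        raise ValueError("file too short to be a pcap")
    if header[:4] in LITTLE_ENDIAN_MAGIC:
        endian = "<"
    elif header[:4] in BIG_ENDIAN_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet (link type 1)")

    counts = {"FIN": 0, "RST": 0}
    while True:
        record = f.read(16)  # per-packet header: ts_sec, ts_frac, incl_len, orig_len
        if len(record) < 16:
            break
        _, _, incl_len, _ = struct.unpack(endian + "IIII", record)
        frame = f.read(incl_len)
        if len(frame) < 14 or frame[12:14] != b"\x08\x00":
            continue  # not IPv4 (e.g. ARP or IPv6)
        ip = frame[14:]
        if len(ip) < 20 or ip[9] != 6:
            continue  # not TCP
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        tcp = ip[ihl:]
        if len(tcp) < 14:
            continue  # truncated TCP header
        flags = tcp[13]
        if flags & TH_FIN:
            counts["FIN"] += 1
        if flags & TH_RST:
            counts["RST"] += 1
    return counts
```

Opening test.pcap in binary mode and passing the file object returns the two totals, which can then be compared with the sum of the FIN and RST flag columns in each tool's CSV output.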
I hope this detailed explanation clarifies the differences and addresses your questions. We appreciate your understanding and support as we continue to improve NTLFlowLyzer. Your feedback is invaluable to us, and we are committed to enhancing the tool based on user experiences and suggestions.
Thank you once again for your engagement.
Hi @moein-shafi, thank you very much for your clarification. At first I wondered why CICFlowMeter detected two Src IPs; it turns out this is because the test.pcap file was saved by using Wireshark to trace the TCP stream. After reading your explanation, I can confirm that CICFlowMeter's result is wrong. Is NTLFlowLyzer more accurate than CICFlowMeter for TCP?
I learned about NTLFlowLyzer from this issue: https://github.com/ahlashkari/CICFlowMeter/issues/118, which I found because of this error: TypeError: conversion from numpy.int64 to Decimal is not supported. That could just be a type-conversion problem inside the CICFlowMeter source code, and CICFlowMeter does not seem to have fixed this bug yet. I had to use a visualize.bat program that someone else had created from the CICFlowMeter code. This program generated the CICFlowMeter_test.csv file I shared, but it often crashes or produces errors.
Hi @Yesisyes,
Thank you for your kind feedback.
In developing NTLFlowLyzer, we have strived not only to reproduce the core functionality of CICFlowMeter but also to address its existing limitations. This involved reviewing CICFlowMeter's open issues on GitHub and other sources and incorporating those insights into our tool. We have also extended its feature set and functionality based on these observations.
While we strive to offer a more accurate and robust TCP flow analysis, we recognize that the effectiveness of any tool can vary depending on specific use cases and configurations. We are dedicated to continually enhancing NTLFlowLyzer to ensure it meets high standards of accuracy and reliability.
Below is a list of the most significant improvements made over CICFlowMeter. We will update the README to include these enhancements for greater clarity and reference.
CICFlowMeter is an earlier tool, developed in 2017 in Java. While it effectively extracted features from network traffic, our work seeks to improve on it by addressing issues identified in its implementation and theoretical foundations. The primary motivation for developing NTLFlowLyzer is to provide a more efficient and accurate way to extract valuable features from network traffic, overcoming the limitations of CICFlowMeter.
In this part, we will discuss the issues with CICFlowMeter in detail and describe the improvements we made in our implementation. The following is a list of issues we identified with CICFlowMeter and how we addressed them in NTLFlowLyzer:
Flow Creation
A critical issue with CICFlowMeter was its flow definition, which was based only on the source IP, source port, destination IP, destination port, and protocol. In NTLFlowLyzer, we also include the timestamp in the flow ID, which makes the flow definition more precise: for example, two connections that reuse the same 5-tuple at different times receive different flow IDs rather than sharing one. This leads to more accurate flow identification and better analysis results.
Because of these different flow definitions, the number of flows created by CICFlowMeter differs from the number created by NTLFlowLyzer. The more precise definition used in NTLFlowLyzer yields more precise flows and, in turn, more accurate analysis results.
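A timestamp-tagged flow ID can be sketched as follows. This is an illustrative assumption, not NTLFlowLyzer's code or its actual ID format: the class FlowTable, the helper canonical, and the underscore-joined ID string are hypothetical. The point is that once a flow is closed, a later packet with the same 5-tuple gets a new ID because its timestamp differs.

```python
from typing import Dict, Optional, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, int]


def canonical(t: FiveTuple) -> FiveTuple:
    """Hypothetical helper: the same lookup key for both directions of a connection."""
    src_ip, src_port, dst_ip, dst_port, proto = t
    return min(t, (dst_ip, dst_port, src_ip, src_port, proto))


class FlowTable:
    """Hypothetical table whose flow IDs embed the first packet's timestamp."""

    def __init__(self) -> None:
        self.open_flows: Dict[FiveTuple, str] = {}  # canonical 5-tuple -> open flow ID

    def flow_id(self, t: FiveTuple, ts: float) -> str:
        """Return the open flow's ID, or start a new flow tagged with this timestamp."""
        key = canonical(t)
        if key not in self.open_flows:
            # The initiating packet's orientation plus its timestamp identify the flow.
            self.open_flows[key] = "_".join(map(str, (*t, ts)))
        return self.open_flows[key]

    def close(self, t: FiveTuple) -> Optional[str]:
        """End the open flow (e.g. on FIN/RST); a later packet with this 5-tuple starts a new one."""
        return self.open_flows.pop(canonical(t), None)
```

Without the timestamp, the second connection in such a sequence would carry exactly the same ID as the first.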
CICFlowMeter Performance
During our evaluation of CICFlowMeter, we encountered several performance issues with large pcap files: the tool's processing speed decreased significantly when handling real network traffic (millions of TCP packets in a single pcap file). To address this challenge, we used DPKT, a high-performance network data library, together with multiprocessing techniques to improve the tool's efficiency.
Our objective was to enable researchers to use NTLFlowLyzer on large pcap files, such as those included in the real-world dataset, which can exceed 8 GB and contain approximately 10 million packets, without noticeable slowdowns. This way, users can perform practical network analysis and obtain accurate results on their regular desktops and laptops.
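One common way to combine per-packet parsing with multiprocessing is to partition packets by flow, so that both directions of a connection always reach the same worker. The sketch below shows only that partitioning step; it assumes packets have already been parsed (for example with DPKT) into (timestamp, 5-tuple) pairs, and the names canonical_key, worker_for, and partition_by_flow are hypothetical rather than taken from NTLFlowLyzer.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, int]
Packet = Tuple[float, FiveTuple]  # (timestamp, 5-tuple)


def canonical_key(t: FiveTuple) -> FiveTuple:
    """Order the endpoints so both directions of a connection map to the same key."""
    src_ip, src_port, dst_ip, dst_port, proto = t
    if (src_ip, src_port) <= (dst_ip, dst_port):
        return t
    return (dst_ip, dst_port, src_ip, src_port, proto)


def worker_for(t: FiveTuple, n_workers: int) -> int:
    """Deterministic worker index; crc32 avoids hash(), which is salted per process for str."""
    key = "|".join(map(str, canonical_key(t))).encode()
    return zlib.crc32(key) % n_workers


def partition_by_flow(packets: Iterable[Packet], n_workers: int) -> Dict[int, List[Packet]]:
    """Group packets so every flow is handled entirely by one worker."""
    parts: Dict[int, List[Packet]] = defaultdict(list)
    for ts, t in packets:
        parts[worker_for(t, n_workers)].append((ts, t))
    return dict(parts)
```

Each list in the returned dictionary could then be handed to multiprocessing.Pool.map together with a top-level flow-building function. Collecting all packets in memory first, as done here, is only for illustration; a tool aimed at multi-gigabyte files would more likely stream packets to per-worker queues.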
Creating Empty CSV Files for Specific Pcap Files
In the new implementation, all TCP packets in the pcap file are considered. If the file contains no TCP packets, NTLFlowLyzer displays a message informing the user that the pcap file lacks TCP packets.
No Flow in Malformed Pcap Files
To handle malformed pcap files better, NTLFlowLyzer takes a more robust approach than CICFlowMeter. CICFlowMeter often treated malformed pcap files as invalid and created 0 flows, whereas our tool considers all packets in the pcap file. Attack scenarios may contain malformed or broken packets, so NTLFlowLyzer uses an exception-handling mechanism: if a packet cannot be parsed, an exception is raised and caught, that packet is skipped, and reading continues with all the other packets. This approach ensures that NTLFlowLyzer can still extract valuable features from attack pcap files.
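The skip-and-continue idea can be sketched with a per-packet try/except. This is a minimal standard-library illustration, not NTLFlowLyzer's parser: parse_tcp_ports and extract_all are hypothetical names, only untagged Ethernet/IPv4/TCP frames are recognized, and the "no TCP packets" message mirrors the behavior described in the previous section.

```python
import struct
from typing import Iterable, List, Tuple


def parse_tcp_ports(frame: bytes) -> Tuple[int, int]:
    """Hypothetical minimal parser: return (src_port, dst_port) of an Ethernet/IPv4/TCP frame."""
    # A frame shorter than 24 bytes with an IPv4 EtherType raises IndexError here.
    if frame[12:14] != b"\x08\x00" or frame[23] != 6:
        raise ValueError("not an IPv4/TCP frame")
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    # struct.error is raised here if the TCP header is truncated.
    return struct.unpack_from(">HH", frame, 14 + ihl)


def extract_all(frames: Iterable[bytes]) -> Tuple[List[Tuple[int, int]], int]:
    """Parse every frame, skipping bad ones instead of discarding the whole file."""
    parsed: List[Tuple[int, int]] = []
    skipped = 0
    for frame in frames:
        try:
            parsed.append(parse_tcp_ports(frame))
        except (ValueError, IndexError, struct.error):
            skipped += 1  # malformed or non-TCP packet: skip it and keep reading
    if not parsed:
        print("The pcap file contains no TCP packets.")
    return parsed, skipped
```

A truncated packet in the middle of a capture therefore costs one skipped packet rather than an empty result.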
Unnecessary and Time-Consuming Features for Specific Pcap Files
To enhance the efficiency and flexibility of feature extraction in NTLFlowLyzer, modifications were made to the code to allow for customization of the feature selection process. The user can specify a list of features to be ignored, avoiding the unnecessary and time-consuming extraction of irrelevant features for specific pcap files.
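A user-supplied ignore list can be sketched as a registry of feature functions that is filtered before extraction, so ignored features are never computed. The registry, the feature names, and the ignore parameter below are hypothetical; NTLFlowLyzer's own configuration key and feature names may differ.

```python
import statistics
from typing import Callable, Dict, Iterable, List

# Hypothetical feature registry: feature name -> function of a flow's packet lengths.
FEATURES: Dict[str, Callable[[List[int]], float]] = {
    "packet_count": lambda lengths: float(len(lengths)),
    "total_bytes": lambda lengths: float(sum(lengths)),
    "mean_length": lambda lengths: statistics.fmean(lengths),
    "stdev_length": lambda lengths: statistics.pstdev(lengths),
}


def extract_features(lengths: List[int], ignore: Iterable[str] = ()) -> Dict[str, float]:
    """Compute every registered feature except those the user chose to ignore."""
    skip = set(ignore)
    unknown = skip - set(FEATURES)
    if unknown:
        # Fail loudly on typos instead of silently computing everything.
        raise KeyError(f"unknown feature(s) in ignore list: {sorted(unknown)}")
    return {name: fn(lengths) for name, fn in FEATURES.items() if name not in skip}
```

Rejecting unknown names is a design choice: a misspelled entry in the ignore list would otherwise go unnoticed and the expensive feature would still run.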
Installation on Windows and Linux
NTLFlowLyzer includes clear installation instructions for both Windows and Linux, an area where CICFlowMeter's instructions need improvement. Furthermore, we provide a Dockerfile that lets users quickly deploy NTLFlowLyzer in a containerized environment. This ensures that the tool runs in a consistent, isolated environment, makes dependencies and upgrades easier to manage, and simplifies the installation process. Containerization can also improve security by isolating the tool from the host operating system and other applications.
In addition to the installation instructions, we also provide a Python package for NTLFlowLyzer, which offers several benefits. First, it simplifies installation by automatically installing all necessary dependencies, which is particularly useful for users less experienced with installing and configuring software. Second, it allows easy distribution and sharing of the software across platforms and systems. Third, it enables versioning and updates, so users can easily upgrade to the latest version. Finally, it promotes modularity and code reuse, making the software easier to maintain and extend in the future.
Adding Payload Bytes of the First Packet to FlowLengthStats Twice
CICFlowMeter added the payload bytes of the first packet to flowLengthStats twice. NTLFlowLyzer fixes this double counting and calculates flowLengthStats with the first packet's payload bytes included only once.
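One way this kind of double counting can arise is when a flow's constructor already records the first packet and the caller then passes that same packet to the update method again: for payloads of 100, 40, and 60 bytes the total becomes 300 instead of 200. The sketch below shows the single-count pattern; the class Flow and function build_flow are hypothetical and are not the code of either tool.

```python
from typing import List


class Flow:
    """Hypothetical flow whose constructor already records the first packet."""

    def __init__(self, first_payload_len: int) -> None:
        self.payload_lengths: List[int] = [first_payload_len]

    def add_packet(self, payload_len: int) -> None:
        self.payload_lengths.append(payload_len)


def build_flow(payload_lens: List[int]) -> Flow:
    """Create a flow from a non-empty packet list, counting each packet exactly once."""
    first, *rest = payload_lens
    flow = Flow(first)   # the first packet is counted here...
    for n in rest:       # ...so only the remaining packets go through add_packet
        flow.add_packet(n)
    return flow
```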
Loading Network Interfaces Issues on Debian 10
NTLFlowLyzer provides a straightforward installation process through clear instructions, a Dockerfile, and a Python package, making it easier to install on various operating systems, including Debian 10 and Ubuntu 22.04. With the Dockerfile, users can set up the environment without worrying about compatibility issues, while the Python package simplifies installation and allows easy integration with other Python-based tools.
The NTLFlowLyzer team tested the installation process on Windows 10 and 11 and on Ubuntu 20.04 and 22.04 and verified that it works correctly. The Dockerfile and Python package also make it easier to install and use NTLFlowLyzer on other Linux distributions and operating systems, such as Windows and macOS.
PSH Flag Issue
CICFlowMeter's PSH flag counter failed to increment, leading to inaccurate results. This was fixed in NTLFlowLyzer, where the PSH flag counter increments correctly, ensuring accurate results.
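Counting PSH flags comes down to testing bit 0x08 of the TCP flags byte for every packet and incrementing a counter when it is set. The sketch keeps separate forward and backward counts, which is an assumption made for illustration; the class name PshCounter is hypothetical.

```python
TH_PSH = 0x08  # PSH bit in the TCP flags byte


class PshCounter:
    """Hypothetical per-direction PSH flag counter."""

    def __init__(self) -> None:
        self.fwd = 0
        self.bwd = 0

    def update(self, flags: int, forward: bool) -> None:
        """Increment the counter for the packet's direction when PSH is set."""
        if flags & TH_PSH:
            if forward:
                self.fwd += 1
            else:
                self.bwd += 1
```

Because flags are combined bitwise, a PSH+ACK packet (0x18) must be detected with a mask rather than by comparing the whole byte to 0x08.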
Down/Up Ratio Issue
In CICFlowMeter, the value of the down/up ratio was sometimes cast to an integer, losing precision. This was resolved in NTLFlowLyzer, where the ratio is stored as a floating-point number so that its precision is maintained.
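The precision loss is easy to see with 3 backward and 4 forward packets: integer division gives 0, while floating-point division gives 0.75. The sketch assumes the ratio is defined as backward packets over forward packets and returns 0.0 when there are no forward packets; both the definition and the zero convention are assumptions, and the function name down_up_ratio is hypothetical.

```python
def down_up_ratio(fwd_packets: int, bwd_packets: int) -> float:
    """Backward-to-forward packet ratio as a float; 0.0 when there are no forward packets."""
    if fwd_packets == 0:
        return 0.0
    # In Python, "/" always returns a float; "//" (like Java int / int) would truncate.
    return bwd_packets / fwd_packets
```

For example, down_up_ratio(4, 3) returns 0.75, whereas 3 // 4 evaluates to 0.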
ICMP Protocol Issue
CICFlowMeter assigned the ICMP protocol a value of 0 instead of 1, its IANA-assigned IP protocol number. NTLFlowLyzer corrects this and assigns the correct protocol number to ICMP packets.
Negative Values in IAT Statistics
CICFlowMeter had a problem where negative values appeared in the inter-arrival time (IAT) statistics. This was corrected in NTLFlowLyzer, which ensures that IAT statistics contain only non-negative values. This correction allows a more accurate traffic flow analysis and avoids potential errors in statistical calculations.
Negative IAT values can have several causes, including inaccurate packet timestamps and packet reordering. They can significantly affect the accuracy of network traffic analysis, leading to inaccurate results and potentially misleading conclusions. By addressing this issue, NTLFlowLyzer provides more reliable and accurate traffic analysis.
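Sorting a flow's timestamps before taking consecutive differences is one simple way to guarantee non-negative IATs when packets were captured out of order. The text above does not say how NTLFlowLyzer enforces this, so the sketch is an assumption; iat_stats is a hypothetical name, and the statistics returned are only a sample.

```python
import statistics
from typing import Dict, List


def iat_stats(timestamps: List[float]) -> Dict[str, float]:
    """Inter-arrival-time statistics computed from timestamps sorted into arrival order."""
    ts = sorted(timestamps)  # sorting guarantees every difference below is >= 0
    iats = [b - a for a, b in zip(ts, ts[1:])]
    if not iats:
        return {"min": 0.0, "max": 0.0, "mean": 0.0, "std": 0.0}
    return {
        "min": min(iats),
        "max": max(iats),
        "mean": statistics.fmean(iats),
        "std": statistics.pstdev(iats),
    }
```

With timestamps [0.0, 2.0, 1.0, 4.0], the unsorted differences would be 2.0, -1.0, and 3.0; after sorting they are 1.0, 1.0, and 2.0.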
Large Files Issues
NTLFlowLyzer was designed with memory management techniques that let it handle large pcap files better; such files often caused memory issues in CICFlowMeter. NTLFlowLyzer optimizes memory use by selecting appropriate data structures and processing data in chunks, so it can process large pcap files without running into memory issues. This was achieved through a different approach to packet analysis: rather than loading the whole pcap file and all its packets into memory, NTLFlowLyzer analyzes packets one by one and keeps only the critical information from each packet. This reduces memory usage and allows NTLFlowLyzer to handle large files more efficiently.
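Keeping only the critical information per packet means per-flow statistics must be updated incrementally rather than computed from a stored list. A standard way to do this for mean and standard deviation is Welford's online algorithm, sketched below with constant memory per flow. This illustrates the technique only; it is not a description of NTLFlowLyzer's internals, and the class name RunningStats is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory per flow."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        """Fold one value (e.g. a packet length) into the statistics, then discard it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def pstdev(self) -> float:
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self.m2 / self.n) if self.n else 0.0
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 one at a time yields a mean of 5 and a population standard deviation of 2 without ever holding the full list.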
Manual Labeling
In CICFlowMeter, users had to manually label flows, which was a time-consuming and tedious process. To address this issue, NTLFlowLyzer provides a configuration file that allows users to set various parameters, including flow labeling. This feature provides greater flexibility and customization for users and improves the overall user experience. By eliminating the need for manual labeling and allowing for customized flow labeling, NTLFlowLyzer streamlines the flow analysis process and saves users valuable time and effort.
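Configuration-driven labeling can be sketched as reading a label from a JSON configuration and attaching it to every output row. The "label" key, the "Unknown" default, and the function name label_flows are hypothetical; NTLFlowLyzer's actual configuration fields may be named and structured differently.

```python
import json
from typing import Dict, List


def label_flows(flows: List[Dict[str, object]], config_text: str) -> List[Dict[str, object]]:
    """Attach the label from a JSON config to every flow row (hypothetical 'label' key)."""
    config = json.loads(config_text)
    label = config.get("label", "Unknown")  # fallback when no label is configured
    # Return new dictionaries so the caller's rows are not modified in place.
    return [{**flow, "label": label} for flow in flows]
```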
Pcapng Extension
CICFlowMeter lacked adequate support for the pcapng format. NTLFlowLyzer did not initially support it either, because the DPKT library it uses for reading pcap files did not support this format. However, NTLFlowLyzer provides users with a simple way to convert pcapng files to pcap files so that they can be analyzed with the tool.
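Before running a capture through a pcap-only reader, its format can be checked from the first four bytes: a pcapng file begins with a Section Header Block whose block type is 0x0A0D0D0A, while classic pcap files begin with one of four magic numbers. The function name capture_format is hypothetical.

```python
from typing import BinaryIO

PCAPNG_MAGIC = b"\x0a\x0d\x0d\x0a"  # block type of a pcapng Section Header Block
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1", b"\xa1\xb2\xc3\xd4",  # microsecond pcap (little/big endian)
    b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d",  # nanosecond pcap (little/big endian)
}


def capture_format(f: BinaryIO) -> str:
    """Return 'pcap', 'pcapng', or 'unknown' based on the file's first four bytes."""
    magic = f.read(4)
    if magic == PCAPNG_MAGIC:
        return "pcapng"
    if magic in PCAP_MAGICS:
        return "pcap"
    return "unknown"
```

A file reported as pcapng can be converted with Wireshark's editcap, for example editcap -F pcap input.pcapng output.pcap, as long as all interfaces in the capture share one link-layer type. This is a general Wireshark approach and not necessarily the conversion method NTLFlowLyzer's documentation describes.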
ARP Flows
In CICFlowMeter, ARP flows were sometimes misinterpreted as TCP flows, which could lead to inaccurate analysis results. This issue has been addressed in NTLFlowLyzer, where ARP flows are now correctly identified and processed separately from TCP flows. This improves the accuracy of the analysis and ensures that ARP flows are not incorrectly included in TCP flow statistics.
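Separating ARP from TCP can be done at the Ethernet layer: ARP frames carry EtherType 0x0806 and have no IP header or ports, so they can never form a TCP flow, while TCP requires EtherType 0x0800 (IPv4) and IP protocol number 6. The sketch assumes untagged Ethernet and IPv4 only; classify_frame is a hypothetical name, not an NTLFlowLyzer function.

```python
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
IPPROTO_TCP = 6


def classify_frame(frame: bytes) -> str:
    """Return 'arp', 'tcp', 'other-ip', or 'other' for an untagged Ethernet frame."""
    if len(frame) < 14:
        return "other"  # too short to hold an Ethernet header
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == ETHERTYPE_ARP:
        return "arp"  # no IP header and no ports, so never part of a TCP flow
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= 34:  # full minimum IPv4 header present
        return "tcp" if frame[23] == IPPROTO_TCP else "other-ip"
    return "other"
```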
In addressing these issues, we added new features to NTLFlowLyzer and corrected the calculation of some of the previous features. A comparison of these features is presented in Table features.
NTLFlowLyzer leverages the Python programming language instead of Java, providing several advantages in this domain. Python's simpler and more concise syntax facilitates both writing and reading code, while its rich set of libraries and frameworks allows for the implementation of advanced features with ease. Moreover, Python is easier to use and learn than Java, which
We hope that NTLFlowLyzer meets your needs effectively, but we are always open to feedback and suggestions to make it better. If you have any more questions or need further assistance, please feel free to reach out. Your input is highly valued, and we appreciate your support.
Thank you very much for your earnest reply, @moein-shafi. Now I know more about NTLFlowLyzer and CICFlowMeter. And because you take these things so seriously, I'm sure they'll get better. Have a nice weekend.
Thank you for your kind words and understanding @Yesisyes. I’m glad I could provide the information you needed. We are committed to continually improving NTLFlowLyzer, and your feedback is invaluable to that process.
Wishing you a wonderful weekend as well!
Hi, I hope everyone is well. I ran the NTLFlowLyzer -c command to get a CSV file, but the output contains no analysis of the UDP protocol. Have you changed the tool to analyze only TCP flows? The example output you provide includes UDP protocol analysis, which confuses me. I hope you can answer this. Thank you very much. Below is my running log.
(venv) ym@ydeMac-mini NTLFlowLyzer-master % NTLFlowLyzer -c /Users/ym/NTLFlowLyzer-master/NTLFlowLyzer/config.json
You initiated NTLFlowLyzer!
Number and percentage of TCP packets:
Total TCP packets: 277858
Percentage of TCP packets: 97.99%
Number and percentage of UDP packets:
Total UDP packets: 5707
Percentage of UDP packets: 2.01%
Top 10 Application Layer Protocols:
Port 56686 (Unknown): 22294 packets, 7.86%
Port 54512 (Unknown): 11157 packets, 3.93%
Port 53148 (Unknown): 9376 packets, 3.31%
Port 47205 (Unknown): 5655 packets, 1.99%
Port 56682 (Unknown): 5501 packets, 1.94%
Port 443 (HTTPS): 5305 packets, 1.87%
Port 56356 (Unknown): 4469 packets, 1.58%
Port 53868 (Unknown): 4043 packets, 1.43%
Port 3127 (Unknown): 3617 packets, 1.28%
Port 52895 (Unknown): 3520 packets, 1.24%
##################################################