ahlashkari / NTLFlowLyzer

GNU General Public License v3.0

Broken pipe happened when converting pcaps #21

Closed MuMuuuu closed 3 months ago

MuMuuuu commented 4 months ago

When using NTLFlowLyzer to process CICIDS-2017 Thursday-WorkingHours, I encountered an [Errno 32] Broken pipe exception. How can I solve that?


I had converted the format with tshark and used a Python script to enumerate the feature names for my config JSON. This is my configuration: config2.json. Here is the detailed output: exception.txt

moein-shafi commented 4 months ago

Hi @MuMuuuu Thank you for reaching out and for your engagement with NTLFlowLyzer.

I'll look into this error over the weekend and get back to you. In the meantime, could you please provide:

  1. Your system specifications.
  2. Does this issue occur only with the specific pcap file you mentioned? (is it fine with other pcaps?)
  3. Have you tested on any other operating systems, preferably Debian-based ones?

Your input will help in diagnosing the issue more accurately.

Thanks!

MuMuuuu commented 4 months ago
  1. Possibly I allocated the wrong number of cores to my VM, but the exception still occurs when I give the VM 4 cores. Detailed output: exception3.txt

  2. I tried a tiny pcap and it works successfully. Could a timeout perhaps be causing a disconnect on the pipeline? tiny_80_output.txt

  3. I only have Kali available right now. On WSL1 the build fails with ImportError: /usr/local/lib/python3.10/dist-packages/scipy-1.6.0-py3.10-linux-x86_64.egg/scipy/spatial/transform/rotation.cpython-310-x86_64-linux-gnu.so: undefined symbol: _PyGen_Send, which I believe is an issue related to Cython.

MuMuuuu commented 4 months ago

It runs successfully after changing the config to this:

{
    "number_of_threads": 6,
    "feature_extractor_min_flows": 2500,
    "writer_min_rows": 1000,
    "read_packets_count_value_log_info": 500000,
    "check_flows_ending_min_flows": 20000,
    "capturer_updating_flows_min_value": 1000,
    "max_flow_duration": 120000,
    "activity_timeout": 12000,
    "batch_address": "",
    "batch_address_output": "",
    "floating_point_unit": ".3f",
    "max_rows_number": 800000
}

It's probably worth mentioning that a low thread count may still lead to the exception, or there may be a potential bug.

moein-shafi commented 4 months ago

Hi @MuMuuuu,

Thank you for the detailed updates and for providing additional information. Based on my investigation and your comments, it seems the issue is related to resource allocation, particularly memory. Given the large size of the pcap file, a significant number of packet and flow instances are stored in memory. When the memory limit is reached, threads may fail to allocate additional memory, leading to corruption, thread closure, and ultimately, a broken pipe error.
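For readers unfamiliar with where the [Errno 32] comes from: when one end of a pipe disappears (here, a worker thread/process dying after a failed allocation), any subsequent write to the other end fails with EPIPE. This is a minimal, self-contained sketch of that mechanism on POSIX systems (it does not use NTLFlowLyzer's actual internals; CPython ignores SIGPIPE by default, so the failure surfaces as a Python exception):

```python
import errno
import os

def provoke_broken_pipe():
    """Reproduce [Errno 32] in isolation: write into a pipe whose
    reading end has already been closed, as happens when a consumer
    thread dies unexpectedly."""
    r, w = os.pipe()
    os.close(r)                    # the "consumer" end disappears
    try:
        os.write(w, b"flow data")  # CPython ignores SIGPIPE, so the
    except BrokenPipeError as e:   # failed write raises BrokenPipeError
        return e.errno             # EPIPE, i.e. 32 on Linux
    finally:
        os.close(w)
```

Calling `provoke_broken_pipe()` on Linux returns `errno.EPIPE` (32), matching the error in the report.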

Here are several solutions that could address the issue:

  1. Increase Resource Allocation: Allocate more RAM to your VM. This will provide more memory for storing packet and flow instances and reduce the likelihood of running out of memory.

  2. Adjust Configuration Values: Experiment with lower values for feature_extractor_min_flows, writer_min_rows, check_flows_ending_min_flows, and capturer_updating_flows_min_value. This may slow down the program but can prevent crashes.

  3. Utilize More Threads: Increasing the number of threads, as you have done, can help manage memory usage more efficiently. This works similarly to adjusting the configuration values: multiple threads each handle the minimum number of completed flows, finish their tasks, and release memory accordingly.

  4. Split Large pcap Files: Divide your large pcap files into smaller files. This can be done using tools like editcap from the Wireshark suite. Processing smaller pcap files individually will reduce memory usage and can prevent crashes (similar to the second option). Use the batch continuous option to process these smaller files sequentially without losing any data.

  5. Write to Disk: The safest and most robust solution is to write intermediate data to disk instead of keeping it in memory. This approach will require modifying the code to include disk I/O operations and possibly integrating a database to manage the data. While this can greatly improve stability when dealing with large pcaps, it requires significant changes to the codebase and may slow down the program as well.
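For option 4, editcap from the Wireshark suite is the recommended tool. If it is unavailable, splitting can also be done in pure Python for the classic libpcap format (the file names, chunk size, and `split_pcap` helper below are illustrative; pcapng files are not handled by this sketch):

```python
import struct

PCAP_GLOBAL_HDR_LEN = 24   # classic libpcap global header
PCAP_REC_HDR_LEN = 16      # per-packet record header

def split_pcap(path, packets_per_file, out_prefix):
    """Split a classic libpcap file into chunks of at most
    `packets_per_file` packets each. Returns the output file names."""
    outputs = []
    with open(path, "rb") as f:
        global_hdr = f.read(PCAP_GLOBAL_HDR_LEN)
        # The magic number tells us the byte order of the record headers.
        magic = struct.unpack("<I", global_hdr[:4])[0]
        endian = "<" if magic in (0xA1B2C3D4, 0xA1B23C4D) else ">"
        out, count = None, 0
        while True:
            rec_hdr = f.read(PCAP_REC_HDR_LEN)
            if len(rec_hdr) < PCAP_REC_HDR_LEN:
                break  # end of file
            incl_len = struct.unpack(endian + "I", rec_hdr[8:12])[0]
            data = f.read(incl_len)
            if out is None or count == packets_per_file:
                if out:
                    out.close()
                name = f"{out_prefix}_{len(outputs)}.pcap"
                out = open(name, "wb")
                out.write(global_hdr)  # every chunk needs its own header
                outputs.append(name)
                count = 0
            out.write(rec_hdr + data)
            count += 1
        if out:
            out.close()
    return outputs
```

The equivalent with editcap would be `editcap -c <packets_per_chunk> big.pcap chunk.pcap`, which handles pcapng and corrupt records more robustly; the sketch above is only a fallback.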

I hope these suggestions help resolve your issue. Please feel free to reach out if you need any further assistance or have additional questions.

Thank you!

moein-shafi commented 3 months ago

Given this clarification, I'll proceed to close this particular issue. However, please don't hesitate to reconnect if you encounter any further difficulties or have additional inquiries. Your feedback is invaluable to us as we strive to maintain the integrity and functionality of NTLFlowLyzer.