ahlashkari / ALFlowLyzer

GNU General Public License v3.0

Generated CSV file is empty #121

Closed Daumel closed 1 week ago

Daumel commented 1 week ago

Hi!

I would love to use your tool to extract DNS information from a PCAP file. Unfortunately, it doesn't work for me. I would be very thankful if you could tell me what the problem is.

I tried ALFlowLyzer with this PCAP file: https://github.com/ggyggy666/DNS-Tunnel-Datasets/blob/main/tunnel/dnscat2-cname.pcap

I used the following configuration file with the default parameters:

{
    "pcap_file_address": "./dnscat2-cname.pcap",
    "output_file_address": "./dataset/output-of-my_pcap_file.csv",
    "label": "Malicious",
    "number_of_threads": 4,
    "feature_extractor_min_flows": 2500,
    "writer_min_rows": 1000,
    "read_packets_count_value_log_info": 1000000,
    "check_flows_ending_min_flows": 20000,
    "capturer_updating_flows_min_value": 5000,
    "dns_activity_timeout": 30,
    "max_flow_duration": 120000,
    "floating_point_unit": ".4f",
    "max_rows_number": 800000,
    "features_ignore_list": [
        "dns_whois_domain_name",
        "dns_domain_email",
        "dns_domain_registrar",
        "dns_domain_creation_date",
        "dns_domain_expiration_date",
        "dns_domain_age",
        "dns_domain_country",
        "dns_domain_dnssec",
        "dns_domain_dnssec",
        "dns_domain_address",
        "dns_domain_city",
        "dns_domain_state",
        "dns_domain_zipcode",
        "dns_domain_name_servers",
        "dns_domain_updated_date"
    ]
}

When I run the tool, I get spammed with the message "The DNS payload contains non bytes data. It is probably a malformed packet." and the output file is empty.

moein-shafi commented 1 week ago

Hi @Daumel, thank you for trying out ALFlowLyzer!

The log message you're encountering, "The DNS payload contains non-bytes data. It is probably a malformed packet", is expected behavior when processing certain packets in your PCAP file. This log entry is generated for each packet that has this issue, which indicates that your PCAP file contains several such packets.

Don’t worry: this message is just a log entry about these packets (it is emitted once per affected packet), not an error. You can safely ignore it and let the program finish. The output CSV file will still be generated at the end of the process, even if you see many of these log messages.

If you have any further questions or require assistance, please don't hesitate to reach out.

Daumel commented 1 week ago

First of all, thank you for the quick reply! Unfortunately, the generated CSV file is completely empty.

I also reviewed the PCAP file in Wireshark, and I can't identify any packets that seem malformed. The file contains only DNS packets, and none appear to be truncated.

If it's not too much trouble, could you please test ALFlowLyzer with this specific PCAP file? I would greatly appreciate it. If it works on your end, the issue might be related to the version of the dependencies or something else in my local setup.

moein-shafi commented 1 week ago

It's my pleasure to assist. I'll test the PCAP file and get back to you by the end of the day or tomorrow morning at the latest (EST).

moein-shafi commented 1 week ago

Hi again @Daumel

I tested the file, and it’s working as expected. I’ve uploaded the output CSV here for reference: Output CSV. Please ensure you’re running the latest version of the package, as previous versions may have had issues that are now resolved.

For this test, I used the default configuration file, and it generated the output quite efficiently. My test environment was Windows, but the tool should work on Debian-based systems, and it’s likely compatible with other Linux distributions as well, given that it’s a Python package.

Could you let me know which operating system you’re using? That might help narrow down the cause. If possible, try running a clean installation to avoid any conflicts, as I can’t currently identify a specific reason for the zero-output issue you’re encountering.

Let me know how it goes, and feel free to reach out if the problem persists!

Daumel commented 1 week ago

Thank you for the information! I’ve identified the issue.

It seems that using the latest version of Scapy (2.6.0) results in an empty generated file. However, after downgrading to version 2.5.0, everything is working fine now.
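For anyone hitting the same symptom, here is a minimal sketch for checking which Scapy version is installed before running the tool. The helper names are hypothetical and not part of ALFlowLyzer. The only fact taken from this thread is that 2.5.0 produced output for me and 2.6.0 did not, so the check compares against that one version and makes no claim about others.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Version reported in this thread as producing a non-empty CSV.
REPORTED_WORKING_SCAPY = "2.5.0"


def parse_version(text):
    """Turn '2.5.0' (or '2.6.0rc1') into a (major, minor, patch) tuple.

    Pre-release suffixes such as 'rc1' are ignored in this sketch.
    """
    numbers = []
    for piece in text.split(".")[:3]:
        leading_digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(leading_digits) if leading_digits else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def matches_reported_working_version(installed):
    """True if the installed version equals the one reported to work here."""
    return parse_version(installed) == parse_version(REPORTED_WORKING_SCAPY)


def installed_scapy_version():
    """Return the installed Scapy version string, or None if it is missing."""
    try:
        return version("scapy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_scapy_version()
    if found is None:
        print("scapy is not installed")
    elif matches_reported_working_version(found):
        print(f"scapy {found} matches the version reported to work")
    else:
        print(f"scapy {found} differs from {REPORTED_WORKING_SCAPY}; "
              f"consider: pip install scapy=={REPORTED_WORKING_SCAPY}")
```

If the versions differ, `pip install scapy==2.5.0` is the downgrade I applied.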

Daumel commented 1 week ago

Unfortunately, I am still encountering issues when working with certain PCAP files. Here are the dependencies I currently have installed on my Windows 11 machine:

python -> 3.13.0 (latest)
dpkt -> 1.9.8 (latest)
scipy -> 1.14.1 (latest)
scapy -> 2.5.0 (see previous comment)

While the PCAP file dnscat2-cname.pcap that I had issues with earlier is now processed correctly, I am still facing problems with another file. Specifically, the PCAP file normal.pcap results in an empty CSV file when processed.

moein-shafi commented 1 week ago

Hi @Daumel,

Thanks for bringing this up—I dove into the issue, and it turns out there were a couple of underlying causes.

1. Timestamp Conversion: Initially, I ran into problems with timestamp conversion in this specific pcap. In previous cases, the conversion worked without a hitch, but with this file, I had to cast the timestamp to a float before passing it to the datetime object. Without that, it would freeze every time it tried to convert the timestamp, which prevented further packet reading.

2. Division by Zero in Rate Calculations: The main culprit for the empty CSV output, though, was the same issue you encountered. Occasionally, the time difference in rate-related features is zero, which led to division-by-zero errors. I’ve now added a quick check to avoid that, and it’s working smoothly.
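The two fixes described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function names `to_datetime` and `safe_rate` are hypothetical, and returning 0.0 for a zero-length interval is an assumption about what a sensible fallback might be.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_datetime(raw_timestamp):
    """Cast the capture timestamp to float before building a datetime.

    Capture libraries may hand back Decimal-like values; the explicit
    float() cast normalises them first.
    """
    return datetime.fromtimestamp(float(raw_timestamp), tz=timezone.utc)


def safe_rate(count, start_time, end_time):
    """Return count per second, guarding against a zero time difference."""
    duration = float(end_time) - float(start_time)
    if duration <= 0:
        # Assumed fallback: report a rate of 0.0 instead of dividing by zero.
        return 0.0
    return count / duration


if __name__ == "__main__":
    print(to_datetime(Decimal("86400.5")))
    print(safe_rate(10, 5.0, 5.0))  # zero-length interval
    print(safe_rate(10, 0, 4))
```

Other fallbacks (for example leaving the feature empty) would work equally well; the important part is that a single zero-duration flow no longer aborts feature extraction.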

For clarity, I also commented out the line generating excessive warnings—it was a bit overwhelming! In future updates, I plan to add a verbose logging level, so those warnings will be available in a high-verbosity mode. I also added logs (prints) around exception-handling and monitoring areas to make debugging a bit more straightforward.

These fixes are now included in the latest commit: 98605e6c05b2c3a24a701b392595ee48748aa23c.

Thanks again for your help improving ALFlowLyzer! The fix seems solid with the normal pcap, though it’s possible other file types could surface new quirks, which we’ll address step by step. Let me know if you run into anything else!

moein-shafi commented 1 week ago

BTW, here are the library versions currently in use on my system:

I appreciate your insight in resolving this matter!

Daumel commented 1 week ago

Hi again @moein-shafi,

thank you so much! I have been working all day with your tool, and the extraction is working fine. I will make sure to give ALFlowLyzer credit in my thesis.

I did a deep dive into all the features that ALFlowLyzer generates, and I came across two features that seem not to be functioning correctly. They are not critical for my thesis, but I wanted to bring them to your attention. If you happen to fix them, that would be great.

query_resource_record_type

I expect the value to be [1] since there is one question of type "A," but it is always [].

query_resource_record_class

I expect the value to be [1] as there is one question with the class "IN," but it is always [].
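To show what I mean by the expected values, here is a small standard-library sketch that reads the question section of a raw DNS message and collects the type and class codes. It is not how ALFlowLyzer extracts these features; `parse_question_types_and_classes` and `build_query` are hypothetical helpers, and the sketch assumes uncompressed names in the question section.

```python
import struct


def parse_question_types_and_classes(dns_payload):
    """Return (types, classes) for every entry in the DNS question section."""
    if len(dns_payload) < 12:
        raise ValueError("DNS header is 12 bytes; payload is too short")
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16-bit, big-endian).
    _id, _flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", dns_payload[:12])
    offset = 12
    types, classes = [], []
    for _ in range(qdcount):
        # QNAME: length-prefixed labels terminated by a zero byte.
        while True:
            length = dns_payload[offset]
            offset += 1
            if length == 0:
                break
            if length & 0xC0:
                raise ValueError("compressed names are not handled in this sketch")
            offset += length
        qtype, qclass = struct.unpack("!2H", dns_payload[offset:offset + 4])
        offset += 4
        types.append(qtype)
        classes.append(qclass)
    return types, classes


def build_query(name, qtype=1, qclass=1):
    """Build a one-question DNS query (defaults: type A = 1, class IN = 1)."""
    header = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    )
    return header + labels + b"\x00" + struct.pack("!2H", qtype, qclass)


if __name__ == "__main__":
    print(parse_question_types_and_classes(build_query("example.com")))
```

For a query with one "A"/"IN" question this yields one type code and one class code, which is why I expected [1] for both features rather than [].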

You can use the CSV file that you generated for dnscat2 as an example.

moein-shafi commented 1 week ago

Hi @Daumel

Thank you very much for your kind words and for taking the time to work so thoroughly with ALFlowLyzer. I’m glad to hear the tool has been helpful to you, and I truly appreciate your intention to include ALFlowLyzer in your thesis.

Thank you as well for pointing out the issues with query_resource_record_type and query_resource_record_class. I appreciate your keen eye on these details, and I will review these features and update the code accordingly to ensure they work as expected.

Since this addresses the current issue, I’ll close the issue. Please feel free to reach out again if any other questions or insights come up. Wishing you all the best with your thesis!