Yasir-ali-farrukh / Payload-Byte

Payload-Byte is a tool for extracting and labeling packet capture (Pcap) files of modern network intrusion detection datasets.
MIT License

Mapping payload data into preprocessed data #5

Closed ichsancomp closed 1 month ago

ichsancomp commented 1 month ago

Hello,

I appreciate all of your hard work. Your code worked for me when I followed Pipeline.ipynb; the output is UNSW_converted_data.csv. How can we process UNSW_converted_data.csv into UNSW-NB15_preprocessed.csv?

Many thanks for your assistance in advance.

Yasir-ali-farrukh commented 1 month ago

Thank you. The UNSW-NB15_preprocessed.csv file is the output of the initial step, CSV_data_preprocessing. This step performs basic preprocessing, such as resolving data discrepancies and formatting the data before it is fed into the pipeline.

To proceed, first preprocess the UNSW data using CSV_data_preprocessing, and then apply the pipeline to generate the UNSW_converted_data.csv file.

I hope this clarifies your question.
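
For readers following the order described above (raw UNSW-NB15_{i}.csv parts first, then the pipeline), here is a minimal sketch of what a first preprocessing pass might look like. The exact steps in CSV_data_preprocessing are not shown in this thread, so the function name `combine_flow_csvs`, the whitespace trimming, and the sample rows are all assumptions made for illustration only.

```python
import csv
import io


def combine_flow_csvs(part_files, out_file):
    """Concatenate flow-CSV parts into one file, trimming stray whitespace.

    Hypothetical helper: it only illustrates the "preprocess first" step,
    not the actual logic of CSV_data_preprocessing.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    rows_written = 0
    for part in part_files:
        for row in csv.reader(part):
            if not row:  # csv.reader yields [] for blank lines; skip them
                continue
            writer.writerow([cell.strip() for cell in row])
            rows_written += 1
    return rows_written


# In-memory stand-ins for two UNSW-NB15_{i}.csv parts (invented sample rows).
part_1 = io.StringIO("10.0.0.1,1234,10.0.0.2,80,tcp \n")
part_2 = io.StringIO("10.0.0.3,5353, 10.0.0.4,53,udp\n\n")
combined = io.StringIO()
n_rows = combine_flow_csvs([part_1, part_2], combined)
```

In practice the parts would be opened from disk with `open(path, newline="")` and the result written to a file such as UNSW-NB15_preprocessed.csv, which is then given to the pipeline.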

ichsancomp commented 1 month ago

Thank you for your response. According to CSV_data_preprocessing, the input is UNSW-NB15_{i}.csv. Where can we get that file? And how can we map it from Payload_data_UNSW.csv (which contains 1500 features)?

Thank you.

Yasir-ali-farrukh commented 1 month ago

You can download the pcap files as well as UNSW-NB15_{i}.csv from the dataset's official website: https://research.unsw.edu.au/projects/unsw-nb15-dataset.

UNSW_converted_data.csv is the same as Payload_data_UNSW.csv. The file has simply been renamed.

ichsancomp commented 1 month ago

How can we process the pcap files into UNSW-NB15_{i}.csv? I'm sorry for the inconvenience.

Yasir-ali-farrukh commented 1 month ago

The dataset owners created the flow features (UNSW-NB15_{i}.csv) using the Argus and Bro-IDS tools. The Payload-Byte tool labels each individual packet according to the generated flows. It also extracts the payload information from the packets so that it can be used to train machine learning models.
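
To make the two ideas in this reply concrete, here is a rough sketch of labelling a packet from its matching flow and turning its payload into a fixed-length byte vector. It is not the tool's actual code: the `Flow` and `Packet` types, the one-directional 5-tuple plus time-window match, and the zero padding are assumptions. The length of 1500 is taken from the feature count mentioned earlier in the thread.

```python
from typing import List, NamedTuple, Optional

PAYLOAD_LEN = 1500  # assumed: matches the 1500 payload features mentioned in the thread


class Flow(NamedTuple):  # hypothetical flow record, one row of the flow CSV
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    start: float
    end: float
    label: int


class Packet(NamedTuple):  # hypothetical parsed packet
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    timestamp: float
    payload: bytes


def payload_to_features(payload: bytes, length: int = PAYLOAD_LEN) -> List[int]:
    """Truncate or zero-pad the payload to `length` byte values (0-255)."""
    clipped = payload[:length]
    return list(clipped) + [0] * (length - len(clipped))


def label_packet(packet: Packet, flows: List[Flow]) -> Optional[int]:
    """Return the label of the first flow whose 5-tuple and time window match."""
    for flow in flows:
        same_endpoints = packet[:5] == flow[:5]  # first five fields form the 5-tuple
        if same_endpoints and flow.start <= packet.timestamp <= flow.end:
            return flow.label
    return None  # no matching flow found


flows = [Flow("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 100.0, 105.0, 1)]
pkt = Packet("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 101.5, b"GET")
pkt_label = label_packet(pkt, flows)
pkt_features = payload_to_features(pkt.payload)
```

This sketch matches packets in one direction only; a real implementation would likely also need to handle the reverse direction of each flow and use an index rather than a linear scan.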

ichsancomp commented 1 month ago

Got it, thank you for your explanation.

ichsancomp commented 1 month ago

The dataset owners created the flow features (UNSW-NB15_{i}.csv) using the Argus and Bro-IDS tools. The Payload-Byte tool labels each individual packet according to the generated flows. It also extracts the payload information from the packets so that it can be used to train machine learning models.

How can we capture the packets together with the UNSW-NB15 features (45 features)?

Yasir-ali-farrukh commented 1 month ago

You will have to edit the code for that, so that it retains the 45 flow features too.
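
One possible shape for such an edit, sketched under assumptions: after a packet has been matched to its flow, copy the flow's columns onto the packet row instead of keeping only the label. The helper `attach_flow_features`, the key columns, and the sample values below are illustrative and are not taken from the Payload-Byte code.

```python
def attach_flow_features(packet_rows, flow_rows, key_fields):
    """Copy each matching flow's columns onto the packet rows (hypothetical helper).

    Rows are plain dicts; a packet matches a flow when all `key_fields` agree.
    Packets without a matching flow are dropped.
    """
    flows_by_key = {tuple(flow[k] for k in key_fields): flow for flow in flow_rows}
    merged = []
    for pkt in packet_rows:
        flow = flows_by_key.get(tuple(pkt[k] for k in key_fields))
        if flow is None:
            continue
        # Add only the flow columns the packet row does not already have.
        extra = {name: value for name, value in flow.items() if name not in pkt}
        merged.append({**pkt, **extra})
    return merged


KEY = ["srcip", "sport", "dstip", "dsport", "proto"]  # assumed key columns
packet_rows = [
    {"srcip": "10.0.0.1", "sport": 1234, "dstip": "10.0.0.2", "dsport": 80,
     "proto": "tcp", "payload_byte_1": 71},
    {"srcip": "10.0.0.9", "sport": 1, "dstip": "10.0.0.2", "dsport": 80,
     "proto": "tcp", "payload_byte_1": 0},
]
flow_rows = [
    {"srcip": "10.0.0.1", "sport": 1234, "dstip": "10.0.0.2", "dsport": 80,
     "proto": "tcp", "dur": 0.5, "sbytes": 300, "label": 1},
]
merged_rows = attach_flow_features(packet_rows, flow_rows, KEY)
```

The sketch ignores flow start and end times, so two flows that share the same key would overwrite each other in the lookup; a real change to the pipeline would need to keep whatever time-window matching the tool already applies.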