ZGC-LLM-Safety / TrafficLLM

The repository of TrafficLLM, a universal LLM adaptation framework that learns robust traffic representations for all open-source LLMs in real-world scenarios and enhances generalization across diverse traffic analysis tasks.

NotADirectoryError: [Errno 20] Not a directory: '../datasets/raw_data/ustc-tfc-2016/ustc-tfc-2016_detection_packet_test.json' #3

Open ReamonYim opened 2 months ago

ReamonYim commented 2 months ago

Dear author,

I encountered an issue when running the command:

python preprocess_dataset.py --input /Your/Raw/Dataset/Path --dataset_name /Your/Raw/Dataset/Name --traffic_task detection --granularity packet-level --output_path /Your/Output/Dataset/Path --output_name /Your/Output/Dataset/Name

The error is:

NotADirectoryError: [Errno 20] Not a directory: '../datasets/raw_data/ustc-tfc-2016/ustc-tfc-2016_detection_packet_test.json'

I downloaded the ustc-tfc-2016 files from the training datasets link. Could you please confirm whether those files are already preprocessed, or whether they are the raw files the script expects as input?

Thank you!
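The full traceback is not included above, so the exact failing call is unknown. One plausible reading, assuming the script treats its --input path as a directory and lists its contents (for example with os.listdir), is that the path points at a single .json file instead. The sketch below reproduces that class of error on Linux or macOS; the file name and the helper try_list_as_directory are hypothetical and are not taken from preprocess_dataset.py.

```python
import os
import tempfile


def try_list_as_directory(path):
    """Return the name of the exception raised when listing `path`, or None."""
    try:
        os.listdir(path)
    except OSError as exc:
        return type(exc).__name__
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for an already preprocessed dataset file.
    json_file = os.path.join(tmp, "sample_detection_packet_test.json")
    with open(json_file, "w") as fh:
        fh.write("{}")

    dir_result = try_list_as_directory(tmp)         # a real directory lists fine
    file_result = try_list_as_directory(json_file)  # a file cannot be listed

print("directory:", dir_result)
print("json file:", file_result)
```

If this matches what the script does, the error says nothing about the file's contents. It only says that a directory was expected where a file was given.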

CuiTianyu961030 commented 2 months ago

The training datasets are already preprocessed and can be used directly to train LLMs in steps 2.4 and 2.5. The preprocessing code only works for extracting training data from raw traffic (i.e., .pcap files). If you want to reproduce the extraction of training data from the raw USTC TFC 2016 dataset, please download the raw dataset using its released link.

I hope this reply can help you.
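Following the reply above, a quick pre-flight check can tell whether a path looks like raw traffic before preprocess_dataset.py is run on it. This is a minimal sketch, not part of TrafficLLM: the helper find_pcap_files and the accepted extensions (.pcap and .pcapng) are assumptions made for illustration.

```python
import os
import tempfile

# Assumed set of raw-capture extensions; adjust to match your dataset.
PCAP_EXTENSIONS = (".pcap", ".pcapng")


def find_pcap_files(input_path):
    """Return sorted paths of capture files found under `input_path`.

    Raises ValueError if `input_path` is not a directory, which is the
    situation where a preprocessed .json file was passed by mistake.
    """
    if not os.path.isdir(input_path):
        raise ValueError(f"expected a directory of raw captures, got: {input_path}")
    found = []
    for root, _dirs, files in os.walk(input_path):
        for name in files:
            if name.lower().endswith(PCAP_EXTENSIONS):
                found.append(os.path.join(root, name))
    return sorted(found)


# Small self-contained demonstration using placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "flow1.pcap"), "wb").close()
    json_path = os.path.join(tmp, "already_preprocessed.json")
    open(json_path, "w").close()

    raw_files = [os.path.basename(p) for p in find_pcap_files(tmp)]
    try:
        find_pcap_files(json_path)
        rejected_json = False
    except ValueError:
        rejected_json = True

print(raw_files, rejected_json)
```

An empty result for a directory, or a ValueError for a file, suggests the data is already preprocessed and can go straight to the training steps.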

ReamonYim commented 2 months ago

Thank you very much.