ViktorAxelsen / TFE-GNN

WWW'23 Research Paper: TFE-GNN: A Temporal Fusion Encoder Using Graph Neural Networks for Fine-grained Encrypted Traffic Classification
Apache License 2.0
48 stars · 5 forks

Tor test_acc=0.65 #5

Open jingbobuchi opened 4 months ago

jingbobuchi commented 4 months ago

Hello, author. I ran your code on the Tor dataset and found that the training accuracy was 0.98 but the test accuracy was only 0.65, which indicates a serious generalization problem. Could you please check the TFE-GNN config? The config in the current version is identical to the CLE-TFE config.

ViktorAxelsen commented 4 months ago

The two config files in TFE-GNN and CLE-TFE are indeed very similar; they differ only in a few hyper-parameters.

When you use SplitCap to obtain bidirectional flows, you should get both UDP and TCP .pcap files; we only use the TCP files in our work. We also conducted all our experiments on the four sub-datasets independently, i.e., ISCX-VPN, ISCX-NonVPN, ISCX-Tor, and ISCX-NonTor. In addition, there may be some mismatches w.r.t. .pcap file categorization; we will update the details in the README later.

jingbobuchi commented 4 months ago

Thank you for your reply. First, I downloaded the Tor dataset from http://205.174.165.80/CICDataset/ISCX-Tor-NonTor-2017/Dataset/, and there are some pcap files whose category I cannot determine. Could you document the dataset's category assignments in more detail? I would be very grateful. Second, I used SplitCap.exe and Python to obtain the bidirectional flows, but the TCP and UDP files were not separated. Could you update your code for extracting bidirectional flows? Thank you for your help.

ViktorAxelsen commented 4 months ago

The category classification details will be updated later. Whether a file is TCP or UDP is indicated in the file name itself, e.g., "MAIL_gate_POP_filetransfer.pcap.TCP_10-0-2-15_59960_198-52-200-39443.pcap". You can separate the files with a simple shell glob (e.g., mv ./*.TCP* ./TCP).
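If you prefer doing this in Python, here is a minimal sketch; the function name and directory layout are my own invention, and it only assumes SplitCap's file-naming convention shown in the example above:

```python
import glob
import os
import shutil

def separate_by_transport(src_dir: str, dst_dir: str, proto: str = "TCP") -> list:
    """Move SplitCap output files whose names contain the given
    transport-protocol tag (e.g. '.TCP_') into a separate folder.
    Returns the list of moved file names."""
    os.makedirs(dst_dir, exist_ok=True)
    moved = []
    for path in glob.glob(os.path.join(src_dir, f"*.{proto}_*.pcap")):
        shutil.move(path, os.path.join(dst_dir, os.path.basename(path)))
        moved.append(os.path.basename(path))
    return moved
```

Running it once with `proto="TCP"` and once with `proto="UDP"` gives the same split as the shell one-liner, but works on any OS.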

jingbobuchi commented 4 months ago

Hello, author. I reran the code following your updated category list for the ISCX-Tor dataset, with all parameters set according to your initial config. The results are shown in the attached screenshot. I don't know what went wrong and need your help. Thanks. (screenshot: PixPin_2024-03-02_10-26-15)

ViktorAxelsen commented 4 months ago

Have you tried experiments on other datasets?

jingbobuchi commented 4 months ago

I haven't experimented with other datasets yet. Since TFE-GNN performed best on the Tor dataset in your paper, I wanted to reproduce that best performance on Tor first. At the moment I don't know what went wrong, and I'm at a loss. Can the config in the current repository achieve the performance reported in the paper? Thanks.

jingbobuchi commented 4 months ago

Thank you for your reply. What operating system and machine are you running your experiments on? I'll rerun the code on the same setup as yours. In addition, could you please release the header_test.npz, header_train.npz, test.npz, and train.npz files for the Tor dataset? I want to figure out what exactly is going wrong. Thank you.

ViktorAxelsen commented 4 months ago

Well, I have rerun the experiments on the ISCX-Tor dataset, and the results are consistent with those in the paper. Since the dataset partition may differ (the default order in which files are processed can vary from machine to machine), I also tried shuffling the data samples before the 9:1 partition. The results are again consistent with the previous ones (only a fluctuation of about 1-2% across several metrics).

Here are some suggestions you may refer to.

  1. Check if your environment settings are consistent with those in README.
  2. Try experiments on other datasets to check if the issue still exists.
  3. Try to adjust hyper-parameters in your environments.
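The shuffled 9:1 partition mentioned above can be sketched as follows. This is a minimal illustration rather than the repo's actual preprocessing code; `samples` and `labels` stand in for whatever arrays the .npz files are built from:

```python
import random

def shuffled_split(samples, labels, train_ratio=0.9, seed=42):
    """Shuffle sample indices with a fixed seed, then take the first
    90% as the training set and the remaining 10% as the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_ratio)
    train_idx, test_idx = idx[:cut], idx[cut:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test
```

Fixing the seed makes the partition reproducible regardless of the order in which the pcap files were processed on a particular machine.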

Machine Env: Ubuntu 20 LTS, NVIDIA Driver Version 470, CUDA 11.3. Python library versions: please refer to the Env section in the README.

Here is a temporary link to the NPZ files that you can use for debugging: https://drive.google.com/drive/folders/1b4PijUr-coM8M8nzVi0_HQ0UjHQDmV8x?usp=sharing

jingbobuchi commented 4 months ago

Let me share some good news. Using the header_test.npz, header_train.npz, test.npz, and train.npz files you provided, I achieved the accuracy reported in your paper, so I don't think library versions are the issue. I then sorted the files of each category strictly in the order given in the classification txt you provided and regenerated header_test.npz, header_train.npz, test.npz, and train.npz. After training, the test-set accuracy was 0.75, an improvement over before.

So I rechecked the paper and found I had not performed the following operations described in the original text: "Additionally, as for each rest sample of datasets, we remove bad packets and retransmission packets within. For each packet in a flow or segment, we first remove the ones without payload. Then we remove the Ethernet header, which only provides some irrelevant information for classification. The source and destination IP addresses, and the port numbers are all removed for the purpose of eliminating interference with sensitive information deriving from these IP addresses and port numbers." I couldn't find the corresponding code. How did you delete the bad and retransmitted packets in each sample, and how did you remove the Ethernet header and anonymize the IP addresses? Thanks.

ViktorAxelsen commented 4 months ago

These necessary preprocessing operations are already included in the code (e.g., the remove() function in utils.py removes the IP addresses...). Removing bad and retransmitted packets was not designed for the ISCX datasets, so you can ignore it. Can you describe how you ran SplitCap?

Since you can reproduce the results using the files I provided, the training and model code should be fine. So I rechecked the preprocessing code and still didn't find any problems. Next, I checked the official website of the ISCX dataset, and I wonder if you have noticed that the last-modified date of the data files they provide was last month. I have no idea what modifications they made to the data files, but I strongly suspect this is the cause of the mismatch in the reproduction results. Can you check the modification date of the original pcap files of the Tor dataset you downloaded?

jingbobuchi commented 4 months ago

Thank you for your reply. First, attached is my SplitCap splitting script: split-by-sessions.zip. Secondly, I could not find the original modification date on Ubuntu. I downloaded the files on 2024-02-27 at 9:54, and at that time the website showed the files were last modified on 2024-02-01. If possible, could you upload the Tor-NonTor and VPN-NonVPN datasets you downloaded to a cloud drive? That would help us find the problem. Thank you. (screenshot: PixPin_2024-03-02_23-53-01)

ViktorAxelsen commented 4 months ago

It seems that your SplitCap usage is fine. The dataset is too large to upload. You can try using Xftp (or Linux commands such as stat) to check the modification date. Here are mine: (screenshot: cfd5e04d587a9d3467733b2e313bc4d2)

jingbobuchi commented 4 months ago

I looked up the creation time of all the files; it is consistent with the picture you provided. (screenshots attached)

ViktorAxelsen commented 4 months ago

Well, #6 shows that the results are OK on the ISCX-VPN dataset, so I suggest you experiment on ISCX-VPN first. If those results are also unreasonable, the problem lies in your preprocessing. If they are normal, try changing the partitioning of the ISCX-Tor dataset using the shuffle() function with different seeds (for example, Ori: [Train: 0-8, Test: 9], Now: [Train: 1-9, Test: 0]), then rerun the experiment to see whether anything changes.
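The fold rotation suggested here ([Train: 0-8, Test: 9] vs. [Train: 1-9, Test: 0]) can be sketched as follows; this is a hypothetical helper, not code from the repo:

```python
def rotating_partition(samples, n_chunks=10, test_chunk=9):
    """Divide the (already shuffled) samples into n_chunks equal chunks
    and use one chunk as the test set, the rest as the training set.
    test_chunk=9 gives [Train: 0-8, Test: 9]; test_chunk=0 gives
    [Train: 1-9, Test: 0]."""
    size = len(samples) // n_chunks
    chunks = [samples[i * size:(i + 1) * size] for i in range(n_chunks)]
    test = chunks[test_chunk]
    train = [x for j, c in enumerate(chunks) if j != test_chunk for x in c]
    return train, test
```

Rerunning the experiment with several values of `test_chunk` reveals how sensitive the reported accuracy is to the particular 9:1 split.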

Martin-share commented 4 months ago

The paper says: "respectively. We use SplitCap to obtain bidirectional flows from public datasets. Specially, due to the scarcity of flows in the ISCX-Tor dataset, we increase the training samples by dividing each flow into 60-second non-overlapping blocks in our experiments [27]. Finally, we utilize stratified sampling to sequentially". So this may be the reason for the significant difference in test results between Tor and VPN. What specific tool is used to process the Tor data?

jingbobuchi commented 4 months ago

(screenshot attached)

Martin-share commented 4 months ago

Hi, have you managed to reproduce the ISCX-Tor results?

jingbobuchi commented 4 months ago

Yes, the model I ran differs from the one in the paper by 1-2%.

Martin-share commented 4 months ago

Are you generating the NPZ files yourself, or did you use the ones provided by the author? If self-generated, what did you do to improve the accuracy? Thank you

jingbobuchi commented 4 months ago

Hello author. May I ask how much memory your machine used when training on the ISCX-NonTor dataset? My machine has 16 GB of RAM, and even with 64 GB of virtual memory configured it still cannot complete training. All parameters are the same as yours.

ViktorAxelsen commented 4 months ago

You can try to reduce NUM_WORKERS.
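Fewer dataloader workers means fewer worker processes each holding prefetched batches, which directly lowers peak host memory. As a sketch of the change (the exact key name depends on the repo's config file, so treat this as illustrative):

```python
# Illustrative config change; check the repo's config for the actual key name.
NUM_WORKERS = 0  # was a larger value; 0 loads batches in the main process
BATCH_SIZE = 16  # optionally also reduce the batch size if memory is still tight
```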

jingbobuchi commented 2 months ago

Hello, author. Your code does not implement removal of the Ethernet headers or IP anonymization. Did you skip these operations because they have little impact on the final result? Looking forward to your reply.

ViktorAxelsen commented 2 months ago

I have implemented IP anonymization for some datasets (the Ethernet headers have already been removed in the original pcap files of those datasets). However, there are exceptions that I accidentally overlooked for some other datasets, which may cause potential bugs.

Well, it's hard to say what impact the anonymization operation has on model performance in general. However, according to my observations during the experimental phase, anonymization improves model performance (e.g., generalizability) to a certain extent on certain datasets, because it forces the model to learn "causal representations" while preventing overfitting.
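To illustrate what this preprocessing does at the byte level, here is a simplified sketch for plain IPv4-over-Ethernet packets. It is not the repo's actual remove() implementation (consult utils.py for that), and it ignores VLAN tags and IPv6:

```python
def strip_and_anonymize(frame: bytes) -> bytes:
    """Drop the 14-byte Ethernet header and zero out the IPv4 source and
    destination addresses (bytes 12-19 of the IP header) and the TCP/UDP
    source and destination ports (the first 4 bytes after the IP header).
    Assumes an untagged IPv4 frame."""
    packet = bytearray(frame[14:])       # remove the Ethernet header
    ihl = (packet[0] & 0x0F) * 4         # IP header length in bytes (IHL field)
    packet[12:20] = b"\x00" * 8          # anonymize src/dst IP addresses
    packet[ihl:ihl + 4] = b"\x00" * 4    # anonymize src/dst ports
    return bytes(packet)
```

Zeroing rather than deleting the fields keeps the byte offsets of the remaining header fields intact, which matters when the model consumes fixed-position byte sequences.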

jingbobuchi commented 2 months ago

Hello, dear author. In your ablation study you used a Transformer instead of the LSTM. I searched for relevant papers and found nothing about using a Transformer to fuse features, so I guess you are using the self-attention mechanism in the Transformer for fusion. Can you provide some pointers or related papers? Looking forward to your reply. Thanks.

ViktorAxelsen commented 2 months ago

As a simple test, you can just use the Transformer encoder module integrated in PyTorch, together with positional encoding.
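Conceptually, what the Transformer adds over the LSTM is self-attention over the per-packet (or per-byte) feature sequence followed by pooling. A dependency-light numpy sketch of that fusion step, illustrative only and not the paper's implementation (identity Q/K/V projections are used for brevity):

```python
import numpy as np

def positional_encoding(length: int, dim: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = np.arange(length)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def self_attention_fuse(seq: np.ndarray) -> np.ndarray:
    """Fuse a (length, dim) feature sequence into a single (dim,) vector
    via scaled dot-product self-attention and mean pooling."""
    d = seq.shape[1]
    scores = seq @ seq.T / np.sqrt(d)                  # (L, L) attention logits
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # row-wise softmax
    attended = weights @ seq                           # (L, dim) contextualized features
    return attended.mean(axis=0)                       # pool to one fused vector
```

With PyTorch, the equivalent would be to add the positional encoding to the inputs, pass them through torch.nn.TransformerEncoder (which learns the Q/K/V projections), and pool the outputs.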