echowei / DeepTraffic

Deep Learning models for network traffic classification
Mozilla Public License 2.0
664 stars 299 forks

Regarding very low accuracy after training #20

Closed sakurachiyuki closed 2 years ago

sakurachiyuki commented 3 years ago

The output of training and testing on the 20-class task is as follows:

step 0, train accuracy 0
step 2000, train accuracy 0.9
step 4000, train accuracy 0.96
step 6000, train accuracy 0.94
step 8000, train accuracy 0.92
step 10000, train accuracy 0.98

2021-01-05 01:03:39
DATA_DIR: /PUBLIC/sakura/self_secuity/echowei/reproduction2/USTC-TK2016-ubuntu/5_Mnist
0, aimchat, 0.07194244604316546, 0.01020408163265306
1, AIM_Chat, 0.0, 0.0
2, browsing, 0.5, 0.003875968992248062
3, browsing2-1, 0.020446096654275093, 0.01089108910891089
4, browsing2-2, 0.12269129287598944, 0.09470468431771895
5, browsing2, 0.03184713375796178, 0.07286995515695067
6, browsing_ara, 0.0, 0.0
7, browsing_ara2, 0.0, 0.0
8, browsing_ger, 0.07142857142857142, 0.001026694045174538
9, Email_IMAP_filetransfer, 0.0, 0.0
10, AUDIO_spotifygateway, 0.0, 0.0
11, AUDIO_tor_spotify, 0.0, 0.0
12, AUDIO_tor_spotify2, 0.0, 0.0
13, BROWSING_gate_SSL_Browsing, 0.0, 0.0
14, BROWSING_ssl_browsing_gateway, 0.0, 0.0
15, BROWSING_tor_browsing_ara, 0.0, 0.0
16, BROWSING_tor_browsing_ger, 0.0, 0.0
17, BROWSING_tor_browsing_mam, -1, 0.0
18, BROWSING_tor_browsing_mam2, 0.0, 0.0
19, CHAT_aimchatgateway, 0.0, 0.0
Total accuracy: 0.0184

My environment is Ubuntu 18.04, so I used the Ubuntu branch for preprocessing. My data processing pipeline is as follows:

1.  pwsh 1_Pcap2Session.ps1 -f
2.  pwsh 2_ProcessSession.ps1 -a -s
3.  python 3_Session2Png.py 
4.  python 4_Png2Mnist.py

Did I make a mistake in one of these steps? Thank you for open-sourcing this work. I look forward to your reply!

sakurachiyuki commented 3 years ago

BTW, with the 20-class data you provided, the accuracy is not low, so I suspect that my data processing pipeline is at fault.

step 0, train accuracy 0.08
step 2000, train accuracy 0.76
step 4000, train accuracy 0.84
step 6000, train accuracy 0.84
step 8000, train accuracy 0.94
step 10000, train accuracy 0.86

2021-01-05 14:23:09
DATA_DIR: /PUBLIC/sakura/echowei-deeptraffic/sakurachiyuki-DeepTraffic-master/DeepTraffic/1.malware_traffic_classification/3.PreprocessedResults/20class/SessionL7
0, aimchat, 0.7780487804878049, 0.42533333333333334
1, AIM_Chat, 0.9983333333333333, 0.9983333333333333
2, browsing, 0.9904306220095693, 0.9825949367088608
3, browsing2-1, 0.7841726618705036, 0.21330724070450097
4, browsing2-2, 0.9871428571428571, 0.9829302987197724
5, browsing2, 0.4538152610441767, 0.9064171122994652
6, browsing_ara, 0.9983606557377049, 1.0
7, browsing_ara2, 0.9665178571428571, 0.7959558823529411
8, browsing_ger, 0.8113207547169812, 0.9409190371991247
9, Email_IMAP_filetransfer, 0.9986824769433466, 0.9986824769433466
10, AUDIO_spotifygateway, 0.9939393939393939, 1.0
11, AUDIO_tor_spotify, 0.9931506849315068, 0.866965620328849
12, AUDIO_tor_spotify2, 0.7891472868217054, 0.8554621848739495
13, BROWSING_gate_SSL_Browsing, 0.9551656920077972, 0.98989898989899
14, BROWSING_ssl_browsing_gateway, 0.6893004115226338, 0.7957244655581948
15, BROWSING_tor_browsing_ara, 0.9786476868327402, 0.912106135986733
16, BROWSING_tor_browsing_ger, 0.9763860369609856, 0.9926931106471816
17, BROWSING_tor_browsing_mam, 0.9905992949471211, 0.991764705882353
18, BROWSING_tor_browsing_mam2, 0.7302752293577982, 0.6482084690553745
19, CHAT_aimchatgateway, 0.9832402234636871, 0.9328621908127208
Total accuracy: 0.8694934
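One way to test the suspicion that the preprocessing is at fault is to compare the label distribution of the self-generated label files with that of the provided 20-class data. The following is a minimal sketch, assuming the label files use the standard MNIST IDX1 format (magic number 2049, big-endian sample count, then one unsigned byte per label). The function names `read_idx_labels` and `label_histogram` are hypothetical helpers, not part of the repository.

```python
import gzip
import struct
from collections import Counter


def read_idx_labels(path):
    """Read an MNIST-style IDX1 label file (optionally gzipped) into a list of ints."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        # Header: 4-byte magic number and 4-byte item count, both big-endian.
        magic, count = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError("not an IDX1 label file (magic=%d)" % magic)
        labels = f.read(count)
    if len(labels) != count:
        raise ValueError("file is truncated: expected %d labels, got %d"
                         % (count, len(labels)))
    return list(labels)


def label_histogram(path):
    """Count how many samples carry each label id."""
    return Counter(read_idx_labels(path))
```

Running `label_histogram` on the training and test label files of both datasets shows whether a class is missing, empty, or numbered differently between the two sets. The file paths to pass in depend on where the preprocessing scripts wrote their output.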

sakurachiyuki commented 3 years ago

Also, I changed the class labels afterwards, but that should not affect training or testing.

dict_20class = {
    0: 'aimchat', 1: 'AIM_Chat', 2: 'browsing', 3: 'browsing2-1', 4: 'browsing2-2',
    5: 'browsing2', 6: 'browsing_ara', 7: 'browsing_ara2', 8: 'browsing_ger',
    9: 'Email_IMAP_filetransfer', 10: 'AUDIO_spotifygateway', 11: 'AUDIO_tor_spotify',
    12: 'AUDIO_tor_spotify2', 13: 'BROWSING_gate_SSL_Browsing',
    14: 'BROWSING_ssl_browsing_gateway', 15: 'BROWSING_tor_browsing_ara',
    16: 'BROWSING_tor_browsing_ger', 17: 'BROWSING_tor_browsing_mam',
    18: 'BROWSING_tor_browsing_mam2', 19: 'CHAT_aimchatgateway',
}

shathil commented 3 years ago

@sakurachiyuki, I am having issues with the input_data on OSX. How did you solve that problem?

jeevan-thapa commented 3 years ago

> @sakurachiyuki, I am having issues with the input_data on OSX. How did you solve that problem?

Is it saying that there is no input_data, or something similar? Can you elaborate? I ran into a similar issue and found that the file paths were incorrect. You may want to check that.
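A quick way to rule out a path problem is to check that the data directory actually contains the files the loader looks for. This is only a sketch: it assumes the four gzip file names used by the classic TensorFlow MNIST tutorial loader, and `missing_mnist_files` is a hypothetical helper, not something from the repository.

```python
import os

# File names assumed from the classic TensorFlow MNIST tutorial loader.
EXPECTED_FILES = (
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
)


def missing_mnist_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(data_dir, name))]
```

An empty list means all four files were found. Any names returned point to a wrong `DATA_DIR` or an incomplete preprocessing run.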

shathil commented 3 years ago

@jeevan-thapa, thanks. I was having issues with input_data.py. The initial error was that the import could not be found, so I downloaded the file from elsewhere and that error went away. Now I am running into the following error:

labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
IndexError: index 32440 is out of bounds for size 32440

There are also other deprecated-API issues, as my Python version is 3.7. Could you please share your experience and configuration at mohammad.a.hoque@helsinki.fi?
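In the IndexError above, the failing index equals the array size. In the tutorial-style `dense_to_one_hot` helper this can happen, for example, when the last sample's label is equal to `num_classes`, that is, when the loader is called with fewer classes than the label file contains. This is one possible cause and is not confirmed anywhere in this thread. The sketch below reproduces the helper as it commonly appears in the MNIST tutorial code and adds a hypothetical wrapper, `checked_one_hot`, that validates the label range first.

```python
import numpy as np


def dense_to_one_hot(labels_dense, num_classes):
    """Convert integer class labels to one-hot rows (MNIST-tutorial style)."""
    num_labels = labels_dense.shape[0]
    # Flat position of column 0 in each row of the (num_labels, num_classes) array.
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    # A label >= num_classes spills into the next row, or past the end of
    # the array for the last row, which raises IndexError.
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot


def checked_one_hot(labels_dense, num_classes):
    """Hypothetical wrapper: reject out-of-range labels before encoding."""
    labels_dense = np.asarray(labels_dense)
    lo, hi = int(labels_dense.min()), int(labels_dense.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError("labels span [%d, %d] but num_classes is %d"
                         % (lo, hi, num_classes))
    return dense_to_one_hot(labels_dense, num_classes)
```

Note that an out-of-range label in any row other than the last does not raise at all; it silently sets a cell in a later row. Checking the range up front therefore catches corrupted encodings that the original helper would let through.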