arthurherbout / crypto_code_detection

Automatic Detection of Custom Cryptographic C Code

Test on data slice (Open-SSL only files) #37

Closed arnaudstiegler closed 4 years ago

arnaudstiegler commented 4 years ago

Test our model on files that belong to OpenSSL (both crypto and non-crypto). The underlying idea of this experiment is to see whether the model has really learned to flag crypto files, or has instead learned to flag files that come from a crypto library (which is definitely not the same thing).

arnaudstiegler commented 4 years ago

Experiment: train on all data except the files from wolfssl, then test on the wolfssl files, both crypto and non-crypto. The class imbalance is less pronounced than in the training set (here we have 55 crypto files out of around 230 files).
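For reference, a held-out-library split like the one described above could be sketched as follows. This is only an illustration: the `library_of` helper, the convention that the first path component names the library, and the sample paths are all assumptions, not the repository's actual data layout.

```python
from pathlib import PurePosixPath


def library_of(path: str) -> str:
    """Hypothetical convention: the first path component names the source library."""
    return PurePosixPath(path).parts[0]


def split_by_library(samples, held_out):
    """Put every file of `held_out` in the test set and everything else in the train set."""
    train = [s for s in samples if library_of(s[0]) != held_out]
    test = [s for s in samples if library_of(s[0]) == held_out]
    return train, test


# Made-up (path, label) pairs for illustration; label 1 = crypto, 0 = non-crypto.
samples = [
    ("wolfssl/src/aes.c", 1),
    ("wolfssl/src/io.c", 0),
    ("openssl/crypto/sha/sha256.c", 1),
    ("redis/src/dict.c", 0),
]

train, test = split_by_library(samples, "wolfssl")
```

Splitting on the library rather than on individual files keeps every wolfssl file out of training, so a good test score cannot come from the model recognising that library's style.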

- Number of non-crypto files classified as crypto: 25
- Number of crypto files classified as non-crypto: 24

- Test Accuracy: 0.78
- Test F1-score: 0.30
- Test F2-score: 0.28
- Test Precision: 0.37
- Test Recall: 0.27
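As a reminder of what these scores measure, here is a minimal sketch of the standard definitions of precision, recall and F-beta for the positive (crypto) class. Whether the project computes them exactly this way (for example, which averaging mode it uses) is an assumption, and the counts in the usage lines are invented for illustration, not taken from this experiment.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision and recall for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Invented counts: 3 true positives, 1 false positive, 3 false negatives.
p, r = precision_recall(tp=3, fp=1, fn=3)
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
```

Because F2 weights recall more heavily, it falls below F1 whenever recall is lower than precision, which is the pattern in the scores reported here.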

Results aren't that good. You can ignore the non-crypto files that were detected as crypto (mostly bad labels, i.e. crypto files not labelled as crypto). However, results are pretty bad for crypto files: most of the misclassified files are plain crypto code with very distinctive crypto operations. I still can't clearly understand why, but there are basically two possibilities that I can think of:

I need to investigate the results further.