CUMLSec / XDA


Bi-rnn's implementation #7

Open thaddywu opened 3 years ago

thaddywu commented 3 years ago

Hi, how's it going? Thank you so much for your prompt and sincere reply to my last issue! I hope you're enjoying your day~

Here's one thing I'd like to ask. According to your paper, you implemented Bi-RNN following the same setup described in usenix15. However, the input in their paper is a 256-dimensional one-hot vector, while birnn.py takes the single byte directly as the input to the ML model. Your paper reports that Bi-RNN fails to achieve high accuracy on the SPEC datasets, with an F1 score below 80%. I modified your code to use one-hot vector input and found that the F1 score quickly rises above 90% after several epochs, which is inconsistent with your results.
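To make the difference between the two input encodings concrete, here is a minimal sketch in plain Python. It is only an illustration and is not taken from birnn.py: the function names `byte_to_scalar`, `byte_to_one_hot`, and `encode` are hypothetical, and the assumption is that the single-byte version feeds the raw byte value as one numeric feature per time step.

```python
def byte_to_scalar(b):
    """Single-byte input (assumed): one numeric feature per byte."""
    return [float(b)]


def byte_to_one_hot(b):
    """One-hot input: a 256-dimensional vector with a single 1.0 at index b."""
    vec = [0.0] * 256
    vec[b] = 1.0
    return vec


def encode(data, encoder):
    """Turn a byte string into a sequence of per-byte feature vectors."""
    return [encoder(b) for b in data]


# Example bytes: a typical x86-64 function prologue (push rbp; mov rbp, rsp).
code = bytes([0x55, 0x48, 0x89, 0xE5])

scalar_seq = encode(code, byte_to_scalar)    # shape: 4 steps x 1 feature
one_hot_seq = encode(code, byte_to_one_hot)  # shape: 4 steps x 256 features
```

With scalar input the model sees byte values as points on a numeric scale (so 0x55 and 0x56 look "close"), whereas the one-hot version treats every byte value as an independent category, which may explain part of the accuracy gap.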

I'm curious how you split the train and test datasets and how you calculated their overlap rate. Although this is mentioned in your paper, could you please clarify it? I'm also wondering which setting your reported Bi-RNN results are based on: the single-byte input version or the one-hot vector version. :)
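For reference, here is one possible way an overlap rate could be defined. This is purely an assumption for the sake of discussion and may differ from the definition in the paper: it counts the fraction of test functions whose exact byte sequence also appears in the training set. The names `fingerprint` and `overlap_rate` and the sample byte strings are hypothetical.

```python
import hashlib


def fingerprint(func_bytes):
    """Hash a function's raw bytes so exact duplicates can be detected cheaply."""
    return hashlib.sha256(func_bytes).hexdigest()


def overlap_rate(train_funcs, test_funcs):
    """Fraction of test functions that appear byte-for-byte in the training set."""
    if not test_funcs:
        return 0.0
    train_hashes = {fingerprint(f) for f in train_funcs}
    duplicates = sum(1 for f in test_funcs if fingerprint(f) in train_hashes)
    return duplicates / len(test_funcs)


# Toy example: one of the two test functions also occurs in the training set.
train = [b"\x55\x48\x89\xe5\xc3", b"\x31\xc0\xc3"]
test = [b"\x31\xc0\xc3", b"\x90\xc3"]

rate = overlap_rate(train, test)
```

A high value under a definition like this would mean the test set largely repeats training functions, which could inflate the F1 score regardless of the input encoding.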

Hope you have a good weekend.

peikexin9 commented 3 years ago

Hi @thaddywu, Thanks for your interest! Could you share your code (including how you generate the data) so I can take a look? We can also have a meeting if you want :-)