AI-IDS / kdd99_feature_extractor

Utility for extraction of subset of KDD '99 features from realtime network traffic or .pcap file
MIT License

A question about serror #2

Closed FlyingOnion closed 4 years ago

FlyingOnion commented 7 years ago

From Conversation.cpp, bool Conversation::is_serror() const: why do S2 and S3 belong to serror? Doesn't serror mean that an error occurred during the SYN, SYN-ACK, ACK handshake?

bittomix commented 7 years ago

Hi FlyingOnion,

S1, S2, S3 and S4 all cause Conversation::is_serror() to return true in this implementation. That is the intended behavior; I will try to explain why.

During the initial design of this tool I searched for documentation of the KDD '99 dataset, so that feature extraction would match the original KDD '99 as closely as possible.

I found most of the information about feature extraction in the paper by Dybey & Dubey (reference 3 in the readme). For lack of additional information, I decided to trust the definition of serror stated in this paper (table 4.3.1, page 152):

"the percentage of connections that have activated the flag (4) s0, s1, s2 or s3..."

So besides the initial 3-way TCP handshake, it also covers the final 4-way TCP handshake.
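The quoted definition can be sketched roughly as follows. This is a minimal illustration, not the code from Conversation.cpp: the enum ConnState and the function serror_rate() are hypothetical names introduced here, the state descriptions in the comments follow Bro's conn_state field, and the rate is expressed as a fraction in [0, 1] over an assumed window of connections.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical subset of connection states (assumption: named after
// Bro's conn_state values, not copied from conversation_state_t).
enum class ConnState {
    S0,   // connection attempt seen, no reply
    S1,   // connection established, not terminated
    S2,   // established; close attempt by originator seen, no reply from responder
    S3,   // established; close attempt by responder seen, no reply from originator
    SF,   // normal establishment and termination
    REJ   // connection attempt rejected
};

// serror per the quoted definition: the flag is s0, s1, s2 or s3.
bool is_serror(ConnState s) {
    switch (s) {
    case ConnState::S0:
    case ConnState::S1:
    case ConnState::S2:
    case ConnState::S3:
        return true;
    default:
        return false;
    }
}

// Fraction of connections in the window whose state counts as serror.
// Returns 0.0 for an empty window.
double serror_rate(const std::vector<ConnState>& window) {
    if (window.empty()) return 0.0;
    std::size_t hits = 0;
    for (ConnState s : window) {
        if (is_serror(s)) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(window.size());
}
```

Under this reading, S2 and S3 count because the connection was opened but never fully closed by both sides, which is the "final 4-way handshake" case mentioned above.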

In another paper, Wenke & Stolfo state that they used the Bro packet filtering and reassembling engine (reference 2 in the readme). I then found some information about the BRO state machine here (search for the field conn_state). The state machine implemented in this tool is based on these findings.

To make the implemented TCP state machine easier to understand, I digitized a state machine diagram I sketched during the design phase; see d67402d. It is a slightly modified version of what I found out about BRO, but I'm not sure it is completely up to date with the implementation. The state machine implementation can be found in TcpConnection::update_state(), and the states are listed and commented in enum conversation_state_t.
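To illustrate why S2 and S3 are teardown states rather than handshake states, here is a heavily simplified sketch of such a state machine. It is not TcpConnection::update_state(): the names TcpState, Event, next_state() and run() are hypothetical, only a handful of transitions are modeled, and they are assumed from Bro's conn_state descriptions rather than taken from the diagram in d67402d.

```cpp
#include <vector>

// Hypothetical, reduced state set (assumption: mirrors Bro's conn_state names).
enum class TcpState { INIT, S0, S1, S2, S3, SF, REJ };

// Hypothetical per-packet events: which side sent which flag.
enum class Event { OrigSyn, RespSynAck, OrigFin, RespFin, RespRst };

// One transition step of the simplified machine.
TcpState next_state(TcpState s, Event e) {
    switch (s) {
    case TcpState::INIT:
        if (e == Event::OrigSyn) return TcpState::S0;      // attempt seen
        break;
    case TcpState::S0:
        if (e == Event::RespSynAck) return TcpState::S1;   // established
        if (e == Event::RespRst) return TcpState::REJ;     // attempt rejected
        break;
    case TcpState::S1:
        if (e == Event::OrigFin) return TcpState::S2;      // originator closes
        if (e == Event::RespFin) return TcpState::S3;      // responder closes
        break;
    case TcpState::S2:
        if (e == Event::RespFin) return TcpState::SF;      // both sides closed
        break;
    case TcpState::S3:
        if (e == Event::OrigFin) return TcpState::SF;      // both sides closed
        break;
    default:
        break;
    }
    return s;  // unhandled events leave the state unchanged in this sketch
}

// Replays a sequence of events from INIT and returns the state reached.
TcpState run(const std::vector<Event>& events) {
    TcpState s = TcpState::INIT;
    for (Event e : events) {
        s = next_state(s, e);
    }
    return s;
}
```

In this model, a connection that stops after the originator's FIN stays in S2 and one that stops after the responder's FIN stays in S3; only a close from both sides reaches SF.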

Hmm, looking at it now, another issue arises for me: should the states S2F, S3F and maybe even ESTAB also be included in serror? (These states are not present in BRO, and if they were included, serror would mean something like abnormal termination by timeout.)

I hope this answers your question.