AI-IDS / kdd99_feature_extractor

Utility for extraction of subset of KDD '99 features from realtime network traffic or .pcap file
MIT License
118 stars 50 forks source link

kdd99_feature_extractor

Utility for extraction of subset of KDD '99 features [1] from realtime network traffic or .pcap file This utility is a part of our project at University of Bergen.

Some feature might not be calculated exactly same way as in KDD, because there was no documentation explaining the details of KDD implementation found. Algorithms are based on some articles [2][3] and observation of values in KDD dataset.

Features in KDD should be the same as features introduced by Lee & Stolfo in their work [2].

Status

Features

Main components

  1. Sniffer
    • Network traffic sniffer & frame parser
  2. IP reassembler
    • Only IP header "summaries"
    • Payload not reassembled (content features not extracted, it is not needed)
  3. Connection/Conversation reconstructor
    • Reconstructs conversations
    • Computes intrinsic features (columns 1-9 of KDD)
  4. Statistical engine
    • Computes derived features (columns 23-41 of KDD)

Build instructions to Linux (tested on Ubuntu)

  1. Create a folder to temporal build files
    mkdir build-files

  2. Enter in the folder and compile the cache
    cd build-files
    cmake -DCMAKE_BUILD_TYPE=Debug -G "CodeBlocks - Unix Makefiles" ..

  3. Exit the folder of build cache and compile the project
    cd ..
    cmake --build ./build-files --target kdd99extractor -- -j 4

  4. Path to compiled project is:
    build-files/src/kdd99extractor

Planned sections in this readme

Main sources of feature documentation

[1] KDD Cup 1999 Data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

[2] Lee, W. & Stolfo, S. J. (2000), 'A framework for onstructing features and models for intrusion detection systems', Information and System Security 3 (4) , 227-261.

[3] Dybey, D. & Dubey, J. (2014), 'A Survey Intrusion Detection with KDD99 Cup Dataset', International Journal of Computer Science and Information Technology Research 2 (3), 146-157.