CSIRT-MU / Stream4Flow

A framework for real-time network traffic analysis based on world-leading technologies for distributed stream processing, network traffic monitoring, and visualization.
https://csirt.muni.cz/?lang=en
MIT License

A framework for real-time IP flow data analysis built on Apache Spark Streaming, a modern distributed stream processing system.

This project is no longer maintained

⚠ Project Stream4Flow is no longer maintained because the frameworks it uses are constantly evolving, and we do not have the capacity to continually update the installation scripts. If you're interested in other network data processing tools and our current research, check out the CSIRT-MU repositories.

About Stream4Flow

The Stream4Flow framework is built on the IPFIXCol collector, the Kafka messaging system, Apache Spark, and Elastic Stack. IPFIXCol can receive IP flows from most NetFlow/IPFIX network probes (e.g., Flowmon Probe, softflowd). It transforms incoming IP flow records into JSON and passes them to the Kafka messaging system. Kafka was selected for its scalability and partitioning capabilities, which provide sufficient data throughput. Apache Spark was selected as the data stream processing framework for its high IP flow data throughput, available programming languages (Scala, Java, or Python), and MapReduce programming model. The analysis results are stored in Elastic Stack, comprising Logstash, Elasticsearch, and Kibana, which enable storing, querying, and visualizing the results. The Stream4Flow framework also contains an additional web interface that makes administration easier and visualizes complex analysis results.
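
To make the MapReduce-style processing of JSON flow records more concrete, here is a minimal sketch in plain Python (no Spark or Kafka required) of the kind of aggregation a stream application might perform. The key names `ipfix.protocolIdentifier` and `ipfix.octetDeltaCount`, the sample records, and all function names are assumptions made for illustration; the actual JSON keys depend on how the collector is configured.

```python
import json
from collections import Counter
from functools import reduce

# Hypothetical JSON flow records, one per message.
# The key names are assumptions; check the output of your own collector.
SAMPLE_FLOWS = [
    '{"ipfix.protocolIdentifier": 6, "ipfix.octetDeltaCount": 1200}',
    '{"ipfix.protocolIdentifier": 17, "ipfix.octetDeltaCount": 300}',
    '{"ipfix.protocolIdentifier": 6, "ipfix.octetDeltaCount": 800}',
]

# IANA-assigned IP protocol numbers for the most common protocols.
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def map_flow(line):
    """Map step: turn one JSON record into a (protocol, bytes) pair."""
    flow = json.loads(line)
    protocol = PROTOCOL_NAMES.get(flow["ipfix.protocolIdentifier"], "other")
    return protocol, flow["ipfix.octetDeltaCount"]


def reduce_pairs(totals, pair):
    """Reduce step: add the bytes of one pair to the per-protocol total."""
    protocol, octets = pair
    totals[protocol] += octets
    return totals


def bytes_per_protocol(lines):
    """Aggregate transferred bytes per protocol over a batch of records."""
    return dict(reduce(reduce_pairs, map(map_flow, lines), Counter()))


if __name__ == "__main__":
    print(bytes_per_protocol(SAMPLE_FLOWS))
```

In a real deployment the same map and reduce steps would run on micro-batches read from Kafka rather than on an in-memory list.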

[Figure: Stream4Flow architecture]

Framework Features

Use-cases

More on stream-based IP flow analysis is described in our paper titled Toward Stream-Based IP Flow Analysis.

Getting started

Everything is prepared and preconfigured for you. You only have to choose the deployment variant.

Deployment

Default machine configuration

Default login credentials

Requirements

We support two types of deployment:

Standalone deployment

Note: The minimum hardware requirement is 12 GB of RAM.

  1. download the repository
  2. go to the folder provisioning/
  3. (optional) update the guest configuration in configuration.yml
  4. run Vagrant provisioning with vagrant up, or start guests separately with vagrant up <guest-name>
  5. upload your SSH key to the guests or allow password-based SSH login (use vagrant ssh <guest-name>)

See provisioning/README.md for additional information about provisioning and Vagrant usage.

Cluster deployment

Note: machines in the cluster must run Debian OS with systemd

  1. download the repository
  2. go to the folder provisioning/ansible
  3. create an inventory file describing your cluster deployment, following the example in inventory.ini.example
  4. run Ansible: ansible-playbook -i <your inventory file> site.yml --user <username> --ask-pass (consult the Ansible docs for further information)
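
If you want to script the last step, the following is a small sketch that assembles the same ansible-playbook command line in Python. It uses only the options shown in the step above; the function name `ansible_command` and the placeholder values `inventory.ini` and `debian` are assumptions, not project defaults.

```python
import shlex


def ansible_command(inventory, user, playbook="site.yml"):
    """Build the ansible-playbook argument list used in the cluster deployment step."""
    return ["ansible-playbook", "-i", inventory, playbook, "--user", user, "--ask-pass"]


if __name__ == "__main__":
    # Placeholder inventory file and user name; replace them with your own.
    argv = ansible_command("inventory.ini", "debian")
    print(shlex.join(argv))
    # To actually run it from the provisioning/ansible folder:
    #   import subprocess
    #   subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the inventory path contains spaces.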

Usage

Input data: input point for network monitoring data in IPFIX/NetFlow format
  • producer IP address
  • default IP address is 192.168.0.2
  • port UDP/4739
Stream4Flow Web Interface: web interface for viewing application data
  • consumer IP address
  • default address is http://192.168.0.3/
  • default login: Stream4Flow
  • default password: Stream4Flow
Spark Web Interface: Apache Spark Streaming interface for application control
Kibana Web Interface: Elastic Kibana web interface for Elasticsearch data

Run the example application protocols_statistics

  1. log in to the Spark Master machine via SSH: ssh spark@192.168.0.100
  2. go to the application directory: cd /home/spark/applications/
  3. run the example application: ./run-application.sh ./statistics/protocols_statistics/spark/protocols_statistics.py -iz producer:2181 -it ipfix.entry -oz producer:9092 -ot results.output
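
The command in step 3 passes four options to the application. As a rough sketch of how such options could be parsed, the example below uses Python's standard argparse module. Only the short flags -iz, -it, -oz, and -ot come from the command above; the long option names, the helper `split_endpoint`, and the reading of -iz/-oz as host:port endpoints and -it/-ot as topic names are assumptions inferred from the values in that command, not the application's actual source.

```python
import argparse


def build_parser():
    """Parser accepting the four short flags used in the example command."""
    parser = argparse.ArgumentParser(description="Sketch of the example application's options")
    # Long option names are hypothetical; only the short flags appear in the README.
    parser.add_argument("-iz", "--input-endpoint", required=True, help="input host:port")
    parser.add_argument("-it", "--input-topic", required=True, help="topic to read flows from")
    parser.add_argument("-oz", "--output-endpoint", required=True, help="output host:port")
    parser.add_argument("-ot", "--output-topic", required=True, help="topic to write results to")
    return parser


def split_endpoint(endpoint):
    """Split 'host:port' into a (host, port) tuple with an integer port."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


if __name__ == "__main__":
    # Same values as in the example command.
    args = build_parser().parse_args(
        ["-iz", "producer:2181", "-it", "ipfix.entry",
         "-oz", "producer:9092", "-ot", "results.output"]
    )
    print(split_endpoint(args.input_endpoint), args.input_topic)
    print(split_endpoint(args.output_endpoint), args.output_topic)
```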

Send data to Stream4Flow

Stream4Flow is compatible with any NetFlow v5/v9 or IPFIX network probe. To collect your first data for Stream4Flow, you can use either a commercial solution such as Flowmon Probe or an open-source alternative such as softflowd.
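
For orientation, the sketch below shows what the 16-byte IPFIX message header defined in RFC 7011 looks like when packed in Python, and how a datagram could be sent to the default input point listed under Usage (192.168.0.2, UDP/4739). The function names are hypothetical, and a header-only message carries no flow records, so this is only a way to illustrate the wire format and check that the port is reachable, not a substitute for a real probe.

```python
import socket
import struct
import time

IPFIX_VERSION = 10  # version number defined by RFC 7011
# Header fields: version, message length, export time, sequence number,
# observation domain ID (all in network byte order).
IPFIX_HEADER = struct.Struct("!HHIII")


def ipfix_header(length, sequence=0, domain_id=0, export_time=None):
    """Pack an IPFIX message header; length is the total message size in bytes."""
    if export_time is None:
        export_time = int(time.time())
    return IPFIX_HEADER.pack(IPFIX_VERSION, length, export_time, sequence, domain_id)


def send_probe(host="192.168.0.2", port=4739):
    """Send a header-only IPFIX message to the default input point over UDP."""
    message = ipfix_header(IPFIX_HEADER.size)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


if __name__ == "__main__":
    header = ipfix_header(IPFIX_HEADER.size, sequence=1, domain_id=7)
    print(IPFIX_HEADER.unpack(header))
    # Uncomment to send the datagram to a running deployment:
    # send_probe()
```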

How to reference

Bibtex

@ARTICLE{jirsik-2017-toward, 
  author={Jirsik, Tomas and Cermak, Milan and Tovarnak, Daniel and Celeda, Pavel}, 
  journal={IEEE Communications Magazine}, 
  title={Toward Stream-Based IP Flow Analysis}, 
  year={2017}, 
  volume={55}, 
  number={7}, 
  pages={70-76}, 
  doi={10.1109/MCOM.2017.1600972}, 
  ISSN={0163-6804},
}

Plain text

T. Jirsik, M. Cermak, D. Tovarnak and P. Celeda, "Toward Stream-Based IP Flow Analysis," in IEEE Communications Magazine, vol. 55, no. 7, pp. 70-76, 2017.
doi: 10.1109/MCOM.2017.1600972

Related Publications

Acknowledgement

The SecurityCloud project is supported by the Technology Agency of the Czech Republic under No. TA04010062 Technology for processing and analysis of network data in big data concept.

Project partners

CESNET, z. s. p. o.

Flowmon Networks, a.s.