DACA

DAtaset Creation Acquisition engine

Table of Contents

  * Overview
  * Requirements
  * Usage
  * Writing Scenarios
  * Architecture
  * Data Sets
  * Future Development
  * Run Tests
  * License

Overview

DACA is a configurable, automated testbed focused on spinning up servers/services, attacking them, recording log/network data and extracting any desired artifacts. The collected artifacts can then be used to tune detection rules or for educational purposes.

This project was created as part of a Master's thesis at TalTech.

Requirements

This project requires pipenv to install its main dependencies:

```bash
# Install pipenv
pip3 install pipenv

# Clone the project and install its dependencies
git clone git@github.com:Korving-F/DACA.git
cd DACA
pipenv install
```

The vagrant-scp plugin is used to collect data from the VMs and needs to be installed along with vagrant-vbguest:

```bash
vagrant plugin install vagrant-vbguest
vagrant plugin install vagrant-scp
```

In addition, one needs a local installation of Vagrant as well as one of its providers to actually run the VMs: VirtualBox or VMware Fusion / VMware Workstation.

If you want to make use of the Apache Kafka or Elasticsearch outputs, you need to install these solutions yourself and make them reachable over the network. By default no authentication is configured and all communication happens over plain-text protocols; to change this, update the Filebeat Ansible playbook.
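For reference, below is a minimal sketch of what such a plain-text output could look like, using stock Filebeat configuration options; the host names and topic are placeholders, and the template used by the playbook may differ.

```yaml
# Sketch of plain-text Filebeat outputs (stock Filebeat options; host
# names and topic are placeholders). Only one output may be enabled at a time.
output.elasticsearch:
  hosts: ["elasticsearch.example.local:9200"]   # no TLS, no authentication

# Alternatively, stream events to a Kafka broker instead:
#output.kafka:
#  hosts: ["kafka.example.local:9092"]
#  topic: "daca-logs"
```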

Usage

```bash
pipenv shell                                       # Enter virtualenv
python3 daca.py --help                             # Show supported commands
python3 daca.py info --list                        # List available scenarios
python3 daca.py run -d data/ --id 2                # Run scenario 2 and store collected data in the ./data/ directory
python3 daca.py run -d data/ --id 2 --interactive  # Run scenario 2 interactively instead of in automated mode
python3 daca.py --debug run -d data/ --id 2        # Run scenario 2 in debug mode
```

Writing Scenarios

Out-of-the-box scenarios are listed in the ./scenarios directory and can be used as a reference. A scenario file is discovered when it has the same name as the scenario directory containing it.

Scenarios consist of components; the simplest type of scenario has only a single component. Each component has two main sections:

  1. setup: this contains installation / provisioning steps.
  2. run: this contains runtime commands.

The setup phase builds/snapshots the VMs or Docker containers, initializing the scenario. The run section is evaluated each time the scenario is executed.
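Condensed to a skeleton, a single component then looks roughly as follows; the field names are taken from the full working example further below, while the component name and commands are placeholders.

```yaml
# Skeleton of a single component (field names as in the example below;
# the name and the echoed commands are placeholders).
components:
  - name: my_component
    image: ubuntu/focal64   # Vagrant box to boot
    setup:
      type: shell
      val: >
        echo "provisioning commands, run while building/snapshotting the VM";
    run:
      type: shell
      val: >
        echo "runtime commands, evaluated on scenario execution";
```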

In the background, network traffic can be captured and logs can be streamed to a Kafka broker or an Elasticsearch cluster. Raw logs and other artifacts can also be gathered after the fact.

Scenario files are interpreted as Jinja2 templates, which allows placeholders in a component's commands to be filled in with the values listed under variables.
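As an illustration, the condensed fragment below pairs a templated run command with the variables block that feeds it (both taken from the full example that follows); each listed value appears to yield its own variation of the scenario.

```yaml
# Condensed from the full example below: the Jinja2 placeholder in `val`
# is replaced with the entries listed under `variables.nmap`.
run:
  type: shell
  val: >
    "{{ variables.nmap }}";

variables:
  - nmap:
      - nmap -sV -p 8008 --script=http-enum 127.0.0.1    # one substitution
      - nmap -p8008 --script http-waf-detect 127.0.0.1   # another substitution
```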

Simple, single-file scenario

This scenario sets up a vulnerable webapp and runs some nmap scans against it. It collects a tcpdump capture, a raw log file and an asciinema terminal session recording as artifacts. The tool also writes all files needed to reproduce the scenario to a dedicated directory. See also here for the output of this particular example.

```yaml
# simple_scenario.yaml
name: "Simple example Scenario"
description: |
  "This Scenario sets up a vulnerable web application and runs multiple NMAP scans against it."
provisioner: vagrant
use_default_templates: True
components:
  - name: main_server
    description: Main Ubuntu machine used in this example scenario
    image: ubuntu/focal64
    setup:
      type: shell
      val: >
        echo "[+] Installing dependencies";
        sudo apt-get update;
        sudo apt install -y python2.7 unzip nmap asciinema;
        echo "[+] Installing Vulnerable Web App Gruyère";
        wget http://google-gruyere.appspot.com/gruyere-code.zip -O /tmp/gruyere-code.zip;
        unzip /tmp/gruyere-code.zip -d /opt/gruyere-code;
    # Notice the Jinja2 template variable
    run:
      type: shell
      val: >
        echo "[+] Run webserver";
        set -x;
        sudo python2.7 /opt/gruyere-code/gruyere.py > /tmp/gruyere.log 2>&1 &
        sleep 1;
        "{{ variables.nmap }}";
    artifacts_to_collect:
      - type: pcap
        val: ["tcpdump -i any -n -t -w /tmp/web.pcap port 8008"]
      - type: files
        val: ["/tmp/gruyere.log", "/tmp/*.cast", "/tmp/*.pcap"]
      - type: cli_recording
        val: ["/tmp/nmap.cast"]

# These entries are substituted for the Jinja2 template variable in the run section.
variables:
  - nmap:
      - nmap -sV -p 8008 --script=http-enum 127.0.0.1
      - nmap -p8008 --script http-waf-detect 127.0.0.1
      - nmap -p8008 --script http-wordpress-users 127.0.0.1
```

Multi-component scenario

```text
scenarios/
├── web_attack_scenario           # Multi-component scenario.
│   ├── web_attack_scenario.yaml  # Main scenario file (same name as parent directory)
│   ├── component_webserver       # First webserver component with 2 instances.
│   │   ├── httpd.yaml            #
│   │   ├── httpd_playbook        # Ansible playbook to provision httpd
│   │   ├── nginx.yaml            #
│   │   └── nginx.bash            # Script to provision nginx
│   └── component_scanner         # Second component with two scanner subcomponents.
│       ├── nmap.yaml             #
│       └── wpscan.yaml           #
└── simple_scenario               # Single-component scenario contained in a single file.
    └── simple_scenario.yaml      # See the working example above.
```

Architecture

Data Sets

  1. DNS Tunnelling Dataset - Investigates multiple popular DNS servers and publicly available DNS Tunnel utilities.
  2. DNS Tunnelling over DoH Dataset - Expands on the first data set by investigating utilities that can tunnel using DNS-over-HTTPS.

Future Development

  1. Many scenarios lend themselves to being run on Docker (faster than the current VM-based approach), while new scenarios could also be written for the cloud through Terraform (AWS, Google, Azure), which would allow the generation and collection of cloud-native datasets.
    Docker (#24) Terraform (#11)

  2. Currently the local Ansible provisioner is used to initialize VMs, which installs and runs Ansible from within the VM itself. Ideally, however, an Ansible installation on the host would be used. Ansible (#31)

  3. Currently all components are assumed to run on Linux; this should be expanded with support for Windows (#32)

Run Tests

```bash
# Install pytest and other packages helpful for debugging
pipenv install --dev

# Run the tests
python -m pytest
```

License

DACA is licensed under the MIT license.
Copyright © 2022, Frank Korving