cekbote / coredns_ml_plugin

MIT License

CoreDNS Machine Learning Plugin


This repository contains code for connecting the CoreDNS server to a Machine Learning Environment for DNS request and response analysis, monitoring and alerting.

Overview

This repository is the result of my work on a project during GSoC 2020. The goal of the project was introducing the integration of machine learning capabilities with the CoreDNS server.

The initial use case was to identify, via machine learning, DNS requests for the records of domains that could be used by malicious hackers and other computer criminals. Upon identifying such a request, the plugin would either alert the sysadmin for manual vetting or block the request and its response.

Approach

General Overview

Currently, Golang doesn't have native libraries for interacting with the CUDA platform. The Python ecosystem, on the other hand, has tools such as TensorFlow, PyTorch and MXNet that not only interact with the CUDA platform but also allow easy prototyping and evaluation of deep learning models.

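This project combines the deep learning capabilities of the Python ecosystem with CoreDNS by creating a CoreDNS machine learning plugin, a Flask server that hosts the deep learning model, and a Dash application for analysing the stored results.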

Machine Learning

Learning Dataset

The deep-learning model is trained on the COVID-19 Cyber Threat Coalition Blacklist of malicious domains, which can be found here, and on a list of benign domains from DomCop, which can be found here.

Currently, the pre-trained model has been trained on the top 500 domain names from both these datasets. The final version of the pre-trained model will be trained on the entirety of both the datasets.

Learning Process

Data Preprocessing: Each domain name is converted into its Unicode code point representation and then extended to a numpy array of length 256. The malicious and non-malicious domains were combined into a single dataset, which was then split into training, validation and test sets.

Training: The deep-learning model is a Convolutional Neural Net that is trained using batch gradient descent with the Adam optimizer.
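A minimal sketch of the preprocessing step in plain Python follows. It assumes that names shorter than 256 characters are padded with zeros, that longer names are truncated, and that malicious domains are labelled 1 (consistent with the model treating outputs of 0.5 and above as malicious). The function names are hypothetical and not taken from the repository.

```python
from typing import List, Tuple

MAX_LEN = 256  # length of the encoded vector
SIDE = 16      # 16 x 16 = 256, the model's input height and width


def encode_domain(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Unicode code points of `domain`, truncated and zero-padded to max_len (assumed)."""
    codes = [ord(ch) for ch in domain[:max_len]]
    return codes + [0] * (max_len - len(codes))


def to_grid(codes: List[int], side: int = SIDE) -> List[List[List[int]]]:
    """Reshape side*side values, row by row, into a side x side x 1 grid."""
    if len(codes) != side * side:
        raise ValueError("expected %d values, got %d" % (side * side, len(codes)))
    return [[[codes[r * side + c]] for c in range(side)] for r in range(side)]


def build_dataset(malicious: List[str], benign: List[str]) -> List[Tuple[list, int]]:
    """Combine both lists into (grid, label) pairs: 1 = malicious, 0 = benign (assumed)."""
    samples = [(to_grid(encode_domain(d)), 1) for d in malicious]
    samples += [(to_grid(encode_domain(d)), 0) for d in benign]
    return samples
```

The repository keeps these values in a numpy array; `numpy.array(encode_domain(d)).reshape(16, 16, 1)` would give the same layout as `to_grid`.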

Inner Working of the Application


Machine Learning Plugin

The machine learning plugin forwards a request to the Flask Server for domain name analysis. The Flask Server then processes the request and sends back whether the domain name is malicious or benign. If the domain name is malicious, the plugin prevents the fallthrough to other plugins and sends back a Honeypot or Blackhole IP.

Flask Server

The Flask Server first preprocesses the request forwarded from the Machine Learning Plugin. The preprocessed request is then passed to the machine learning model, which infers whether the domain name is benign or malicious.

If the model is highly confident that the domain name is benign, a response is sent back to the Machine Learning Plugin that allows the fallthrough to other plugins.

If the model is highly confident that the domain name is malicious, a response is sent back to the Machine Learning Plugin that prevents the fallthrough to other plugins. Moreover, the Machine Learning Plugin sends back a Honeypot or a Blackhole IP to the user querying the malicious domain.

If the model is not very confident about its inference, the manually vetted lists are consulted. If the domain name is in the benign domain list, it is handled like a benign domain, as described above. Similarly, if the domain name is in the malicious domain list, it is handled like a malicious domain. If the domain name is in neither list, it is handled like a benign domain, but it is also stored in the database for manual vetting.

In all three scenarios, the results and other metadata are stored in the database.
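The three-way decision described above can be sketched as a small function. The README does not say what "highly confident" means numerically, so the thresholds `BENIGN_BELOW` and `MALICIOUS_ABOVE` are hypothetical, as are the `Decision` type and the `decide` name; the benign list is checked before the malicious list, following the order in the text.

```python
from dataclasses import dataclass
from typing import Set

# Hypothetical confidence band; the real thresholds are not documented here.
BENIGN_BELOW = 0.1
MALICIOUS_ABOVE = 0.9


@dataclass
class Decision:
    block: bool          # True: stop fallthrough, answer with a honeypot/blackhole IP
    needs_vetting: bool  # True: store the domain for manual vetting
    reason: str


def decide(domain: str, probability: float,
           benign_list: Set[str], malicious_list: Set[str]) -> Decision:
    """Map a model output (0 = benign, 1 = malicious) to an action for the plugin."""
    if probability < BENIGN_BELOW:
        return Decision(False, False, "model: benign")
    if probability > MALICIOUS_ABOVE:
        return Decision(True, False, "model: malicious")
    # Low confidence: fall back to the manually vetted lists.
    if domain in benign_list:
        return Decision(False, False, "vetted: benign")
    if domain in malicious_list:
        return Decision(True, False, "vetted: malicious")
    # Unknown domain: let it through, but queue it for manual vetting.
    return Decision(False, True, "unvetted")
```

In every branch the caller would still write the result and metadata to the database; only the unvetted branch additionally flags the domain for review.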

Dash Application

The Dash Application has two main use cases: historical analysis and manual vetting.

Implementation

Machine Learning Plugin and CoreDNS Build

The machine learning plugin is a CoreDNS plugin that forwards requests to the Flask Server via HTTP POST requests. Once the Flask server processes the request, it sends the prediction, whether the domain name is malicious or benign, back to the plugin. Depending on the nature of the domain name, the plugin can be configured to allow the request to fall through to the other plugins or send the request to a honeypot or a blackhole.

To install and start CoreDNS, please take a look at the CoreDNS repository. To add external plugins, please take a look at the example plugin.

To enable the plugin on a particular port, say 1053, make the following changes to the Corefile:

.:1053 {
    mlplugin
}

Application Middleware with Flask

The middleware is a Python Flask Server that contains the pre-trained Convolutional Neural Network. The Flask Server receives the domain name queried as well as the IP address of the machine used to query that particular domain name, as a JSON message, via HTTP POST requests from the plugin.

Once the Flask Server receives the domain name and the IP address, the domain name is preprocessed and passed to the pre-trained deep learning model. The model classifies whether the domain name belongs to a malicious website, and the server sends the result back to the plugin as a JSON message.

The classification result, together with other metadata such as the IP address and the date and time of the request, is stored in Elasticsearch, a NoSQL document store, which makes storing and querying the classification results and metadata fast.
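The request/response cycle can be sketched without Flask as a plain function, which keeps the logic testable on its own. The JSON field names (`domain`, `ip`, `malicious`, `probability`, `timestamp`) and the name `handle_query` are hypothetical, and `classify` stands in for the pre-trained model.

```python
import json
from datetime import datetime, timezone


def handle_query(body: bytes, classify, now=None):
    """Process one JSON request from the plugin (field names are hypothetical).

    Returns (response_bytes, record): the JSON reply for the plugin and the
    document that would be stored in Elasticsearch.
    """
    payload = json.loads(body)
    domain, ip = payload["domain"], payload["ip"]
    probability = float(classify(domain))
    # Model convention from the README: below 0.5 is benign, otherwise malicious.
    malicious = probability >= 0.5
    response = json.dumps({"domain": domain, "malicious": malicious,
                           "probability": probability}).encode("utf-8")
    # `now` can be injected for reproducible timestamps; defaults to current UTC time.
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {"domain": domain, "ip": ip, "probability": probability,
              "malicious": malicious, "timestamp": timestamp}
    return response, record
```

Inside a Flask route, the body could come from `request.get_data()`, and the returned record could be passed to the Elasticsearch client's `index` method.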

Before running the Flask Server, it is recommended that the Elasticsearch server be running in the background. To install Elasticsearch, please follow the instructions found on this page. Once Elasticsearch is installed, cd into its directory and enter bin/elasticsearch to run the Elasticsearch server.

To run the Flask Server, cd into the flask_server directory and then enter python dns_monitoring_server.py in the command line.


TensorFlow Model

Tensorflow Model Definition

The pre-trained deep learning model is a Convolutional Neural Net whose input is an array of shape (16, 16, 1) and whose output is a single value between 0 and 1. If the output value is less than 0.5, the domain name is considered benign; otherwise, it is considered malicious.

The model summary can be found below:

Layer       Output Shape        Activation  Number of Parameters
Input       (None, 16, 16, 1)   -           0
Conv2D      (None, 15, 15, 16)  ReLU        80
MaxPooling  (None, 7, 7, 16)    -           0
Conv2D      (None, 6, 6, 16)    ReLU        1040
MaxPooling  (None, 3, 3, 16)    -           0
Conv2D      (None, 2, 2, 8)     ReLU        520
Flatten     (None, 32)          -           0
Dense       (None, 8)           ReLU        264
Dense       (None, 1)           Sigmoid     9
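The shapes and parameter counts in the table can be re-derived with a few lines of arithmetic. The 2 x 2 kernel follows from the first layer's count (80 = 2·2·1·16 + 16); stride 1 with "valid" padding and 2 x 2 pooling with stride 2 are inferred from the output shapes rather than stated in the README.

```python
# Layer stack from the table: (type, filters or units).
LAYERS = [("Conv2D", 16), ("MaxPooling", None), ("Conv2D", 16),
          ("MaxPooling", None), ("Conv2D", 8), ("Flatten", None),
          ("Dense", 8), ("Dense", 1)]


def summarize(input_shape=(16, 16, 1), layers=LAYERS, kernel=2, pool=2):
    """Return (layer, output shape, parameter count) rows, batch dimension omitted."""
    shape, rows = input_shape, []
    for name, units in layers:
        if name == "Conv2D":
            h, w, c = shape
            params = kernel * kernel * c * units + units        # weights + biases
            shape = (h - kernel + 1, w - kernel + 1, units)     # stride 1, valid padding
        elif name == "MaxPooling":
            h, w, c = shape
            params, shape = 0, (h // pool, w // pool, c)        # stride equals pool size
        elif name == "Flatten":
            size = 1
            for dim in shape:
                size *= dim
            params, shape = 0, (size,)
        else:  # Dense
            params = shape[0] * units + units                   # weights + biases
            shape = (units,)
        rows.append((name, shape, params))
    return rows


for row in summarize():
    print(row)
```

The counts add up to 1,913 trainable parameters. A Keras model built from `tf.keras.layers.Conv2D(16, 2, activation="relu")`, `MaxPooling2D(2)`, `Flatten()` and `Dense(...)` layers should report the same total in `model.summary()` if the inferred kernel and pool sizes are right.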

TensorFlow Model Visualization

The model can be visualized as follows:


Results

The accuracy for the Train Set, Validation Set and Test Set is as follows:

Metric     Train Set   Validation Set   Test Set
Accuracy   99.25 %     98.00 %          98.00 %

The model training procedure as well as the pre-trained model can be found in the tensorflow_model directory.


Visualization Dashboard

To analyse and visualize the results stored in the Elasticsearch database, a Dash Application was created. The Visualization Dashboard has two main components:

Historical Analysis

A demo of the application can be seen below:

Historical Analysis has three main use cases:

Manual Vetting

A demo of the application can be seen below:

Manual Vetting allows the user to manually vet domain names on which the model has low confidence, thereby creating a new dataset of malicious and benign domains. This dataset can be used to block or allow domains and to extend the dataset used for retraining the model.

To run the Dash application, cd into the analysis_app directory and then enter python malicious_domain_name_analysis.py in the command line. Note that the Elasticsearch server must be running in the background.


Testing Harness

As there are various components to the machine learning pipeline, each component has its own testing harness.

CoreDNS Test Harness

To test whether CoreDNS works properly, make the following change to the Corefile:

.:1053 {
    whoami
}

Then cd into the coredns directory and enter ./coredns in the command line.

Open a new terminal and enter dig @127.0.0.1 -p 1053 www.example.com. If a reply is received, CoreDNS is working properly.
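Where `dig` is not available, a rough Python stand-in can send the same kind of query. This is a hand-rolled sketch of the RFC 1035 wire format, not part of the repository; it only checks that a reply with a matching ID arrives and does not decode the answer.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal RFC 1035 query for `name` (qtype 1 = A record, class IN)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (only RD, "recursion desired", set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def query(name: str, server: str = "127.0.0.1", port: int = 1053,
          timeout: float = 2.0) -> bytes:
    """Send the query over UDP and return the raw reply (raises on timeout)."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        reply, _ = sock.recvfrom(4096)
    if reply[:2] != packet[:2]:
        raise RuntimeError("reply ID does not match the query ID")
    return reply
```

With CoreDNS running on port 1053, `query("www.example.com")` should return the raw reply bytes; a timeout error suggests the server is not listening.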

Elasticsearch Test Harness

To test whether Elasticsearch works properly, first run Elasticsearch by going into the Elasticsearch directory and entering bin/elasticsearch in the command line. Next, enter the following in a new terminal:

python
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch("http://localhost:9200")  # local node on the default port
>>> es.indices.get(index="*")  # information on all indices

If Python doesn't raise any errors and returns a JSON-like mapping of the indices, Elasticsearch is working well.

Machine Learning Plugin and Flask Server Test Harness

To test whether the machine learning plugin and the Flask server work properly, first run CoreDNS with the machine learning plugin on a particular port, then run Elasticsearch and the Flask Server.

Next, open a new terminal and enter dig @127.0.0.1 -p port_number www.google.com. Then go to the terminal where the CoreDNS server is running and check the output. If the output contains either Benign Domain: [domain_name] | Probability: [probability_value] or Malicous Domain: [domain_name] | Probability: [probability_value], both the machine learning plugin and the Flask server are working well.
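To check the captured CoreDNS output without reading it line by line, a small filter can pick out the two message formats quoted above. This sketch assumes the messages appear exactly as quoted (any prefix before them is ignored) and accepts both the quoted "Malicous" spelling and "Malicious"; the pattern and function name are hypothetical.

```python
import re

# Matches "<Verdict> Domain: <name> | Probability: <number>" anywhere in a line.
LOG_PATTERN = re.compile(
    r"(?P<verdict>Benign|Malicious|Malicous) Domain: (?P<domain>\S+)"
    r" \| Probability: (?P<prob>[-+0-9.eE]+)"
)


def scan_log(lines):
    """Yield (verdict, domain, probability) for every line that matches."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            verdict = "benign" if match.group("verdict") == "Benign" else "malicious"
            yield verdict, match.group("domain"), float(match.group("prob"))
```

For example, `for row in scan_log(sys.stdin): print(row)` would list every classification in output piped into the script.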

Dash Application Test Harness

The Dash application has a built-in debugger that reports errors if anything goes wrong. If the debugger reports no errors, the application is working well.