This repository contains code for connecting the CoreDNS server to a Machine Learning Environment for DNS request and response analysis, monitoring and alerting.
This repository is the result of my work on a project during GSoC 2020. The goal of the project was to integrate machine learning capabilities with the CoreDNS server.
The initial use case was identifying, via machine learning, DNS requests for records of domains that could be used by malicious hackers and other computer criminals. Once such requests are identified, the plugin would either alert the sysadmin for manual vetting or block the requests and responses.
Currently, Golang doesn't have native libraries for interacting with the CUDA platform. At the same time, the Python ecosystem has tools like TensorFlow, PyTorch, MXNet and various others that not only interact with the CUDA platform but also allow for easy prototyping and evaluation of deep learning models.
This project combines the deep learning capabilities that the Python ecosystem provides, with CoreDNS, by creating:
Learning Dataset
The deep-learning model is trained on a COVID-19 Cyber Threat Coalition Blacklist for malicious domains that can be found here and on a list of benign domains from DomCop that can be found here.
Currently, the pre-trained model has been trained on the top 500 domain names from both these datasets. The final version of the pre-trained model will be trained on the entirety of both the datasets.
Learning Process
Data Preprocessing: Each domain name is converted into its Unicode code point representation and then extended to a numpy array of length 256. The dataset was created by combining the malicious and the non-malicious domains. The dataset was split as follows:
Training: The deep-learning model is a Convolutional Neural Net that is trained using batch gradient descent with the Adam optimizer.
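The Data Preprocessing step above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `encode_domain`, the zero padding value, and the truncation of names longer than 256 characters are all assumptions.

```python
import numpy as np

MAX_LEN = 256  # array length stated in the text


def encode_domain(domain: str, max_len: int = MAX_LEN) -> np.ndarray:
    """Map each character to its Unicode code point and pad to max_len."""
    # Assumption: names longer than max_len are truncated.
    code_points = [ord(ch) for ch in domain[:max_len]]
    # Assumption: the array is "extended" by padding with zeros.
    encoded = np.zeros(max_len, dtype=np.float32)
    encoded[: len(code_points)] = code_points
    return encoded


vector = encode_domain("example.com")
print(vector.shape)
print(vector[:11])
```

Positions beyond the end of the domain name stay at zero, so every domain maps to an array of the same length regardless of how long the name is.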
The machine learning plugin forwards a request to the Flask Server for domain name analysis. The Flask Server then processes the request and sends back whether the domain name is malicious or benign. If the domain name is malicious, the plugin prevents the fallthrough to other plugins and sends back a Honeypot or Blackhole IP.
The Flask Server first preprocesses the request forwarded from the Machine Learning Plugin. The preprocessed request is then sent to the machine learning model, which infers whether the domain name is benign or malicious.
If the model is highly confident that the domain name is benign, a response is sent back to the Machine Learning Plugin that allows the fallthrough to other plugins.
If the model is highly confident that the domain name is malicious, a response is sent back to the Machine Learning Plugin that prevents the fallthrough to other plugins. Moreover, the Machine Learning Plugin sends back a Honeypot or a Blackhole IP to the user querying the malicious domain.
If the model is not very confident about its inference, then the manually vetted lists are consulted. If the domain name exists in the benign domain list, the procedure described above for benign domains is followed. Similarly, if the domain name exists in the malicious domain list, the procedure described above for malicious domains is followed. If the domain name is not present in either list, the procedure for benign domains is followed, but the domain is also stored in the database for manual vetting.
In all three scenarios above, the results as well as other metadata are stored in the database.
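The three scenarios above can be summarized in a small decision function. This is only a sketch: the text does not state what "highly confident" means numerically, so the thresholds `BENIGN_MAX` and `MALICIOUS_MIN`, as well as the function and parameter names, are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical thresholds; the text only says "highly confident".
BENIGN_MAX = 0.1      # scores at or below this count as confidently benign
MALICIOUS_MIN = 0.9   # scores at or above this count as confidently malicious


def decide(
    domain: str,
    probability: float,
    benign_list: Set[str],
    malicious_list: Set[str],
) -> Tuple[str, bool]:
    """Return (verdict, needs_manual_vetting) for a scored domain name."""
    if probability <= BENIGN_MAX:
        return "benign", False
    if probability >= MALICIOUS_MIN:
        return "malicious", False
    # Low confidence: fall back to the manually vetted lists.
    if domain in benign_list:
        return "benign", False
    if domain in malicious_list:
        return "malicious", False
    # Unknown domain: treated as benign, but queued for manual vetting.
    return "benign", True


print(decide("example.com", 0.55, {"example.com"}, set()))
print(decide("unknown.test", 0.55, set(), set()))
```

A "benign" verdict corresponds to letting the request fall through to the other plugins, and a "malicious" verdict to answering with a Honeypot or Blackhole IP.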
The Dash Application has two main use cases:
The machine learning plugin is a CoreDNS plugin that forwards requests to the Flask Server via HTTP POST requests. Once the Flask server processes the request, it sends the prediction, whether the domain name is malicious or benign, back to the plugin. Depending on the nature of the domain name, the plugin can be configured to allow the request to fall through to the other plugins or send the request to a honeypot or a blackhole.
To install and start CoreDNS please take a look at the CoreDNS repository. To add external plugins, please take a look at the example plugin.
To add the plugin on a particular port, say 1053, please make the changes to the Corefile as shown below:
.:1053 {
    mlplugin
}
The middleware is a Python Flask Server that contains the pre-trained Convolutional Neural Network. The Flask Server receives the domain name queried as well as the IP address of the machine used to query that particular domain name, as a JSON message, via HTTP POST requests from the plugin.
Once the Flask Server receives the domain name and the IP address, the domain name is preprocessed and then passed to the pre-trained deep learning model. The deep learning model classifies whether the domain name belongs to a malicious website or not, and the result is sent back to the plugin as a JSON message.
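For testing the server without CoreDNS, the JSON exchange described above can be imitated from Python. The sketch below only builds the request and parses a reply; the URL and the field names (`domain`, `ip`, `prediction`, `probability`) are assumptions made for illustration and may differ from what the plugin and server actually use.

```python
import json
import urllib.request
from typing import Tuple

# Hypothetical endpoint; adjust to wherever the Flask Server listens.
SERVER_URL = "http://127.0.0.1:5000/"


def build_request(domain: str, client_ip: str) -> urllib.request.Request:
    """Build the HTTP POST request carrying the queried domain and client IP."""
    # Hypothetical field names.
    payload = json.dumps({"domain": domain, "ip": client_ip}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body: bytes) -> Tuple[str, float]:
    """Extract the verdict and the score from the server's JSON reply."""
    reply = json.loads(body)
    return reply["prediction"], float(reply["probability"])


request = build_request("example.com", "192.0.2.10")
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

With the server running, the request can be sent with `urllib.request.urlopen(request)` and the body of the reply passed to `parse_response`.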
The classification result, along with other metadata such as the IP address and the date and time of the request, is stored in Elasticsearch, a NoSQL database, which makes storing and querying the classification results and metadata fast.
Before running the Flask Server, it is recommended that the Elasticsearch server is running in the background. To install Elasticsearch, please follow the instructions found on this page. Once Elasticsearch is installed, cd into it and enter bin/elasticsearch to run the Elasticsearch server.
To run the Flask Server, cd into the flask_server directory and then enter python dns_monitoring_server.py in the command line.
The pre-trained deep learning model is a Convolutional Neural Net whose input is a (16, 16, 1) shaped array and whose output is a single value between 0 and 1. If the output value is less than 0.5, the domain name is considered benign; otherwise it is considered malicious.
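As a rough sketch of how the input shape and the 0.5 threshold fit together: it is assumed here that the length-256 encoding of a domain name is reshaped into the (16, 16, 1) input, since 16 × 16 = 256. The function names are hypothetical, and the model call itself is left out.

```python
import numpy as np

THRESHOLD = 0.5  # decision threshold stated in the text


def to_model_input(encoded: np.ndarray) -> np.ndarray:
    """Reshape a length-256 vector into a batch holding one (16, 16, 1) array."""
    # Assumption: the 256 values are laid out row by row in a 16 x 16 grid.
    return encoded.reshape(1, 16, 16, 1)


def label_from_score(score: float) -> str:
    """Scores below the threshold are benign; everything else is malicious."""
    return "benign" if score < THRESHOLD else "malicious"


batch = to_model_input(np.zeros(256, dtype=np.float32))
print(batch.shape)
print(label_from_score(0.12), label_from_score(0.87))
```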
The model summary can be found below:
Layer | Output Shape | Activation | Number of Parameters |
---|---|---|---|
Input | (None, 16, 16, 1) | - | 0 |
Conv2D | (None, 15, 15, 16) | ReLU | 80 |
MaxPooling | (None, 7, 7, 16) | - | 0 |
Conv2D | (None, 6, 6, 16) | ReLU | 1040 |
MaxPooling | (None, 3, 3, 16) | - | 0 |
Conv2D | (None, 2, 2, 8) | ReLU | 520 |
Flatten | (None, 32) | - | 0 |
Dense | (None, 8) | ReLU | 264 |
Dense | (None, 1) | Sigmoid | 9 |
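The parameter counts in the table can be checked by hand. The short script below does the arithmetic, assuming 2 × 2 convolution kernels with a bias per filter; the kernel size is not stated in the text and is inferred from the output shapes (for example 16 → 15).

```python
from typing import Tuple


def conv2d_params(kernel: Tuple[int, int], in_channels: int, filters: int) -> int:
    """Weights per filter (kernel area x input channels) plus one bias each."""
    kernel_h, kernel_w = kernel
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs: int, units: int) -> int:
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


KERNEL = (2, 2)  # assumption inferred from the output shapes in the table

counts = {
    "Conv2D (16 filters, 1 input channel)": conv2d_params(KERNEL, 1, 16),
    "Conv2D (16 filters, 16 input channels)": conv2d_params(KERNEL, 16, 16),
    "Conv2D (8 filters, 16 input channels)": conv2d_params(KERNEL, 16, 8),
    "Dense (32 -> 8)": dense_params(2 * 2 * 8, 8),
    "Dense (8 -> 1)": dense_params(8, 1),
}

for layer, count in counts.items():
    print(f"{layer}: {count}")
print("Total:", sum(counts.values()))  # 1913
```

The results match the table (80, 1040, 520, 264 and 9), which is consistent with the assumed kernel size.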
The model can be visualized as follows:
The accuracy for the Train Set, Validation Set and Test Set is as follows:
Metric | Train Set | Validation Set | Test Set |
---|---|---|---|
Accuracy | 99.25 % | 98.00 % | 98.00 % |
The model training procedure as well as the pre-trained model can be found in the tensorflow_model directory.
To analyse and visualize the results stored in the Elasticsearch database, a Dash Application was created. There are two main components to the Visualization Dashboard:
A demo of the application can be seen below:
Historical Analysis has three main use cases:
A demo of the application can be seen below:
Manual Vetting allows the user to manually vet domain names that the model has low confidence in, thereby creating a new dataset of malicious or benign domains. This dataset can be used for blocking or allowing domains and also for updating the dataset used to retrain the model.
To run the Dash application, cd into the analysis_app directory and then enter python malicious_domain_name_analysis.py in the command line. Please note that the Elasticsearch server has to be running in the background.
As there are various components to the machine learning pipeline, each component has its own testing harness.
To test whether CoreDNS works properly, make the following change to the Corefile:
.:1053 {
    whoami
}
Then cd into the coredns directory and enter ./coredns in the command line. Open a new terminal and then enter dig @127.0.0.1 -p 1053 www.example.com. If a reply is received, CoreDNS is working properly.
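If dig is not available, a similar check can be scripted with the Python standard library. The sketch below builds a minimal DNS query for an A record by hand and waits for any reply over UDP; the function names are hypothetical, and the default port matches the Corefile above.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in wire format."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE = 1 (A record), QCLASS = 1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def coredns_replies(
    host: str = "127.0.0.1",
    port: int = 1053,
    name: str = "www.example.com",
    timeout: float = 2.0,
) -> bool:
    """Return True if a DNS response with a matching ID arrives in time."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (host, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and refused connections
            return False
    # A valid reply has a full header, echoes the ID and sets the response bit.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling `coredns_replies()` while CoreDNS is running on port 1053 should return `True`; it returns `False` when nothing answers within the timeout.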
To test whether Elasticsearch works properly, first run Elasticsearch by going into the Elasticsearch directory and entering bin/elasticsearch in the command line. Next, enter the following into a new terminal:
python
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.indices.get('*')
If Python doesn't throw any errors and returns a JSON object, Elasticsearch is working well.
To test whether the machine learning plugin and the Flask server work properly, first run CoreDNS with the machine learning plugin on a particular port, then run Elasticsearch and the Flask Server.
Next, open a new terminal and enter dig @127.0.0.1 -p port_number www.google.com. Then open the terminal where the CoreDNS server is running and check the output. If the output contains either Benign Domain: [domain_name] | Probability: [probability_value] or Malicous Domain: [domain_name] | Probability: [probability_value], then both the machine learning plugin and the Flask server are working well.
The Dash application has a built-in debugger that reports errors if anything goes wrong. If the debugger reports no errors, the application is working well.