gchq / Palisade

A Tool for Complex and Scalable Data Access Policy Enforcement
Apache License 2.0

Script how to deploy Palisade on AWS EMR cluster #114

Closed ac74475 closed 5 years ago

ac74475 commented 6 years ago

Write a script to deploy the Palisade services so that they can access data stored in HDFS on an AWS EMR cluster.

ac74475 commented 5 years ago

Deploying Palisade on AWS EMR (in the Ireland region) is expected to be done via a Terraform template that spins up a four-node EMR cluster with Hadoop, ZooKeeper, JupyterHub (to provide an interactive interface for running queries), Ganglia (to monitor resource usage), and Spark (to run the Spark client once that issue is complete). By default, a data service is expected to run on each of the core nodes, with one instance of each of the other services running on the master node, and an easily configurable setting should allow the number of instances of each service to be increased. The process flow of running the AWS deployment example is as follows: