Closed ac74475 closed 5 years ago
Palisade is expected to be deployed on AWS EMR (in the Ireland region) via a Terraform template that spins up a 4-node EMR cluster with Hadoop, Zookeeper, JupyterHub (to provide an interactive interface for running the queries), Ganglia (to monitor resource usage), and Spark (to run the Spark client once that issue is complete). By default, a data service is expected to run on each of the core nodes, and one instance of each of the remaining services is expected to run on the master node, with an easily configurable setting to increase the number of instances of each service. The process flow of running the AWS deployment example is as follows:
Write a script to deploy the Palisade services to access data stored in HDFS on an AWS EMR cluster
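The default service layout described above (a data service on each core node, one of each of the other services on the master node, with configurable counts) could be sketched roughly as follows. This is only an illustration under assumptions, not the actual deployment script: the function `plan_placement`, the node names, the service names other than the data service, and the choice to keep extra non-data instances on the master node are all hypothetical.

```python
def plan_placement(master, core_nodes, other_services, overrides=None):
    """Return a mapping of node name -> list of services to start on it.

    Defaults follow the issue text: one data service per core node and one
    instance of every other service on the master node. `overrides` maps a
    service name to the number of instances wanted (hypothetical setting).
    """
    if not core_nodes:
        raise ValueError("at least one core node is required for the data service")
    overrides = overrides or {}
    placement = {node: [] for node in [master, *core_nodes]}

    # Data service: default is one per core node; extra instances are
    # spread round-robin over the core nodes (an assumption).
    data_count = overrides.get("data", len(core_nodes))
    for i in range(data_count):
        placement[core_nodes[i % len(core_nodes)]].append("data")

    # Every other service: default is a single instance on the master node;
    # extra instances also stay on the master node (an assumption).
    for service in other_services:
        for _ in range(overrides.get(service, 1)):
            placement[master].append(service)
    return placement


if __name__ == "__main__":
    # A 4-node cluster is assumed here to mean 1 master and 3 core nodes.
    cores = ["core-1", "core-2", "core-3"]
    services = ["policy", "user"]  # placeholder service names
    for node, running in plan_placement("master", cores, services).items():
        print(node, running)
```

Keeping the placement logic in a pure function like this would let the instance counts be changed through a single setting and checked without touching a real cluster.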