To support a compute cluster on cloudmesh, we propose a new design for orchestrating big data clusters. We will investigate OpenStack Heat, Chef, Puppet, and Docker to see whether there are benefits in their design decisions, and suggest a new approach for starting and configuring clusters such as Hadoop or SLURM on cloudmesh (i.e. OpenStack). With our new design and implementation of a cluster manager, we will have a new command, cluster, to start, configure, manage, or update compute nodes (VMs) on cloudmesh. We identify the current issues so far:
- Starting virtual machines with a fixed number of nodes
  - Scalable clusters can only be configured manually.
- Communication across multiple nodes via ssh by hand
  - Updating the authorized_keys and hosts files is done manually via script (a typical manual round is sketched after this list).
- Configuration of master and worker nodes by hand
  - This is currently done with Chef cookbooks, but there is no automated script.
- Confirmation of proper working and configuration
  - A user needs to verify that each node is working correctly as expected.
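For illustration, here is a minimal sketch of the kind of manual round the current setup requires; the worker addresses, user name, and file locations are assumptions, not cloudmesh code:

#!/bin/bash
# Hypothetical manual round: push the master's public key and an updated
# hosts file to every worker by hand.
WORKERS="10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14"
MASTER_KEY=$(cat ~/.ssh/id_rsa.pub)
for ip in $WORKERS; do
  # append the master's key so the master can reach each worker
  echo "$MASTER_KEY" | ssh ubuntu@$ip 'cat >> ~/.ssh/authorized_keys'
  # copy a hand-maintained hosts file so the nodes can resolve each other
  scp /tmp/cluster-hosts ubuntu@$ip:/tmp/hosts
  ssh ubuntu@$ip 'sudo mv /tmp/hosts /etc/hosts'
done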
We offer the following new features in the new design and implementation:
- Easy start-up of VMs with OpenStack Heat
  - Pre-configured templates provide a simple way to launch VMs together with their initialization processes (a template sketch follows this list).
- ssh-copy-id with OpenStack Heat
  - ssh authentication can be easily established with OpenStack Heat SoftwareDeployment (available from the Icehouse release); a sketch of this also follows the list.
  - In cloudmesh, we provide two individual templates for the master and worker nodes so that their initialization steps can differ.
- Verification of the completed setup of a cluster
  - By defining expected results in cloudmesh, the cluster command knows whether the cluster is up and running properly. Otherwise, all relevant logs and messages are reported to the user.
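As a minimal sketch, a Heat Orchestration Template (HOT) that boots a group of worker VMs could look like the following; it is written to a file via a heredoc for illustration, and the image, flavor, and key names are assumptions rather than the actual cloudmesh templates:

cat > hadoop.yaml <<'EOF'
heat_template_version: 2013-05-23
parameters:
  num_workers:
    type: number
    default: 5
resources:
  workers:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: num_workers }
      resource_def:
        type: OS::Nova::Server
        properties:
          image: ubuntu-14.04      # assumed image name
          flavor: m1.small         # assumed flavor
          key_name: cloudmesh-key  # assumed keypair
EOF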
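Likewise, a rough sketch of how a SoftwareConfig/SoftwareDeployment pair could append the master's public key to a worker's authorized_keys; the resource references and attribute names here are illustrative assumptions:

cat > ssh-key-deploy.yaml <<'EOF'
resources:
  add_master_key:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      inputs:
        - name: master_pub_key
      config: |
        #!/bin/bash
        # runs on the worker; the user name is an assumption
        echo "$master_pub_key" >> /home/ubuntu/.ssh/authorized_keys
  deploy_master_key:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: { get_resource: add_master_key }
      server: { get_resource: worker }  # assumed worker resource
      input_values:
        master_pub_key: { get_attr: [master_keypair, public_key] }  # assumed
EOF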
Our tentative plans for the command cluster:
Example 1. start 5 VMs for a Hadoop cluster
cluster start hadoop --num=5

Here, hadoop is a template name that refers to the location of the OpenStack Heat template as well as the templates for the master and worker nodes.
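Under the hood, such a command could translate into a Heat stack creation like the following; the stack name and parameter name are assumptions:

heat stack-create hadoop-cluster -f hadoop.yaml -P num_workers=5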
Example 2. write a template for the master node of Hadoop
cluster write hadoop master
#!/bin/bash
# install the Chef client via the omnibus installer
curl -L https://www.opscode.com/chef/install.sh | bash
# fetch and unpack the chef-repo skeleton
wget http://github.com/opscode/chef-repo/tarball/master
tar -zxf master
mv opscode-chef-repo* /home/ubuntu/chef-repo
rm master
# point knife at the local cookbook path
mkdir /home/ubuntu/chef-repo/.chef
echo "cookbook_path [ '/home/ubuntu/chef-repo/cookbooks' ]" > /home/ubuntu/chef-repo/.chef/knife.rb
# download the cookbooks needed for a Hadoop master
knife cookbook site download java
knife cookbook site download apt
knife cookbook site download yum
knife cookbook site download hadoop
...
(Ctrl-D or EOF to finish writing)
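After such a template runs, the node still has to converge the cookbooks. A minimal sketch with chef-solo, assuming the downloaded cookbook tarballs have been extracted into the cookbook path (the file names and run list are assumptions):

# hypothetical follow-up step, not part of the template above
cat > /home/ubuntu/chef-repo/solo.rb <<'EOF'
cookbook_path ['/home/ubuntu/chef-repo/cookbooks']
EOF
echo '{ "run_list": [ "recipe[hadoop]" ] }' > /home/ubuntu/chef-repo/node.json
chef-solo -c /home/ubuntu/chef-repo/solo.rb -j /home/ubuntu/chef-repo/node.json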
Example 3. define the expected result after proper installation and configuration
cluster success hadoop "service hadoop-hdfs-namenode status" "* Hadoop namenode is running"
"service hadoop-hdfs-namenode status" is a command to verify.
"* Hadoop namenode is running" is an expected result.