futuresystems / big-data-stack

Hadoop-based Big Data stack (hdfs, yarn, spark, etc)
Apache License 2.0

Adding Hive role and MySQL role #67

Closed kjaugustin closed 8 years ago

kjaugustin commented 8 years ago

Adding a Hive role and a MySQL role. Sending this PR to the Unstable branch.

badmutex commented 8 years ago

Please revert the changes to .cluster.py and ansible.cfg.

badmutex commented 8 years ago

Please add a description to the PR indicating a summary of the changes and how and where you tested. How did you ensure that hive was installed correctly and is usable?

laszewsk commented 8 years ago

Although it is possible in Hive to specify alternative database backends, for the purposes of this project you do not have to consider that. You can just use one of your choice. MySQL is a fine choice.

I am not sure where HIVE_METASTORE is set, though. As badi said, you could probably just leave it off, since you do not do alternative installations for Postgres or others once you have decided on MySQL.

kjaugustin commented 8 years ago

At this point the HIVE_METASTORE variable is set using group_vars. I wrote the code this way thinking that I would need to integrate it with the futuresystems BDS, so I made provisions to add different flavors of databases in the future if needed, or if I could collaborate to expand this. Though this works when tested on an Ubuntu cluster provisioned on Chameleon Cloud, I know it could be improved for better performance, scalability, and high availability. If time permits, I could work on these issues later. Thanks for reviewing the code!
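For reviewers, a minimal sketch of what such a group_vars setting might look like; the file path, variable casing, and the set of allowed values here are assumptions for illustration, not necessarily what this PR actually uses:

    # hypothetical group_vars entry selecting the Hive metastore backend
    # (only mysql is implemented in this PR; the path and value are guesses)
    cat >> group_vars/all <<'EOF'
    HIVE_METASTORE: mysql
    EOF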

kjaugustin commented 8 years ago

Hive and MySQL roles were tested using Ubuntu 14.04 VMs built on the Chameleon cloud infrastructure. Please use the following steps to test the code. I will send a carefully formatted instructions.rst file later on.

Setup Instructions:

  1. Set up the Chameleon cloud computing infrastructure and load the Openstack module:

badi@i136 ~ $ source ~/CH-817724-openrc.sh
badi@i136 ~ $ module load openstack

  2. Set up the bds virtualenv:

virtualenv bds
source bds/bin/activate

  3. Set up the ssh agent:

eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa

  4. Get a local copy of the futuresystems big-data-stack, e.g. using the git clone command (see the sketch after step 6).
  5. Edit the .cluster.py file and the ansible.cfg file to make them suitable to run on Chameleon.
  6. Run the following commands:

pip install -r requirements.txt

vcl boot -p openstack -P $USER-
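For step 4, a concrete invocation might look like the following; the repository URL comes from this PR thread, while the branch name is an assumption based on the PR targeting the Unstable branch:

    # clone the stack and switch to the branch under review
    git clone https://github.com/futuresystems/big-data-stack.git
    cd big-data-stack
    git checkout unstable   # branch name assumed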

  7. Sanity check using the following command:

ansible all -m ping

  8. Perform a multi-node Hadoop cluster installation on the Chameleon cloud infrastructure:

ansible-playbook play-hadoop.yml

  9. Now install Hive as an addon:

ansible-playbook addons/hive.yml

  10. Sanity check:

    a. Log into the frontend node using its IP address or the hostname:

    ssh cc@$USER-master0

    b. Log in as the hadoop user:

    sudo su - hadoop

    c. Enter hive to start the Hive CLI:

    hadoop@$USER-master0: hive

    d. Enter show tables on the Hive command line. Don't forget the ; at the end of the SQL command:

     hive> show tables;
     OK
     Time taken: 5.01 seconds
     hive>
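As an additional, purely illustrative check that the MySQL-backed metastore is really being used, something like the following could be run on the frontend node; the table name, the metastore database name, and root access to MySQL are all assumptions about this particular setup:

    # create a throwaway table non-interactively via the hive CLI
    hive -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT);"
    # confirm the metastore recorded it ('metastore' as the database
    # name and root credentials are assumptions)
    mysql -u root -p -e "SELECT TBL_NAME FROM metastore.TBLS;"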
laszewsk commented 8 years ago

When doing the improvement, do it for example as in the comments on the first steps below …

As you can see we use ===, ::, and indentation, so this should be really simple and will help reviewers visually.
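For instance, a sketch only (the file name instructions.rst follows the earlier comment, and the heading text is illustrative): an RST heading underlined with === and a literal block introduced by :: could be written like this:

    # write an RST file with a === heading and a :: literal block
    cat > instructions.rst <<'EOF'
    Setup Instructions
    ==================

    Setup the bds virtualenv::

        virtualenv bds
        source bds/bin/activate
    EOF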

On May 4, 2016, at 12:26 AM, kjaugustin notifications@github.com wrote:


Setup Chameleon cloud computing infrastructure and load Openstack module::

    badi@i136 ~ $ source ~/CH-817724-openrc.sh
    badi@i136 ~ $ module load openstack

Setup the bds virtualenv::

    virtualenv bds
    source bds/bin/activate

…


badmutex commented 8 years ago

@kjaugustin

Once you address the comments above, I can merge this PR.

kjaugustin commented 8 years ago

Badi,

For some reason I get a fatal error when trying to revert: fatal: bad revision 'ansible.cfg'. So is it possible to do a git cherry-pick on your end? Or else I can check out these files again and send the PR again. Please advise.
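A side note on the error: git revert expects a commit, not a file path, hence the bad revision 'ansible.cfg' message. One way to restore the two files to their upstream state is something like the following; the remote name upstream and branch master are assumptions about this checkout:

    # take the upstream version of the two files instead of reverting a commit
    git fetch upstream
    git checkout upstream/master -- ansible.cfg .cluster.py
    git commit -m "restore ansible.cfg and .cluster.py from upstream"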

kjaugustin commented 8 years ago

ansible.cfg and .cluster.py are now reverted back to upstream.