This repository has three parts:
We also endeavor to provide extensive wiki documentation.
AmbariKave extends Ambari by adding some more services. It does this by adding a stack to Ambari. Ambari is nicely extensible: adding a stack does not interfere with older stacks, nor can it interfere with already running services.
This means there are two general ways to install these services.
If you are looking for the extensive documentation, including descriptions of disk/cpu/ram requirements, please look at the installation wiki
Ambari is a cluster installation and management system for Hadoop-based clusters. It installs separate services on different machines across a cluster. AmbariKave is a small extension of this. If what you're looking for is a common set of data science tools to install on one single machine (without a database or hdfs), consider KaveToolbox.
To download and install a released version of AmbariKave (e.g. 3.5-Beta) from the repos server http://repos.kave.io, with username repos and password kaverepos, including downloading and installing Ambari itself:
yum -y install wget curl tar zip unzip gzip python
wget http://repos:kaverepos@repos.kave.io/noarch/AmbariKave/3.5-Beta/ambarikave-installer-3.5-Beta.sh
sudo bash ambarikave-installer-3.5-Beta.sh
( NB: the repository server uses a semi-private password only as a means of avoiding robots and reducing DOS attacks; this password is intended to be widely known and is used here as an extension of the URL )
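The same URL pattern is used for all the release artifacts in this README (installer, package, deployment tarball). A small sketch of the pattern with the version pulled out into a variable (inferred from the 3.5-Beta examples above):

```shell
# Repo URL pattern used in the commands above, parameterised by version.
# (Pattern inferred from the 3.5-Beta examples; not an official API.)
VERSION=3.5-Beta
BASE="http://repos:kaverepos@repos.kave.io/noarch/AmbariKave/${VERSION}"
echo "${BASE}/ambarikave-installer-${VERSION}.sh"
```

Substituting a different released version tag into `VERSION` yields the matching installer URL.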
# If on CentOS 6, turn off iptables with:
sudo service iptables stop
sudo chkconfig iptables off
# If on CentOS 7, use:
sudo systemctl disable firewalld
sudo systemctl stop firewalld
# Test your ssh keys with:
ssh -T git@github.com
# If this works, clone the repo:
git clone git@github.com:KaveIO/AmbariKave.git
# Once you have a local checkout, install it with:
cd AmbariKave
sudo dev/install.sh
sudo dev/patch.sh
sudo ambari-server start
Then to provision your cluster go to: http://YOUR_AMBARI_NODE:8080 or deploy using a blueprint, see https://cwiki.apache.org/confluence/display/AMBARI/Blueprints
To install a released version of AmbariKave (e.g. 3.5-Beta) from the repos server http://repos.kave.io, with username repos and password kaverepos, over an existing Ambari installation:
yum -y install wget curl tar zip unzip gzip python
wget http://repos:kaverepos@repos.kave.io/noarch/AmbariKave/3.5-Beta/ambarikave-package-3.5-Beta.tar.gz
tar -xzf ambarikave-package-3.5-Beta.tar.gz -C /var/lib/
Or, to install the HEAD from git (example given with ssh cloning from this GitHub repo):
# Test your ssh keys with:
ssh -T git@github.com
# If this works, clone the repo:
git clone git@github.com:KaveIO/AmbariKave.git
# Once you have a local checkout, install it with:
sudo ambari-server stop
cd AmbariKave
sudo dev/patch.sh
sudo ambari-server start
If you have the head checked out from git, you can update with:
Connect to your ambari/admin node
sudo where/I/checked/out/ambari/dev/pull-update.sh
pull-update.sh also respects git branches, taking the branch as a command-line argument, and is integrated into the way we do automated deployment and testing.
To update between released versions, simply install the new version over the old version after stopping the ambari server. Installing a new version of the stack will not trigger an update of any running service; in the current state you would need to do this manually.
sudo ambari-server stop
wget http://repos:kaverepos@repos.kave.io/noarch/AmbariKave/3.5-Beta/ambarikave-installer-3.5-Beta.sh
sudo bash ambarikave-installer-3.5-Beta.sh
( NB: the repository server uses a semi-private password only as a means of avoiding robots and reducing DOS attacks; this password is intended to be widely known and is used here as an extension of the URL )
If you are looking for the extensive documentation, including descriptions of disk/cpu/ram requirements, please look at the installation wiki
If you have taken the released version, go to http://YOUR_AMBARI_NODE:8080 or deploy using a blueprint (see https://cwiki.apache.org/confluence/display/AMBARI/Blueprints). If you have git access and are working from the git version, see the wiki.
We really recommend beginning installation from a blueprint, but first one must carefully design the blueprint and/or test it on some other test resource. The web interface is great for single one-time custom installations; a blueprint is good for pre-tested redeployable installations.
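For orientation, an Ambari blueprint is a JSON document naming a stack and a set of host groups with the components each group runs. A minimal illustrative sketch (the stack name/version and component list here are assumptions for illustration, not a tested KAVE blueprint; your stack name will reflect your installed KAVE version):

```json
{
  "Blueprints": {
    "blueprint_name": "kave-example",
    "stack_name": "HDP",
    "stack_version": "2.6"
  },
  "host_groups": [
    {
      "name": "admin",
      "components": [
        { "name": "AMBARI_SERVER" }
      ],
      "cardinality": "1"
    }
  ]
}
```

A real blueprint would add further host groups and the KAVE/HDP components you intend to deploy; see the Ambari Blueprints wiki page linked above for the full schema.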
FreeIPA can provide all necessary keytabs for your kerberized cluster, using the kerberos.csv given by the Ambari wizard. Be careful: you need to pause the wizard when given the option to download the csv, and do some things on the command line before continuing.
You can follow the tutorial here: https://youtu.be/hL1yiMlgg0E
And/or follow these steps:
The createkeytabs.py script creates all necessary service and user principals and any missing local users or groups, creates temporary keytabs on the ambari node, copies them to the required places on the nodes, removes the local intermediate files, and tests that the new keytabs work for those services.
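To illustrate what such a pass consumes, here is a rough sketch of iterating a wizard-style csv. The columns are simplified to three for illustration (the real wizard csv has more fields), and the host, principal, and keytab values are invented; this loop is not the actual createkeytabs.py logic:

```shell
# Simplified illustration of a kerberos.csv-driven pass (not the real script):
# each row says which principal and keytab a given host needs.
cat > kerberos.csv <<'EOF'
host,principal,keytab
node1.example.com,HTTP/node1.example.com@EXAMPLE.COM,/etc/security/keytabs/spnego.service.keytab
EOF
while IFS=, read -r host principal keytab; do
  [ "$host" = "host" ] && continue          # skip the header row
  echo "create principal ${principal}; install ${keytab} on ${host}"
done < kerberos.csv
```

The real script additionally creates missing local users/groups and verifies the installed keytabs, as described above.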
See the deployment subdirectory, or the deployment tarball kept separately
yum -y install wget curl tar zip unzip gzip python
wget http://repos:kaverepos@repos.kave.io/noarch/AmbariKave/3.5-Beta/ambarikave-deployment-3.5-Beta.tar.gz
tar -xzf ambarikave-deployment-3.5-Beta.tar.gz
Or download the head from GitHub. See the GitHub readme on the deployment tools, the help written for each tool, or, better yet, contact us if you'd like some advice on how to use anything here. Deployment readme
Ideally all of your nodes will have access to the internet during installation in order to download software.
If this is not the case, you can instead implement a near-side cache/mirror of all required software. This is not very easy, but once it is done, you can keep it for later use.
Setting up a local near-side cache for the KAVE tool stack is quite easy. First, either copy the entire repository website to your own internal apache server, or copy the contents of the directories to your own shared directory visible from every node.
mkdir -p /my/shared/dir
cd /my/shared/dir
wget -r -np http://repos.kave.io/
Then create a /etc/kave/mirror file on each node with the new top-level directory to try first before looking for our website:
echo "/my/shared/dir" >> /etc/kave/mirror
echo "http://my/local/apache/mirror" >> /etc/kave/mirror
So long as the directory structure of the near-side cache is identical to our website, you can drop, remove, or replace any local packages you will never install, and update the cache as our repo server updates.
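The lookup behaviour described above can be sketched as follows. The mirror-file format is taken from the example; the loop itself is an illustration of the "try mirrors first, central repo last" idea, not the actual KAVE code:

```shell
# Illustration: try each location listed in the mirror file, in order,
# before falling back to the central repo server. Paths are illustrative.
MIRROR_FILE=./mirror                       # normally /etc/kave/mirror
PKG="noarch/AmbariKave/3.5-Beta/ambarikave-installer-3.5-Beta.sh"
printf '%s\n' "/my/shared/dir" "http://my/local/apache/mirror" > "$MIRROR_FILE"
while read -r base; do
  echo "would try: ${base}/${PKG}"         # each mirror, in listed order
done < "$MIRROR_FILE"
echo "finally: http://repos.kave.io/${PKG}" # default repo server as fallback
```

Each listed location is probed with the same relative path, which is why the cache's directory layout must mirror the repo website exactly.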
OpenVPN can be installed and set up on the desired node(s) by running the command below:
wget https://git.io/vpn -O openvpn-install.sh && bash openvpn-install.sh
This is an interactive OpenVPN installation and administration tool.
Read more here about kave versioning: https://github.com/KaveIO/AmbariKave/wiki/kave-versioning
KAVE extends an HDP stack, adding additional services. See the versioning diagram on our wiki for details.
The HDP stack number looks like X.Y, with a major and minor version. The KAVE also has a W.Z versioning scheme, but this is not 100% coupled to the HDP stack.
A KAVE official version tag appears like:
The tag is split into four parts:
A new major version is started whenever changes of the following type are made:
We currently name our stack within ambari to reflect both the version of the HDP stack we depend on, and the installed version of the KAVE.
This is the stack name you will see in blueprints and in the ambari web interface. In older KAVE versions we used a different approach, not including the KAVE stack tag.