GoogleCloudDataproc / bdutil

[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
https://cloud.google.com/dataproc
Apache License 2.0

Hadoop does not seem to have been installed #52

Closed stev-0 closed 9 years ago

stev-0 commented 9 years ago

I am using Cygwin and trying to create an HDP cluster using the command `./bdutil -e platforms/hdp/ambari_env.sh deploy`. All my instances and disks are built fine, and I am able to SSH into the machines, but there is no Hadoop stack there at all. Running `yum search` shows that no Hadoop repository is enabled, so I assume that is the problem.

There is no error message indicating that anything failed. I get the following output:

  CONFIGBUCKET='xxx'
  PROJECT='xxxx'
  GCE_IMAGE='centos-6'
  GCE_ZONE='europe-west1-b'
  GCE_NETWORK='default'
  PREEMPTIBLE_FRACTION=0.0
  PREFIX='hadoop'
  NUM_WORKERS=3
  MASTER_HOSTNAME='hadoop-m'
  WORKERS='hadoop-w-0 hadoop-w-1 hadoop-w-2'
  BDUTIL_GCS_STAGING_DIR='gs://osm_hadoop/bdutil-staging/hadoop-m'
  MASTER_ATTACHED_PD='hadoop-m-pd'
  WORKER_ATTACHED_PDS='hadoop-w-0-pd hadoop-w-1-pd hadoop-w-2-pd'
  (y/n) y
Sun, Aug 09, 2015  1:52:35 PM: Checking for existence of gs://osm_hadoop...
gs://osm_hadoop/
Sun, Aug 09, 2015  1:52:45 PM: Checking for existence of gs://hadoop-dist/hadoop-2.6.0.tar.gz...
Sun, Aug 09, 2015  1:52:55 PM: Checking upload files...
Sun, Aug 09, 2015  1:52:55 PM: Verified './conf/hadoop2/bigtable-hbase-site-template.xml'
Sun, Aug 09, 2015  1:52:55 PM: Verified './conf/hadoop2/bq-mapred-template.xml'
Sun, Aug 09, 2015  1:52:55 PM: Verified './conf/hadoop2/core-template.xml'
Sun, Aug 09, 2015  1:52:55 PM: Verified './conf/hadoop2/gcs-core-template.xml'
Sun, Aug 09, 2015  1:52:55 PM: Verified './conf/hadoop2/hdfs-template.xml'
Sun, Aug 09, 2015  1:52:55 PM: Verified './conf/hadoop2/mapred-template.xml'
Sun, Aug 09, 2015  1:52:55 PM: Verified './conf/hadoop2/yarn-template.xml'
Sun, Aug 09, 2015  1:52:55 PM: Verified './libexec/hadoop_helpers.sh'
Sun, Aug 09, 2015  1:52:55 PM: Verified './libexec/configure_mrv2_mem.py'
Sun, Aug 09, 2015  1:52:55 PM: Verified './hadoop2_env.sh'
Sun, Aug 09, 2015  1:52:55 PM: Verified './platforms/hdp/ambari.conf'
Sun, Aug 09, 2015  1:52:56 PM: Verified './platforms/hdp/ambari_functions.sh'
Sun, Aug 09, 2015  1:52:56 PM: Verified './libexec/hadoop_helpers.sh'
Sun, Aug 09, 2015  1:52:56 PM: Verified './platforms/hdp/configuration.json'
Sun, Aug 09, 2015  1:52:56 PM: Verified './platforms/hdp/resources/public-hostname-gcloud.sh'
Sun, Aug 09, 2015  1:52:56 PM: Verified './platforms/hdp/resources/thp-disable.sh'
Sun, Aug 09, 2015  1:52:56 PM: Verified './platforms/hdp/ambari_manual_env.sh'
Sun, Aug 09, 2015  1:52:56 PM: Verified './platforms/hdp/create_blueprint.py'
Sun, Aug 09, 2015  1:52:56 PM: Generating 12 command groups...
Sun, Aug 09, 2015  1:52:59 PM: Done generating remote shell scripts.
Sun, Aug 09, 2015  1:52:59 PM: Creating attached worker disks: hadoop-w-0-pd hadoop-w-1-pd   hadoop-w-2-pd
...Sun, Aug 09, 2015  1:53:00 PM: Creating attached master disk: hadoop-m-pd
.Sun, Aug 09, 2015  1:53:00 PM: Done creating disks!
 Sun, Aug 09, 2015  1:53:00 PM: Waiting on async 'disks create' jobs to finish. Might take a while...
 ....
Sun, Aug 09, 2015  1:53:15 PM: Creating worker instances: hadoop-w-0 hadoop-w-1 hadoop-w-2
...Sun, Aug 09, 2015  1:53:16 PM: Creating master instance: hadoop-m
.Sun, Aug 09, 2015  1:53:16 PM: Waiting on async 'instances create' jobs to finish. Might take a while... 
....
Sun, Aug 09, 2015  1:54:13 PM: Instances all created. Entering polling loop to wait for ssh-ability

I will try on Linux and test; I assume it may be something to do with Cygwin.

stev-0 commented 9 years ago

I have tried on Linux as well and I get similar problems. I also have trouble getting into the machines via SSH, and when I do manage to (usually via the Cloud Console's SSH), Hadoop is again not installed.

Some of the SSH errors I am receiving are:

WARNING: You do not have an SSH key for Google Compute Engine.
WARNING: [/usr/bin/ssh-keygen] will be executed to generate a key.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
ERROR: (gcloud.compute.ssh) Could not SSH to the instance.  It is possible that your SSH key has not propagated to the instance yet. Try running this command again.  If you still cannot connect, verify that the firewall and instance are set to accept ssh traffic.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].

I realise this isn't a help forum, but I am struggling here and would be happy to contribute back to the docs if someone could give me a pointer in the right direction. Looking at the source code, it seems that the create_cluster function isn't returning because of the SSH problems, so it looks like I need a hand with my SSH keys. I have tried using existing keys, letting the software generate new keys (with and without passphrases), and adding existing SSH keys (or ones generated by the script) to the project metadata before deploying the cluster. None of this works, and I have only been able to get in through the console.
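Before suspecting key propagation to project metadata, it can help to rule out a mismatched or corrupted local key pair. This is a hedged sketch, not part of the thread; it assumes gcloud's default key path `~/.ssh/google_compute_engine` (substitute your own path if it differs), and generates a placeholder key only so the check can be demonstrated end to end:

```shell
# Assumption: gcloud's default key location; adjust KEY if yours differs.
KEY="$HOME/.ssh/google_compute_engine"
mkdir -p "$HOME/.ssh"
# Placeholder key purely for illustration, created only if none exists.
[ -f "$KEY" ] || ssh-keygen -t rsa -f "$KEY" -N "" -q

# Re-derive the public key from the private key and compare it with the
# stored .pub file. A mismatch here guarantees
# "Permission denied (publickey)" no matter what is in project metadata.
ssh-keygen -y -f "$KEY" | awk '{print $1, $2}' > /tmp/derived.pub
awk '{print $1, $2}' "$KEY.pub" > /tmp/stored.pub
diff -q /tmp/derived.pub /tmp/stored.pub && echo "key pair is consistent"
```

If the pair is consistent, the remaining suspects are the key not being in the instance/project metadata yet, or the agent not holding the key (the `ssh-add` case).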

stev-0 commented 9 years ago

It was as simple as one command (`ssh-add`) on my Fedora 22 box, via this answer: https://stackoverflow.com/questions/28201353/cannot-ssh-into-instance-using-gcloud/28202175#28202175. I would be happy to add some information to the README if you think that's appropriate - it took me long enough to solve!
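For anyone hitting the same wall, the fix from the linked answer amounts to loading the private key into a running ssh-agent. A minimal sketch, assuming gcloud's default key path (adjust if yours differs); the key generation line is only there so the example is self-contained:

```shell
# Assumption: gcloud's default key location; adjust KEY if yours differs.
KEY="$HOME/.ssh/google_compute_engine"
mkdir -p "$HOME/.ssh"
# Placeholder key for illustration, created only if none exists
# (no passphrase here; use one if your policy requires it).
[ -f "$KEY" ] || ssh-keygen -t rsa -f "$KEY" -N "" -q

# Start an agent if this shell does not already have one, then add the key.
[ -n "$SSH_AUTH_SOCK" ] || eval "$(ssh-agent -s)" >/dev/null
ssh-add "$KEY"

# List loaded identities; the key should now appear here.
ssh-add -l
```

After this, `gcloud compute ssh` (and bdutil's polling loop, which shells out to SSH) can authenticate with the agent-held key instead of failing with `Permission denied (publickey)`.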