GoogleCloudDataproc / bdutil

[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
https://cloud.google.com/dataproc
Apache License 2.0

'hadoop-m' not yet sshable (1); sleeping 10 #63

Open lakshminarayanar opened 9 years ago

lakshminarayanar commented 9 years ago

I'm using a Google trial account. When I execute the command ./bdutil -e platforms/hdp/ambari_env.sh deploy, it throws the following error for the master and the worker nodes:
'hadoop-m' not yet sshable (1); sleeping 10
The complete output of the above command is:
[root@sandbox bdutil]# ./bdutil -e platforms/hdp/ambari_env.sh deploy
Thu Sep 17 09:20:40 UTC 2015: Using local tmp dir for staging files: /tmp/bdutil-20150917-092040-XpJ
Thu Sep 17 09:20:40 UTC 2015: Using custom environment-variable file(s): bdutil_env.sh platforms/hdp/ambari_env.sh
Thu Sep 17 09:20:40 UTC 2015: Reading environment-variable file: ./bdutil_env.sh
Thu Sep 17 09:20:40 UTC 2015: Reading environment-variable file: platforms/hdp/ambari_env.sh
Importing dependent env file: ./platforms/hdp/ambari_manual_env.sh
Importing dependent env file: ./hadoop2_env.sh
Importing dependent env file: ./platforms/hdp/ambari.conf
./platforms/hdp/ambari.conf: line 1: s: command not found
Importing dependent env file: ./platforms/hdp/ambari_functions.sh
Thu Sep 17 09:20:40 UTC 2015: No explicit GCE_MASTER_MACHINE_TYPE provided; defaulting to value of GCE_MACHINE_TYPE: n1-standard-2
Deploy cluster with following settings?
CONFIGBUCKET='hadoopbucket001'
PROJECT='hadoop-001-1071'
GCE_IMAGE='centos-6'
GCE_ZONE='us-central1-a'
GCE_NETWORK='default'
PREEMPTIBLE_FRACTION=0.0
PREFIX='hadoop'
NUM_WORKERS=2
MASTER_HOSTNAME='hadoop-m'
WORKERS='hadoop-w-0 hadoop-w-1'
BDUTIL_GCS_STAGING_DIR='gs://hadoopbucket001/bdutil-staging/hadoop-m'
MASTER_ATTACHED_PD='hadoop-m-pd'
WORKER_ATTACHED_PDS='hadoop-w-0-pd hadoop-w-1-pd'
(y/n) y
Are you sure you want to run the command as root? (y/n)y
Thu Sep 17 09:20:49 UTC 2015: Checking for existence of gs://hadoopbucket001...
gs://hadoopbucket001/
Thu Sep 17 09:20:56 UTC 2015: Checking for existence of gs://hadoop-dist/hadoop-2.7.1.tar.gz...
Thu Sep 17 09:20:59 UTC 2015: Checking upload files...
Thu Sep 17 09:20:59 UTC 2015: Verified './conf/hadoop2/bigtable-hbase-site-template.xml'
Thu Sep 17 09:20:59 UTC 2015: Verified './conf/hadoop2/gcs-core-template.xml'
Thu Sep 17 09:20:59 UTC 2015: Verified './conf/hadoop2/core-template.xml'
Thu Sep 17 09:20:59 UTC 2015: Verified './conf/hadoop2/yarn-template.xml'
Thu Sep 17 09:20:59 UTC 2015: Verified './conf/hadoop2/hdfs-template.xml'
Thu Sep 17 09:20:59 UTC 2015: Verified './conf/hadoop2/bq-mapred-template.xml'
Thu Sep 17 09:20:59 UTC 2015: Verified './conf/hadoop2/mapred-template.xml'
Thu Sep 17 09:20:59 UTC 2015: Verified './libexec/hadoop_helpers.sh'
Thu Sep 17 09:20:59 UTC 2015: Verified './libexec/configure_mrv2_mem.py'
Thu Sep 17 09:20:59 UTC 2015: Verified './hadoop2_env.sh'
Thu Sep 17 09:20:59 UTC 2015: Verified './platforms/hdp/ambari.conf'
Thu Sep 17 09:20:59 UTC 2015: Verified './platforms/hdp/ambari_functions.sh'
Thu Sep 17 09:20:59 UTC 2015: Verified './libexec/hadoop_helpers.sh'
Thu Sep 17 09:20:59 UTC 2015: Verified './platforms/hdp/configuration.json'
Thu Sep 17 09:20:59 UTC 2015: Verified './platforms/hdp/resources/public-hostname-gcloud.sh'
Thu Sep 17 09:20:59 UTC 2015: Verified './platforms/hdp/resources/thp-disable.sh'
Thu Sep 17 09:20:59 UTC 2015: Verified './platforms/hdp/ambari_manual_env.sh'
Thu Sep 17 09:20:59 UTC 2015: Verified './platforms/hdp/create_blueprint.py'
Thu Sep 17 09:20:59 UTC 2015: Generating 12 command groups...
Thu Sep 17 09:21:00 UTC 2015: Done generating remote shell scripts.
Thu Sep 17 09:21:00 UTC 2015: Creating attached worker disks: hadoop-w-0-pd hadoop-w-1-pd
..Thu Sep 17 09:21:00 UTC 2015: Creating attached master disk: hadoop-m-pd
.Thu Sep 17 09:21:00 UTC 2015: Done creating disks!
Thu Sep 17 09:21:01 UTC 2015: Waiting on async 'disks create' jobs to finish. Might take a while...
...
Thu Sep 17 09:21:11 UTC 2015: Creating worker instances: hadoop-w-0 hadoop-w-1
..Thu Sep 17 09:21:11 UTC 2015: Creating master instance: hadoop-m
.Thu Sep 17 09:21:11 UTC 2015: Waiting on async 'instances create' jobs to finish. Might take a while...
...
Thu Sep 17 09:22:02 UTC 2015: Instances all created. Entering polling loop to wait for ssh-ability
...Thu Sep 17 09:22:03 UTC 2015: Waiting on async 'wait_for_ssh' jobs to finish. Might take a while...
Thu Sep 17 09:22:09 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:22:09 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:22:09 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:22:26 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:22:26 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:22:26 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:22:43 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:22:43 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:22:43 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:22:59 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:00 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:01 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:23:15 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:18 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:18 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:23:31 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:36 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:38 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:23:49 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:51 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:23:56 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:24:07 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:24:08 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:24:11 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:24:24 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:24:24 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:24:28 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
...Thu Sep 17 09:24:42 UTC 2015: 'hadoop-w-0' not yet sshable (1); sleeping 10.
Thu Sep 17 09:24:42 UTC 2015: 'hadoop-m' not yet sshable (1); sleeping 10.
Thu Sep 17 09:24:46 UTC 2015: 'hadoop-w-1' not yet sshable (1); sleeping 10.
Thu Sep 17 09:24:52 UTC 2015: Node 'hadoop-w-0' did not become ssh-able after 10 attempts
Thu Sep 17 09:24:52 UTC 2015: Node 'hadoop-m' did not become ssh-able after 10 attempts
Thu Sep 17 09:24:56 UTC 2015: Node 'hadoop-w-1' did not become ssh-able after 10 attempts
Thu Sep 17 09:24:56 UTC 2015: Command failed: wait ${SUBPROC} on line 326.
Thu Sep 17 09:24:56 UTC 2015: Exit code of failed command: 1
Thu Sep 17 09:24:56 UTC 2015: Detailed debug info available in file: /tmp/bdutil-20150917-092040-XpJ/debuginfo.txt
Thu Sep 17 09:24:56 UTC 2015: Check console output for error messages and/or retry your command.
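For context, the 'not yet sshable ... sleeping 10' lines come from bdutil's polling loop: it repeatedly tries to reach each new node over SSH via gcloud and gives up after 10 attempts. A minimal sketch of what that check amounts to is below (this is not the actual bdutil wait_for_ssh code; the node name, zone, and attempt count are just the values taken from the log above):

# Hedged sketch of an ssh-ability poll, not the real bdutil implementation.
NODE='hadoop-m'
ZONE='us-central1-a'
MAX_ATTEMPTS=10
for attempt in $(seq 1 "${MAX_ATTEMPTS}"); do
  # Run a trivial remote command; success means the node accepts our SSH key.
  if gcloud compute ssh "${NODE}" --zone "${ZONE}" --command 'true' >/dev/null 2>&1; then
    echo "'${NODE}' is sshable"
    exit 0
  fi
  echo "'${NODE}' not yet sshable (${attempt}); sleeping 10"
  sleep 10
done
echo "Node '${NODE}' did not become ssh-able after ${MAX_ATTEMPTS} attempts"

In this deployment the loop never succeeds because every attempt fails with the same Permission denied (publickey) error shown in the debug log below, so the timeout is a symptom rather than the root cause.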

And the output in the error log file /tmp/bdutil-20150917-092040-XpJ/debuginfo.txt is:

******************* gcloud compute stdout *******************
NAME        ZONE          SIZE_GB TYPE        STATUS
hadoop-m-pd us-central1-a 1500    pd-standard READY
NAME          ZONE          SIZE_GB TYPE        STATUS
hadoop-w-1-pd us-central1-a 1500    pd-standard READY
NAME          ZONE          SIZE_GB TYPE        STATUS
hadoop-w-0-pd us-central1-a 1500    pd-standard READY
NAME       ZONE          MACHINE_TYPE  PREEMPTIBLE INTERNAL_IP   EXTERNAL_IP    STATUS
hadoop-w-0 us-central1-a n1-standard-2             10.240.90.191 173.255.112.33 RUNNING
NAME     ZONE          MACHINE_TYPE  PREEMPTIBLE INTERNAL_IP   EXTERNAL_IP     STATUS
hadoop-m us-central1-a n1-standard-2             10.240.152.66 130.211.160.182 RUNNING
NAME       ZONE          MACHINE_TYPE  PREEMPTIBLE INTERNAL_IP    EXTERNAL_IP    STATUS
hadoop-w-1 us-central1-a n1-standard-2             10.240.237.109 104.197.79.138 RUNNING

******************* gcloud compute stderr *******************
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
Created [https://www.googleapis.com/compute/v1/projects/hadoop-001-1071/zones/us-central1-a/disks/hadoop-m-pd].
Created [https://www.googleapis.com/compute/v1/projects/hadoop-001-1071/zones/us-central1-a/disks/hadoop-w-1-pd].
Created [https://www.googleapis.com/compute/v1/projects/hadoop-001-1071/zones/us-central1-a/disks/hadoop-w-0-pd].
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
WARNING: We noticed that you are using space-separated lists, which are deprecated. Please transition to using comma-separated lists instead (try "--disk name=hadoop-w-0-pd,mode=rw"). If you intend to use [mode=rw] as positional arguments, put the flags at the end.
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
WARNING: We noticed that you are using space-separated lists, which are deprecated. Please transition to using comma-separated lists instead (try "--disk name=hadoop-m-pd,mode=rw"). If you intend to use [mode=rw] as positional arguments, put the flags at the end.
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
WARNING: We noticed that you are using space-separated lists, which are deprecated. Please transition to using comma-separated lists instead (try "--disk name=hadoop-w-1-pd,mode=rw"). If you intend to use [mode=rw] as positional arguments, put the flags at the end.
Created [https://www.googleapis.com/compute/v1/projects/hadoop-001-1071/zones/us-central1-a/instances/hadoop-w-0].
Created [https://www.googleapis.com/compute/v1/projects/hadoop-001-1071/zones/us-central1-a/instances/hadoop-m].
Created [https://www.googleapis.com/compute/v1/projects/hadoop-001-1071/zones/us-central1-a/instances/hadoop-w-1].
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
Warning: Permanently added '104.197.79.138' (RSA) to the list of known hosts.^M
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).^M
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
Warning: Permanently added '173.255.112.33' (RSA) to the list of known hosts.^M
Warning: Permanently added '130.211.160.182' (RSA) to the list of known hosts.^M
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).^M
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).^M
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
/root/y/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
  import sets
Warning: Permanently added '104.197.79.138' (RSA) to the list of known hosts.^M
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).^M
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.

************ ERROR logs from gcloud compute stderr ************
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
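Every one of those failures is the same symptom: the instances are created and RUNNING (see the stdout section above), but key-based SSH authentication is rejected with Permission denied (publickey,gssapi-keyex,gssapi-with-mic). One possible client-side sanity check, assuming gcloud's default SSH key location, is to confirm that the key pair gcloud presents actually exists and that its public half is registered in the project metadata:

# Default key pair that gcloud compute ssh generates and uses (assumes the default path).
ls -l ~/.ssh/google_compute_engine ~/.ssh/google_compute_engine.pub

# Project-wide metadata, which includes the registered SSH public keys.
gcloud compute project-info describe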

Can anyone help me sort out this issue? Where is it going wrong? Please help.

dennishuo commented 9 years ago

After that failure, what happens if you try gcloud compute ssh hadoop-m --zone us-central1-f?
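If that direct attempt fails with the same Permission denied error, a more verbose run can show which key the ssh client is offering and why the server rejects it. A possible follow-up check, assuming the zone the cluster was actually deployed in (us-central1-a), with the extra flags simply passed through to the underlying ssh binary:

# Verbose SSH to the master; -vvv goes to /usr/bin/ssh and prints the key negotiation.
gcloud compute ssh hadoop-m --zone us-central1-a -- -vvv

# Confirm which account and project the gcloud client is operating as.
gcloud auth list
gcloud config list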