delta-rho / install-emr

Installation script and instructions for setting up Tessera environment on Amazon Elastic MapReduce
10 stars 13 forks source link

Error in post-install-config.sh #19

Open aglsharad opened 8 years ago

aglsharad commented 8 years ago

I am trying to setup Tessera environment on Amazon EMR. I have setup the AWS account. I installed the scripts by cloning the github respository. Then I ran tessera-emr.sh from my system. Cluster is created. When I tried to logon to RStudio Server from web browser, credentials did not work. I used USER=tessera-user, PASSWD=tessera On investigating, I found customJAR step failed in the cluster.

Details in customJAR:

status: Failed ID:s-1FIU62D4Y3X95 JAR location: s3://elasticmapreduce/libs/script-runner/script-runner.jar Main class: None Arguments: s3:///scripts/post-install-config.sh tessera-user tessera Action on failure: Continue

Details from controller log files

2016-08-18T13:44:29.739Z INFO redirectOutput to /mnt/var/log/hadoop/steps/s-1FIU62D4Y3X95/stdout 2016-08-18T13:44:29.739Z INFO redirectError to /mnt/var/log/hadoop/steps/s-1FIU62D4Y3X95/stderr 2016-08-18T13:44:29.739Z INFO Working dir /mnt/var/lib/hadoop/steps/s-1FIU62D4Y3X95 2016-08-18T13:44:29.781Z INFO ProcessRunner started child process 19973 : hadoop 19973 2325 0 13:44 ? 00:00:00 bash /home/hadoop/bin/hadoop jar /mnt/var/lib/hadoop/steps/s-1FIU62D4Y3X95/script-runner.jar s3://dmar-emr/scripts/post-install-config.sh tessera-user tessera 2016-08-18T13:44:29.783Z INFO Synchronously wait child process to complete : hadoop jar /mnt/var/lib/hadoop/steps/s-1FIU62D4... 2016-08-18T13:44:35.785Z INFO waitProcessCompletion ended with exit code 1 : hadoop jar /mnt/var/lib/hadoop/steps/s-1FIU62D4... 2016-08-18T13:44:35.785Z INFO total process run time: 6 seconds 2016-08-18T13:44:35.984Z INFO Step created jobs: 2016-08-18T13:44:35.984Z WARN Step failed with exitCode 1 and took 6 seconds

aglsharad commented 8 years ago

On further investigation, I found below error in stderr. Could anyone please help.

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-2RDLYQ8DGFAFN/./post-install-config.sh" (in directory "."): error=2, No such file or directory at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:143) at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.IOException: Cannot run program "/mnt/var/lib/hadoop/steps/s-2RDLYQ8DGFAFN/./post-install-config.sh" (in directory "."): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:96) ... 6 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.(UNIXProcess.java:186) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) ... 7 more

hafen commented 8 years ago

Interesting. These things are hard to debug when I can't replicate. Here's my controller log:

2016-08-17T23:11:44.944Z INFO startExec 'hadoop jar /mnt/var/lib/hadoop/steps/s-1QF5MGR5WOUEJ/script-runner.jar s3://tessera-emr/scripts/post-install-config.sh tessera-user tessera'
2016-08-17T23:11:44.947Z INFO Environment:
  TERM=linux
  HADOOP_PREFIX=/home/hadoop
  CONSOLETYPE=serial
  JAVA_HOME=/usr/java/latest
  PIG_CONF_DIR=/home/hadoop/pig/conf
  HBASE_HOME=/home/hadoop/hbase
  HADOOP_YARN_HOME=/home/hadoop
  HIVE_HOME=/home/hadoop/hive
  HADOOP_ROOT_LOGGER=INFO,DRFA
  YARN_HOME=/home/hadoop
  MAIL=/var/spool/mail/hadoop
  SPARK_CONF_DIR=/home/hadoop/spark/conf
  IMPALA_CONF_DIR=/home/hadoop/impala/conf
  PWD=/
  HOSTNAME=ip-172-31-9-91
  LESS_TERMCAP_mb=[01;31m
  LESS_TERMCAP_me=[0m
  NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
  LESS_TERMCAP_md=[01;38;5;208m
  AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
  HISTSIZE=1000
  HADOOP_COMMON_HOME=/home/hadoop
  HADOOP_LOGFILE=syslog
  PATH=/home/hadoop/pig/bin:/usr/local/cuda/bin:/usr/java/latest/bin:/home/hadoop/bin:/home/hadoop/mahout/bin:/home/hadoop/hive/bin:/home/hadoop/hbase/bin:/home/hadoop/impala/bin:/home/hadoop/spark/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin:/home/hadoop/cascading/tools/multitool-20140224/bin:/home/hadoop/cascading/tools/load-20140223/bin:/home/hadoop/cascading/tools/lingual-client/bin:/home/hadoop/cascading/driven/bin
  HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-1QF5MGR5WOUEJ
  HIVE_CONF_DIR=/home/hadoop/hive/conf
  AWS_DEFAULT_REGION=us-west-2
  HADOOP_CONF_DIR=/home/hadoop/conf
  IMPALA_HOME=/home/hadoop/impala
  SHLVL=5
  HADOOP_IDENT_STRING=hadoop
  LANGSH_SOURCED=1
  XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
  AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
  UPSTART_JOB=rc
  SPARK_HOME=/home/hadoop/spark
  HADOOP_HOME_WARN_SUPPRESS=true
  EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
  AWS_RDS_HOME=/opt/aws/apitools/rds
  PIG_CLASSPATH=/home/hadoop/pig/lib
  LESS_TERMCAP_se=[0m
  MAHOUT_CONF_DIR=/home/hadoop/mahout/conf
  LOGNAME=hadoop
  UPSTART_INSTANCE=
  HBASE_CONF_DIR=/home/hadoop/hbase/conf
  YARN_CONF_DIR=/home/hadoop/conf
  AWS_PATH=/opt/aws
  _=/usr/java/latest/bin/java
  HADOOP_HOME=/home/hadoop
  runlevel=3
  LD_LIBRARY_PATH=/home/hadoop/lib/native:/usr/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib:
  UPSTART_EVENTS=runlevel
  MAHOUT_LOG_DIR=/mnt/var/log/apps
  previous=N
  EC2_HOME=/opt/aws/apitools/ec2
  PIG_HOME=/home/hadoop/pig
  LESS_TERMCAP_ue=[0m
  AWS_ELB_HOME=/opt/aws/apitools/elb
  RUNLEVEL=3
  USER=hadoop
  HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-1QF5MGR5WOUEJ/tmp
  RUBYOPT=rubygems
  PREVLEVEL=N
  HADOOP_HDFS_HOME=/home/hadoop
  HOME=/home/hadoop
  HISTCONTROL=ignoredups
  LESSOPEN=||/usr/bin/lesspipe.sh %s
  MAHOUT_HOME=/home/hadoop/mahout
  LANG=en_US.UTF-8
  LESS_TERMCAP_us=[04;38;5;111m
  HADOOP_MAPRED_HOME=/home/hadoop
2016-08-17T23:11:44.947Z INFO redirectOutput to /mnt/var/log/hadoop/steps/s-1QF5MGR5WOUEJ/stdout
2016-08-17T23:11:44.947Z INFO redirectError to /mnt/var/log/hadoop/steps/s-1QF5MGR5WOUEJ/stderr
2016-08-17T23:11:44.947Z INFO Working dir /mnt/var/lib/hadoop/steps/s-1QF5MGR5WOUEJ
2016-08-17T23:11:44.990Z INFO ProcessRunner started child process 20264 :
hadoop   20264  2408  0 23:11 ?        00:00:00 bash /home/hadoop/bin/hadoop jar /mnt/var/lib/hadoop/steps/s-1QF5MGR5WOUEJ/script-runner.jar s3://tessera-emr/scripts/post-install-config.sh tessera-user tessera
2016-08-17T23:11:44.992Z INFO Synchronously wait child process to complete : hadoop jar /mnt/var/lib/hadoop/steps/s-1QF5MGR5...
2016-08-17T23:12:00.994Z INFO waitProcessCompletion ended with exit code 0 : hadoop jar /mnt/var/lib/hadoop/steps/s-1QF5MGR5...
2016-08-17T23:12:00.994Z INFO total process run time: 16 seconds
2016-08-17T23:12:01.259Z INFO Step created jobs: 
2016-08-17T23:12:01.259Z INFO Step succeeded with exitCode 0 and took 16 seconds

And stderr:

Command exiting with ret '0'

And stdout:

Downloading 's3://tessera-emr/scripts/post-install-config.sh' to '/mnt/var/lib/hadoop/steps/s-1QF5MGR5WOUEJ/.'

If you can compare, maybe that will help narrow down the problem. Are you sure these scripts are being copied to/from your s3 bucket?

aglsharad commented 8 years ago

Yes, the scripts seems to be copied to/from s3 bucket as I can see the scripts in my S3 bucket: s3://dmar-emr My stdout is same as yours except the S3 bucket name and the ID

Downloading 's3://dmar-emr/scripts/post-install-config.sh' to '/mnt/var/lib/hadoop/steps/s-CTIH7VZHVAFZ/.'

My stderr is giving exception as shared earlier.

Controller has difference on following two I used AWS_DEFAULT_REGION=us-west-1

I got below exit code and below total process run time

2016-08-21T17:15:44.543Z INFO Synchronously wait child process to complete : hadoop jar /mnt/var/lib/hadoop/steps/s-CTIH7VZH... 2016-08-21T17:15:50.545Z INFO waitProcessCompletion ended with exit code 1 : hadoop jar /mnt/var/lib/hadoop/steps/s-CTIH7VZH... 2016-08-21T17:15:50.545Z INFO total process run time: 6 seconds 2016-08-21T17:15:50.736Z INFO Step created jobs: 2016-08-21T17:15:50.736Z WARN Step failed with exitCode 1 and took 6 seconds

Here is my syslog

2016-08-21 17:15:48,421 INFO com.amazon.ws.emr.hadoop.fs.EmrFileSystem (main): Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation 2016-08-21 17:15:48,796 INFO amazon.emr.metrics.MetricsSaver (main): MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1471799702904 2016-08-21 17:15:48,796 INFO amazon.emr.metrics.MetricsSaver (main): Created MetricsSaver j-1CW6B85J7SWH6:i-01c12c2ecaeef2d96:RunJar:20223 period:60 /mnt/var/em/raw/i-01c12c2ecaeef2d96_20160821_RunJar_20223_raw.bin 2016-08-21 17:15:49,666 INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem (main): Opening 's3://dmar-emr/scripts/post-install-config.sh' for reading 2016-08-21 17:15:49,727 INFO amazon.emr.metrics.MetricsSaver (main): Thread 1 created MetricsLockFreeSaver 1

I am always getting this error, 4th time now.