XiaoMi / minos

Minos is beyond a hadoop deployment system.
Apache License 2.0

yarn deployed by minos cannot run mapreduce #13

Closed by hpttlook 10 years ago

hpttlook commented 10 years ago

I tried to run wordcount from mapreduce-example-xx.jar and got the following error:

    2014-01-03,11:25:47,793 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1388719502388_0001 failed 2 times due to AM Container for appattempt_1388719502388_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

@wuzesheng
I guess it is related to an unset environment variable. Maybe it is the %service_env I mentioned in issue 8 that causes the error.

Can you give me some advice on setting this variable?

wuzesheng commented 10 years ago

Can you post the content of the stderr log?

hpttlook commented 10 years ago

I cannot find any logs through the web address of the YARN cluster. No application even shows up in the cluster web front end (maybe some default YARN config is wrong?). But according to the yarn.log of the ResourceManager, the error is caused by: cannot find main class xxxx org.apache.xxxx.MRAppMaster

I used yarn.nodemanager.delete.debug-delay-sec to preserve the container launch script, which contains the following lines:

    export CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*"
    export HADOOP_TOKEN_FILE_LOCATION="/home/work/data/dev/disk0/yarn/godtst-zeus/nodemanager/usercache/work/appcache/application_1388730055769_0001/container_1388730055769_0001_02_000001/container_tokens"

I guess that in YARN deployed by Minos, the following variables are not set correctly: HADOOP_COMMON_HOME, HADOOP_YARN_HOME, HADOOP_MAPRED_HOME. Can you take a look at the %service_env in the Xiaomi cluster?
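One way to confirm this guess is to list which variables the preserved CLASSPATH line references and check them against the environment the NodeManager actually runs with. The following is a minimal sketch, not part of Minos or Hadoop; the function name `unset_classpath_vars` and the sample paths are hypothetical.

```python
import re


def unset_classpath_vars(classpath, env):
    """Return names of $VARS referenced in classpath that are missing or empty in env."""
    # Matches both $NAME and ${NAME} forms.
    names = re.findall(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?", classpath)
    missing = []
    for name in names:
        # env.get() is falsy for both absent and empty-string values.
        if name not in missing and not env.get(name):
            missing.append(name)
    return missing


# Hypothetical, shortened classpath and environment for illustration.
classpath = (
    "$PWD:$HADOOP_CONF_DIR:"
    "$HADOOP_COMMON_HOME/share/hadoop/common/*:"
    "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*"
)
env = {"PWD": "/tmp/container", "HADOOP_CONF_DIR": "/etc/hadoop"}
print(unset_classpath_vars(classpath, env))
```

To check a live process instead of a sample dictionary, `os.environ` could be passed as `env`, provided the check runs in the same environment as the NodeManager.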

wuzesheng commented 10 years ago

Does your start.sh under rm expand the variable %service_env correctly?
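A quick way to answer this is to scan the generated start.sh for a placeholder that is still present literally. This is a minimal sketch under the assumption that an unexpanded placeholder appears verbatim as `%service_env` in the rendered script; the helper name `unexpanded_lines` is hypothetical.

```python
def unexpanded_lines(script_text, placeholder="%service_env"):
    """Return (line_number, line) pairs where the placeholder was left unexpanded."""
    return [
        (number, line)
        for number, line in enumerate(script_text.splitlines(), start=1)
        if placeholder in line
    ]


# Hypothetical rendered script in which substitution did not happen.
rendered = "#!/bin/sh\n%service_env\nexec java Main\n"
for number, line in unexpanded_lines(rendered):
    print(number, line)
```

An empty result means the placeholder was replaced with something; it does not by itself show that the replacement exported the right variables.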

hpttlook commented 10 years ago

OK, I got it. In deploy_yarn, the service_env variable is useful; in the other deploy_{hbase\zookeeper\hdfs} scripts it is useless.

When I deployed HDFS and checked the package, I found that service_env caused an error (not a big deal), so I deleted it from the template, which caused all the subsequent errors in YARN.

Thanks very much!

hpttlook commented 10 years ago

I think it would be better if we substituted service_env with an empty line in the other services (hbase\hdfs).

wuzesheng commented 10 years ago

Got it. We will add a default action for substituting the service_env variable.
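A default action of this kind could look roughly like the sketch below. It is an illustration only, not Minos's actual implementation: the function `render_template`, the `DEFAULTS` table, and the sample template and paths are all hypothetical, and only the `%service_env` placeholder name comes from this thread.

```python
import re


def render_template(template, values, defaults=None):
    """Replace %name placeholders, falling back to defaults for names not in values."""
    merged = dict(defaults or {})
    merged.update(values)

    def replace(match):
        name = match.group(1)
        # Unknown names are left untouched so unrelated text such as "%Y" survives.
        return merged.get(name, match.group(0))

    return re.sub(r"%([A-Za-z_][A-Za-z0-9_]*)", replace, template)


# Services that define no service_env get an empty line instead of an error.
DEFAULTS = {"service_env": ""}

template = "%service_env\nexec java -cp %classpath Main\n"
hdfs_script = render_template(template, {"classpath": "/opt/hdfs/*"}, DEFAULTS)
yarn_script = render_template(
    template,
    {"classpath": "/opt/yarn/*", "service_env": "export HADOOP_YARN_HOME=/opt/yarn"},
    DEFAULTS,
)
print(hdfs_script)
print(yarn_script)
```

With a default in place, the placeholder can stay in every service's template, so nobody needs to delete it from a shared template and break YARN as a side effect.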

wuzesheng commented 10 years ago

I will close this issue and create a new issue to track the above minor improvements, thanks @hpttlook