Qihoo360 / hbox

AI on Hadoop
Apache License 2.0
1.73k stars, 385 forks

ERROR Client: Application run failed! #10

Closed xlsong19 closed 5 months ago

xlsong19 commented 6 years ago

Hi, I pulled the latest code and first added "--queue default" at the end of "run.sh". The run output is:

17/12/13 06:41:52 INFO Client: Copying /opt/XLearning/target/xlearning-1.0/lib/xlearning-1.0-hadoop2.7.3.jar to remote path hdfs://test-2:8020/tmp/XLearning/staging/application_1511938500942_0009/AppMaster.jar
17/12/13 06:41:52 INFO Client: Building environments for the application master
17/12/13 06:41:52 INFO Client: Copy xlearning files from local filesystem to remote.
17/12/13 06:41:52 INFO Client: Copying demo.py to remote path hdfs://test-2:8020/tmp/XLearning/staging/application_1511938500942_0009/demo.py
17/12/13 06:41:52 INFO Client: Copying dataDeal.py to remote path hdfs://test-2:8020/tmp/XLearning/staging/application_1511938500942_0009/dataDeal.py
17/12/13 06:41:52 INFO Client: Building application master launch command
17/12/13 06:41:52 INFO Client: Application master launch command: ${JAVA_HOME}/bin/java -Xms1024m -Xmx1024m net.qihoo.xlearning.AM.ApplicationMaster 1>/stdout 2>/stderr
17/12/13 06:41:52 INFO Client: Submitting application to ResourceManager
17/12/13 06:41:53 INFO YarnClientImpl: Submitted application application_1511938500942_0009
17/12/13 06:41:53 INFO Client: Application submitAndMonitor succeed
17/12/13 06:41:53 INFO Client: The url to track the job: http://test-2:8088/proxy/application_1511938500942_0009/
17/12/13 06:41:53 INFO Client: Application report for application_1511938500942_0009 (state: ACCEPTED)
17/12/13 06:41:54 INFO Client: Application report for application_1511938500942_0009 (state: ACCEPTED)
17/12/13 06:41:55 INFO Client: Application report for application_1511938500942_0009 (state: ACCEPTED)
17/12/13 06:41:56 INFO Client: Application report for application_1511938500942_0009 (state: ACCEPTED)
17/12/13 06:41:57 INFO Client: Application report for application_1511938500942_0009 (state: FAILED)
17/12/13 06:41:57 INFO Client: Application has completed with YarnApplicationState=FAILED and FinalApplicationStatus=FAILED
17/12/13 06:41:57 ERROR Client: Application run failed!

I checked the logs under $XLEARNING_HOME/logs, and the error is "Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hadoop/mapreduce/jhs/mr-jhs-state/LOCK: Resource temporarily unavailable", which is an IO error. More information follows:

17/12/13 06:56:03 INFO MetricsSystemImpl: Stopping JobHistoryServer metrics system...
17/12/13 06:56:03 INFO MetricsSystemImpl: JobHistoryServer metrics system stopped.
17/12/13 06:56:03 INFO MetricsSystemImpl: JobHistoryServer metrics system shutdown complete.
17/12/13 06:56:03 FATAL JobHistoryServer: Error starting JobHistoryServer
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hadoop/mapreduce/jhs/mr-jhs-state/LOCK: Resource temporarily unavailable
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
    at net.qihoo.xlearning.jobhistory.JobHistoryServer.serviceStart(JobHistoryServer.java:218)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at net.qihoo.xlearning.jobhistory.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:250)
    at net.qihoo.xlearning.jobhistory.JobHistoryServer.main(JobHistoryServer.java:259)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hadoop/mapreduce/jhs/mr-jhs-state/LOCK: Resource temporarily unavailable
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerLeveldbStateStoreService.startStorage(HistoryServerLeveldbStateStoreService.java:82)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerStateStoreService.serviceStart(HistoryServerStateStoreService.java:79)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    ... 5 more
17/12/13 06:56:03 INFO ExitUtil: Exiting with status -1
17/12/13 06:56:03 INFO JobHistoryServer: SHUTDOWN_MSG:
/**** SHUTDOWN_MSG: Shutting down JobHistoryServer at test-2/172.16.12.46 ****/

Do you have any ideas about this? How can I solve it? Thank you~

jiarunying commented 6 years ago

The logs under $XLEARNING_HOME/logs belong to the job history service startup and are not related to the application. Please check the RM log or the AM error log, either in the local Hadoop log directory or through the web interface.
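
For anyone unsure where to start, here is a minimal sketch (not part of XLearning) that pulls the application ID and the last reported state out of the client output and builds the standard `yarn logs -applicationId <id>` command for fetching the container logs, including the AM's stderr. The helper names `summarize_client_log` and `yarn_logs_command` are made up for illustration, and `yarn logs` only returns output if log aggregation is enabled on the cluster; otherwise look in the NodeManager's local log directory or the RM web UI.

```python
import re

# Patterns matching the client output shown in this issue.
APP_ID_RE = re.compile(r"application_\d+_\d+")
STATE_RE = re.compile(r"\(state: (\w+)\)")


def summarize_client_log(text):
    """Return (application_id, last_reported_state) found in client output.

    Hypothetical helper: either value is None if it does not appear.
    """
    ids = APP_ID_RE.findall(text)
    states = STATE_RE.findall(text)
    app_id = ids[0] if ids else None
    last_state = states[-1] if states else None
    return app_id, last_state


def yarn_logs_command(app_id):
    """Build the YARN CLI call that fetches aggregated container logs."""
    return ["yarn", "logs", "-applicationId", app_id]


sample = (
    "17/12/13 06:41:53 INFO YarnClientImpl: Submitted application "
    "application_1511938500942_0009\n"
    "17/12/13 06:41:56 INFO Client: Application report for "
    "application_1511938500942_0009 (state: ACCEPTED)\n"
    "17/12/13 06:41:57 INFO Client: Application report for "
    "application_1511938500942_0009 (state: FAILED)\n"
)

app_id, state = summarize_client_log(sample)
print(app_id, state)
print(" ".join(yarn_logs_command(app_id)))
```

Run the printed command on a node with the Hadoop client configured; the reason the application went from ACCEPTED to FAILED is usually in the AM container's stderr or in the diagnostics shown on the RM web page for that application.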

decm32 commented 6 years ago

I ran into the same problem too.

[Screenshot: qq 20180310014255]