Qihoo360 / hbox

AI on Hadoop
Apache License 2.0

FATAL Client: Error running Client #9

Closed xlsong19 closed 6 years ago

xlsong19 commented 6 years ago

Hi, I followed your steps, but when I run the following command, it fails:

    $XLEARNING_HOME/bin/xl-submit \
      --app-type "tensorflow" \
      --app-name "tf-demo" \
      --input /tmp/data/tensorflow#data \
      --output /tmp/tensorflow_model#model \
      --files demo.py,dataDeal.py \
      --launch-cmd "python demo.py --data_path=./data --save_path=./model --log_dir=./eventLog --training_epochs=10" \
      --worker-memory 2G \
      --worker-num 2 \
      --worker-cores 3 \
      --ps-memory 1G \
      --ps-num 1

The error output is:

    17/12/13 02:57:28 INFO Client: Submitting application to ResourceManager
    17/12/13 02:57:28 FATAL Client: Error running Client
    java.lang.RuntimeException: Application submitAndMonitor failed!
        at net.qihoo.xlearning.client.Client.submitAndMonitor(Client.java:594)
        at net.qihoo.xlearning.client.Client.main(Client.java:665)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

How can I solve this problem? Can you help me? Thanks~

jiarunying commented 6 years ago

Please pull the latest code, which prints detailed error information. Alternatively, you can first try adding "--queue default" at the end of "run.sh".
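As a rough sketch of the second suggestion, the command from the question might look like this with the queue option appended. It assumes that xl-submit accepts "--queue" as described in the reply and that a queue named "default" exists on your cluster; your queue name may differ. The snippet only prints the assembled command so it can be reviewed before anything is submitted.

```shell
# Arguments from the original command, with --queue appended at the end.
# "default" is the queue name from the reply above; replace it if your
# cluster uses a different queue (assumption: the queue exists).
set -- \
  --app-type "tensorflow" \
  --app-name "tf-demo" \
  --input "/tmp/data/tensorflow#data" \
  --output "/tmp/tensorflow_model#model" \
  --files "demo.py,dataDeal.py" \
  --launch-cmd "python demo.py --data_path=./data --save_path=./model --log_dir=./eventLog --training_epochs=10" \
  --worker-memory 2G \
  --worker-num 2 \
  --worker-cores 3 \
  --ps-memory 1G \
  --ps-num 1 \
  --queue default

# Dry run: print the command for review (quoting is flattened in the output).
echo "$XLEARNING_HOME/bin/xl-submit" "$@"

# To actually submit, run the line below instead of the echo:
# "$XLEARNING_HOME/bin/xl-submit" "$@"
```

Keeping the arguments in the positional parameters preserves the quoting of the "--launch-cmd" value when the command is eventually run with "$@".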