Qihoo360 / hbox

AI on Hadoop
Apache License 2.0
1.73k stars 384 forks

Running the demo throws java.lang.reflect.UndeclaredThrowableException #72

Closed wjlight closed 4 years ago

wjlight commented 4 years ago

On the r1.4 branch with Hadoop 2.7.3, running the TensorFlow example fails with:

java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy15.fetchApplicationMessages(Unknown Source)
    at net.qihoo.xlearning.client.Client.waitCompleted(Client.java:792)
    at net.qihoo.xlearning.client.Client.submitAndMonitor(Client.java:753)
    at net.qihoo.xlearning.client.Client.main(Client.java:821)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.EOFException: End of File Exception between local host is: "54ef4e3d2490/172.17.0.2"; destination host is: "54ef4e3d2490":37821; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
    ... 10 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1084)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)

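I read http://wiki.apache.org/hadoop/EOFException, which says the cause is a version mismatch. But I have checked several times: the Hadoop version and the jars are consistent, all 2.7.3, and the nodes have not been restarted. What else could cause this?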

wjlight commented 4 years ago

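After reading the NodeManager log carefully, I found that not enough memory had been allocated, so the process was killed. Fix: add the am-memory parameter when submitting the job to increase the memory allocation.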
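
For anyone hitting the same trace, the log check can be sketched in Python. This is a minimal sketch, not part of hbox/XLearning: the function name `find_memory_kills` is hypothetical, and the regular expression assumes the NodeManager writes its usual "is running beyond physical/virtual memory limits" message, whose exact wording may differ between Hadoop versions.

```python
import re

# Assumed wording of the message the YARN NodeManager logs when it kills a
# container for exceeding its memory limit; adjust if your version differs.
KILL_PATTERN = re.compile(
    r"Container \[pid=(?P<pid>\d+),containerID=(?P<container>container_\w+)\] "
    r"is running beyond (?P<kind>physical|virtual) memory limits"
)


def find_memory_kills(lines):
    """Return (container_id, limit_kind) for each memory-limit kill in lines.

    Hypothetical helper for illustration; `lines` is any iterable of log
    lines, for example an open NodeManager log file.
    """
    hits = []
    for line in lines:
        match = KILL_PATTERN.search(line)
        if match:
            hits.append((match.group("container"), match.group("kind")))
    return hits


# Made-up sample lines standing in for a real NodeManager log.
sample_log = [
    "INFO ContainersMonitorImpl: Memory usage of ProcessTree 4733",
    "WARN ContainersMonitorImpl: Container [pid=4733,"
    "containerID=container_1409135750325_48141_02_000001] is running beyond "
    "physical memory limits. Killing container.",
]

for container_id, kind in find_memory_kills(sample_log):
    print(f"{container_id} was killed for exceeding its {kind} memory limit")
```

If the killed container turns out to be the ApplicationMaster, raising the am-memory value at submit time, as described above, is the fix that worked here.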