zxsimple opened 5 years ago
You should use the spark-submit options --principal and --keytab when using SONA.
Whether I specify the --principal and --keytab options or the spark.yarn.keytab and spark.yarn.principal configuration properties, I always get a Connection Refused exception. Note that the kinit command works fine.
Exception while invoking getNewApplication of class ApplicationClientProtocolPBClientImpl over 141 after 1 fail over attempts. Trying to fail over after sleeping for 30266ms. | org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:146)
java.net.ConnectException: Call From host-xxx/xxx to host-yyy:26004 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:815)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:746)
at org.apache.hadoop.ipc.Client.call(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy27.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:231)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:202)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy28.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:227)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:235)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:140)
at com.tencent.angel.client.AngelPSClient.startPS(AngelPSClient.java:115)
at com.tencent.angel.spark.context.AngelPSContext$.launchAngel(AngelPSContext.scala:301)
at com.tencent.angel.spark.context.AngelPSContext$.apply(AngelPSContext.scala:265)
at com.tencent.angel.spark.context.PSContext$.liftedTree1$1(PSContext.scala:85)
at com.tencent.angel.spark.context.PSContext$.instance(PSContext.scala:83)
at com.tencent.angel.spark.context.PSContext$.getOrCreate(PSContext.scala:67)
at com.tencent.angel.spark.examples.basic.LR$.main(LR.scala:43)
at com.tencent.angel.spark.examples.basic.LR.main(LR.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:650)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:658)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:763)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:394)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1577)
at org.apache.hadoop.ipc.Client.call(Client.java:1499)
... 27 more
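In Spark 2.x the two ways of passing credentials described in this report should be equivalent, since spark-submit maps --principal/--keytab onto the spark.yarn.principal/spark.yarn.keytab properties. As a sketch only, the shell helper below (`build_kerberos_args` is a hypothetical name, not part of Spark or Angel) emits either form, one argument per line, and first checks that the keytab is readable on the submitting host, because spark-submit reads it from the client's local filesystem.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark or Angel): print spark-submit
# Kerberos arguments for YARN, one argument per line.
#   mode "flags" -> --principal <p> --keytab <k>
#   mode "conf"  -> --conf spark.yarn.principal=<p> --conf spark.yarn.keytab=<k>
build_kerberos_args() {
  local mode="$1" principal="$2" keytab="$3"

  # spark-submit reads the keytab from the client's local filesystem,
  # so fail early if it is not readable on this host.
  if [ ! -r "$keytab" ]; then
    echo "keytab not readable on this host: $keytab" >&2
    return 1
  fi

  case "$mode" in
    flags)
      printf '%s\n' --principal "$principal" --keytab "$keytab"
      ;;
    conf)
      printf '%s\n' --conf "spark.yarn.principal=$principal" \
        --conf "spark.yarn.keytab=$keytab"
      ;;
    *)
      echo "unknown mode: $mode (expected flags or conf)" >&2
      return 2
      ;;
  esac
}
```

In bash, for example, `mapfile -t krb < <(build_kerberos_args flags user@EXAMPLE.COM /path/to/user.keytab)` followed by `spark-submit "${krb[@]}" --class ...` would pass the arguments intact even if the path contains spaces; the principal and path here are placeholders.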
Please paste more detailed logs. Can you submit a Spark example job without Angel?
I can submit the Spark example job both with and without Kerberos authentication:
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--keytab [my_keytab] \
--principal [my_name] \
--num-executors 4 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
$SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar 10
That exception indicates that authentication fails with the keytab and user. The same exception is simply repeated after several seconds:
2019-07-02 20:01:18,221 | INFO | [dispatcher-event-loop-14] | Registered executor NettyRpcEndpointRef(null) (xxxxxx:52720) with ID 10 | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2019-07-02 20:01:18,272 | INFO | [dispatcher-event-loop-2] | Registering block manager host-xxxxx:22744 with 2004.6 MB RAM, BlockManagerId(10,xxxxx, None) | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2019-07-02 20:01:18,498 | INFO | [dispatcher-event-loop-13] | Registered executor NettyRpcEndpointRef(null) (xxxxxx:24084) with ID 8 | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2019-07-02 20:01:18,539 | INFO | [dispatcher-event-loop-10] | Registering block manager host-xxxxx:22614 with 2004.6 MB RAM, BlockManagerId(8, xxxxxx, 22614, None) | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
2019-07-02 20:01:41,370 | INFO | [Driver] | Failing over to 140 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.performFailover(ConfiguredRMFailoverProxyProvider.java:100)
2019-07-02 20:01:41,372 | WARN | [Driver] | Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] | org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:726)
2019-07-02 20:01:41,372 | INFO | [Driver] | Exception while invoking getNewApplication of class ApplicationClientProtocolPBClientImpl over 140 after 2 fail over attempts. Trying to fail over after sleeping for 39296ms. | org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:146)
java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "xxxx/xxxxx"; destination host is: "host-xxxxx":26004;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:796)
at org.apache.hadoop.ipc.Client.call(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy21.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:231)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:202)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy22.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:227)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:235)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:130)
at com.tencent.angel.client.AngelPSClient.startPS(AngelPSClient.java:115)
at com.tencent.angel.spark.context.AngelPSContext$.launchAngel(AngelPSContext.scala:301)
at com.tencent.angel.spark.context.AngelPSContext$.apply(AngelPSContext.scala:265)
at com.tencent.angel.spark.context.PSContext$.liftedTree1$1(PSContext.scala:85)
at com.tencent.angel.spark.context.PSContext$.instance(PSContext.scala:83)
at com.tencent.angel.spark.context.PSContext$.getOrCreate(PSContext.scala:67)
at com.tencent.angel.spark.examples.basic.LR$.main(LR.scala:43)
at com.tencent.angel.spark.examples.basic.LR.main(LR.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:650)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:731)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1778)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:694)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:784)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:394)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1577)
at org.apache.hadoop.ipc.Client.call(Client.java:1499)
... 27 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:177)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:404)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:581)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:394)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:776)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:772)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1778)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:771)
... 30 more
2019-07-02 20:02:20,669 | INFO | [Driver] | Failing over to 141 | org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.performFailover(ConfiguredRMFailoverProxyProvider.java:100)
2019-07-02 20:02:20,671 | INFO | [Driver] | Exception while invoking getNewApplication of class ApplicationClientProtocolPBClientImpl over 141 after 3 fail over attempts. Trying to fail over after sleeping for 39585ms. | org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:146)
java.net.ConnectException: Call From host-xxxx/xxxxx toxxxx:26004 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:815)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:746)
at org.apache.hadoop.ipc.Client.call(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy21.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:231)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:202)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy22.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:227)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:235)
at com.tencent.angel.client.yarn.AngelYarnClient.startPSServer(AngelYarnClient.java:130)
at com.tencent.angel.client.AngelPSClient.startPS(AngelPSClient.java:115)
at com.tencent.angel.spark.context.AngelPSContext$.launchAngel(AngelPSContext.scala:301)
at com.tencent.angel.spark.context.AngelPSContext$.apply(AngelPSContext.scala:265)
at com.tencent.angel.spark.context.PSContext$.liftedTree1$1(PSContext.scala:85)
at com.tencent.angel.spark.context.PSContext$.instance(PSContext.scala:83)
at com.tencent.angel.spark.context.PSContext$.getOrCreate(PSContext.scala:67)
at com.tencent.angel.spark.examples.basic.LR$.main(LR.scala:43)
at com.tencent.angel.spark.examples.basic.LR.main(LR.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:650)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:658)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:763)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:394)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1577)
at org.apache.hadoop.ipc.Client.call(Client.java:1499)
... 27 more
Did you specify the keytab as a file local to the client, or did you submit it with the --files option?
The keytab file should be a client-local file (a local path). Where is your keytab file?
Yes, it is a local file.
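The report says kinit works, but not whether it was run with the keytab or with a password. One way to narrow this down is to confirm that the local keytab itself can produce a ticket for the principal passed to spark-submit. A minimal sketch, assuming the MIT Kerberos klist and kinit tools are installed; `check_keytab` is a hypothetical helper name. It lists the keytab entries with `klist -kt`, then runs `kinit -kt` into a throwaway credential cache so the user's existing ticket cache is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: check that a keytab on this host can authenticate
# the given principal. Assumes MIT Kerberos klist/kinit are installed.
# Return codes: 0 ok, 1 keytab missing, 2 tools missing,
#   3 principal not listed in keytab, 4 mktemp failed, 5 kinit failed.
check_keytab() {
  local keytab="$1" principal="$2" ccache status

  if [ ! -f "$keytab" ]; then
    echo "not a file on this host: $keytab" >&2
    return 1
  fi
  if ! command -v klist >/dev/null 2>&1 || ! command -v kinit >/dev/null 2>&1; then
    echo "klist/kinit not found in PATH" >&2
    return 2
  fi

  # 'klist -k' lists the principals stored in the keytab ('-t' adds timestamps).
  if ! klist -kt "$keytab" | grep -qF "$principal"; then
    echo "principal $principal not listed in $keytab" >&2
    return 3
  fi

  # Obtain a TGT from the keytab into a temporary ccache, then discard it.
  ccache=$(mktemp) || return 4
  status=0
  KRB5CCNAME="FILE:$ccache" kinit -kt "$keytab" "$principal" || status=5
  rm -f "$ccache"
  return "$status"
}
```

For example, `check_keytab /path/to/user.keytab user@EXAMPLE.COM && echo ok`, with the path and principal replaced by the values given to --keytab and --principal.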
Does that mean submitting the Spark example failed? Is your cluster running normally?
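One quick check of cluster health is the ResourceManager HA state. The failover messages in the logs ("Failing over to 140", "Failing over to 141") suggest the RM ids are 140 and 141. In Hadoop 2.x a standby ResourceManager typically does not serve client RPCs, so if 141 turns out to be the standby, the Connection refused errors against it would be expected and the AccessControlException from 140 would be the actual failure; this is an inference, not something confirmed in the thread. The sketch below is a hypothetical wrapper (`rm_states` is not a Hadoop command) around `yarn rmadmin -getServiceState`; the `YARN_CMD` override is an assumption added only so the loop can be exercised without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the HA state of each ResourceManager id,
# e.g. "140: active". The ids come from yarn.resourcemanager.ha.rm-ids.
# YARN_CMD may name another command (e.g. a stub); it defaults to yarn.
rm_states() {
  for rm_id in "$@"; do
    state=$(${YARN_CMD:-yarn} rmadmin -getServiceState "$rm_id" 2>/dev/null) \
      || state="unreachable"
    echo "$rm_id: $state"
  done
}
```

On a gateway host with a valid Kerberos ticket, `rm_states 140 141` would print one line per RM id.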
@ouyangwen-it
How are the Kerberos properties angel.kerberos.keytab and angel.kerberos.principal used in SONA?