Can you describe what your application does (or share the program if possible)? It looks like your task is trying to fetch something (at org.apache.spark.util.Utils$.fetchFile(Utils.scala:489)):
[2020-07-07T18:38:57.1466103Z] [SEALWT20197] [Error] [JvmBridge] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.io.IOException: Failed to connect to SEALWT20197.amer.gettywan.com/192.168.0.106:65268
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$downloadClient(NettyRpcEnv.scala:368)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$openChannel$1.apply$mcV$sp(NettyRpcEnv.scala:336)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$openChannel$1.apply(NettyRpcEnv.scala:335)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$openChannel$1.apply(NettyRpcEnv.scala:335)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:339)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:665)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:489)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:755)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:747)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:747)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.NullPointerException
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1423)
... 17 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: Permission denied: no further information: mycomputername/myipaddress:65268
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.SocketException: Permission denied: no further information
... 11 more
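For what it's worth: on Windows, a java.net.SocketException: Permission denied when connecting back to the machine's own ephemeral port (65268 above) is often caused by local firewall rules or by Hyper-V/WinNAT excluding that port range, rather than by Spark itself. One way to check whether the port falls in an excluded range, assuming a Windows machine with Hyper-V or Docker installed:
netsh interface ipv4 show excludedportrange protocol=tcp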
I'm trying to run the tutorial code: https://dotnet.microsoft.com/learn/data/spark-tutorial/code. I have an input.txt file; it's in the root of the project and also in the bin/Debug folder. I'm running my command-line console as an admin.
Thanks
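For context, the tutorial program linked above is essentially the following word count. This is a minimal sketch assuming the standard Microsoft.Spark API shown in the tutorial, not the exact Program.cs from this project:

using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class Program
{
    static void Main(string[] args)
    {
        // Start a Spark session; spark-submit wires this up to the JVM side.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("word_count_sample")
            .GetOrCreate();

        // Read input.txt as a DataFrame with one "value" column per line.
        DataFrame lines = spark.Read().Text("input.txt");

        // Split each line into words and count occurrences of each word.
        DataFrame words = lines
            .Select(Split(Col("value"), " ").Alias("words"))
            .Select(Explode(Col("words")).Alias("word"))
            .GroupBy("word")
            .Count()
            .OrderBy(Col("count").Desc());

        words.Show();
        spark.Stop();
    }
}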
Can you try specifying the full path for your input.txt? Or, you could try putting input.txt in the same folder as the DLL, go to that directory, and run: spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.3.x-0.12.1.jar dotnet poc_SparkApp.dll
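Note that input.txt is opened by the application itself rather than by spark-submit, so "specify the full path" means changing the Read call in the C# code, for example (the path here is hypothetical):
DataFrame lines = spark.Read().Text(@"C:\projects\poc_SparkApp\input.txt");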
I tried running from .\bin\debug\netcoreapp3.1, the same location as the DLL and input.txt. Same problem, and it looks like the same stack trace.
Can you share your poc_SparkApp.dll? Or you can just zip up all the files under .\bin\debug\netcoreapp3.1.
Zipped contents of netcoreapp3.1
Thanks, can you also share the version of Spark you are using?
Works fine for me with Spark 2.3.4, so it's probably some issue on your side with networking, etc.:
C:\Users\terryk\Downloads\github_582\netcoreapp3.1>C:\spark\spark-2.3.4-bin-hadoop2.7\bin\spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.3.x-0.12.1.jar poc_SparkApp.exe
...
...
2020-07-07 14:00:21 INFO TaskSetManager:54 - Finished task 163.0 in stage 1.0 (TID 199) in 215 ms on localhost (executor driver) (199/200)
2020-07-07 14:00:21 INFO ShuffleBlockFetcherIterator:54 - Getting 1 non-empty blocks out of 1 blocks
2020-07-07 14:00:21 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 34 ms
2020-07-07 14:00:21 INFO Executor:54 - Finished task 165.0 in stage 1.0 (TID 200). 4304 bytes result sent to driver
2020-07-07 14:00:21 INFO TaskSetManager:54 - Finished task 165.0 in stage 1.0 (TID 200) in 151 ms on localhost (executor driver) (200/200)
2020-07-07 14:00:21 INFO TaskSchedulerImpl:54 - Removed TaskSet 1.0, whose tasks have all completed, from pool
2020-07-07 14:00:21 INFO DAGScheduler:54 - ResultStage 1 (showString at NativeMethodAccessorImpl.java:0) finished in 27.785 s
2020-07-07 14:00:21 INFO DAGScheduler:54 - Job 0 finished: showString at NativeMethodAccessorImpl.java:0, took 29.806009 s
2020-07-07 14:00:21 INFO CodeGenerator:54 - Code generated in 39.5636 ms
+------+-----+
| word|count|
+------+-----+
| .NET| 3|
|Apache| 2|
| app| 2|
| This| 2|
| Spark| 2|
| World| 1|
|counts| 1|
| for| 1|
| words| 1|
| with| 1|
| Hello| 1|
| uses| 1|
+------+-----+
2020-07-07 14:00:21 INFO AbstractConnector:318 - Stopped Spark@6f792804{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-07-07 14:00:21 INFO SparkUI:54 - Stopped Spark web UI at <redacted>
2020-07-07 14:00:21 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2020-07-07 14:00:21 INFO MemoryStore:54 - MemoryStore cleared
2020-07-07 14:00:21 INFO BlockManager:54 - BlockManager stopped
2020-07-07 14:00:21 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2020-07-07 14:00:21 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
I'm using Spark 2.3.3.
Hi @Orlamh were you able to get this issue resolved?
Hi, we are going to close this issue as it has been inactive for a while. Please feel free to re-open it if the issue persists and/or there are any new updates. Thank you!
I'm also facing this issue; I don't know why Spark is picking up the Docker IP.
ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /10.0.75.1:57068
After the above error, it terminates.
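If the driver is binding to the Docker virtual adapter (10.0.75.1 is the default DockerNAT address on Docker for Windows), one common workaround is to pin the bind address before submitting. A sketch, not verified against this exact setup:
set SPARK_LOCAL_IP=127.0.0.1
spark-submit --conf spark.driver.bindAddress=127.0.0.1 --conf spark.driver.host=localhost --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.3.x-0.12.1.jar dotnet poc_SparkApp.dll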
I am getting the same issue with a Kubernetes deployment.
ERROR task-result-getter-0 RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /192.168.6.186:46443
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:143)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:103)
at org.apache.spark.storage.BlockManager.fetchRemoteManagedBuffer(BlockManager.scala:1010)
at org.apache.spark.storage.BlockManager.$anonfun$getRemoteBlock$8(BlockManager.scala:954)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.storage.BlockManager.getRemoteBlock(BlockManager.scala:954)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:1092)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(TaskResultGetter.scala:88)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:63)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: /192.168.6.186:46443
Caused by: java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
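On Kubernetes this usually means one side cannot reach the other on an ephemeral block-manager port. A common mitigation is to fix the driver's ports and advertise a host that is resolvable from the pods; the flags below are real Spark settings, but the service name and port numbers are hypothetical and depend on your deployment:
--conf spark.driver.host=spark-driver-svc.default.svc.cluster.local
--conf spark.driver.port=29413
--conf spark.blockManager.port=29414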
> I'm also facing this issue; I don't know why Spark is picking up the Docker IP.
> ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks java.io.IOException: Failed to connect to /10.0.75.1:57068
> After the above error, it terminates.

Did you find a solution?
Describe the bug
Following the Getting Started documentation, I try a spark-submit and get the following error:
java.io.IOException: Failed to connect to mycomputername/myipaddress:64701
This is the command I used, and I've attached the logs: log.txt
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local bin\Debug\netcoreapp3.1\microsoft-spark-2.3.x-0.12.1.jar dotnet bin\Debug\netcoreapp3.1\poc_SparkApp.dll