dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
https://dot.net/spark
MIT License

[BUG]: "failed to connect to" error using Get Started documentation #582

Closed Orlamh closed 4 years ago

Orlamh commented 4 years ago

Describe the bug: Following the Getting Started documentation, I run spark-submit and get the following error: java.io.IOException: Failed to connect to mycomputername/myipaddress:64701

This is the command I used and I've attached the logs: log.txt

spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local bin\Debug\netcoreapp3.1\microsoft-spark-2.3.x-0.12.1.jar dotnet bin\Debug\netcoreapp3.1\poc_SparkApp.dll
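
Errors of the form "Failed to connect to <host>/<ip>:<port>" during fetchFile usually mean the executor could not reach back to the driver's file server, often because the driver advertised an address on a VPN, firewall-blocked, or virtual network adapter. One workaround worth trying when running with --master local (not confirmed as the fix for this particular report) is to pin the driver to loopback:

spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local --conf spark.driver.host=localhost --conf spark.driver.bindAddress=127.0.0.1 bin\Debug\netcoreapp3.1\microsoft-spark-2.3.x-0.12.1.jar dotnet bin\Debug\netcoreapp3.1\poc_SparkApp.dll

Setting the SPARK_LOCAL_IP environment variable to 127.0.0.1 before launching has a similar effect.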

imback82 commented 4 years ago

Can you describe what your application does (or share the program if possible)? Is your task trying to access something over the network? (at org.apache.spark.util.Utils$.fetchFile(Utils.scala:489)):

[2020-07-07T18:38:57.1466103Z] [SEALWT20197] [Error] [JvmBridge] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.io.IOException: Failed to connect to SEALWT20197.amer.gettywan.com/192.168.0.106:65268
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$downloadClient(NettyRpcEnv.scala:368)
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$openChannel$1.apply$mcV$sp(NettyRpcEnv.scala:336)
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$openChannel$1.apply(NettyRpcEnv.scala:335)
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$openChannel$1.apply(NettyRpcEnv.scala:335)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:339)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:665)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:489)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:755)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:747)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:747)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.lang.NullPointerException
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1423)
        ... 17 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedSocketException: Permission denied: no further information: mycomputername/myipaddress:65268
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    ... 1 more
Caused by: java.net.SocketException: Permission denied: no further information
    ... 11 more
Orlamh commented 4 years ago

I'm trying to run the tutorial code: https://dotnet.microsoft.com/learn/data/spark-tutorial/code. I have an input.txt file; it's in the root of the project and also in the bin/Debug folder. I'm running my command-line console as an admin.
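
For reference, the tutorial program being run is, roughly, the following word-count app (a sketch reconstructed from the linked tutorial; the namespace and file names in the poster's poc_SparkApp project may differ):

using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Spark session.
            SparkSession spark = SparkSession
                .Builder()
                .AppName("word_count_sample")
                .GetOrCreate();

            // Read the input file into a DataFrame with one row per line.
            DataFrame dataFrame = spark.Read().Text("input.txt");

            // Split each line into words, explode to one word per row,
            // and count occurrences of each word.
            DataFrame words = dataFrame
                .Select(Split(Col("value"), " ").Alias("words"))
                .Select(Explode(Col("words")).Alias("word"))
                .GroupBy("word")
                .Count()
                .OrderBy(Col("count").Desc());

            words.Show();

            spark.Stop();
        }
    }
}

Note that the relative path in spark.Read().Text("input.txt") is resolved against the working directory from which spark-submit runs, which is why the file's location matters in the exchange below.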

Thanks

imback82 commented 4 years ago

Can you try specifying the full path for your input.txt? Or you could try putting input.txt in the same folder as the DLL, go to that directory, and run: spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.3.x-0.12.1.jar dotnet poc_SparkApp.dll
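
For example, with the tutorial code above, the read could use an absolute path instead (the path below is a placeholder, not the poster's actual location):

DataFrame dataFrame = spark.Read().Text(@"C:\full\path\to\input.txt");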

Orlamh commented 4 years ago

I tried running from .\bin\debug\netcoreapp3.1, the same location as the DLL and input.txt. Same problem; it looks like the same stack trace.

imback82 commented 4 years ago

Can you share your poc_SparkApp.dll?

imback82 commented 4 years ago

Or you can just zip up all the files under .\bin\debug\netcoreapp3.1.

Orlamh commented 4 years ago

netcoreapp3.1.zip

Zipped contents of netcoreapp3.1

imback82 commented 4 years ago

Thanks, can you also share the version of Spark you are using?

imback82 commented 4 years ago

Works fine for me with Spark 2.3.4, so it's probably some issue on your side with networking, etc.:

C:\Users\terryk\Downloads\github_582\netcoreapp3.1>C:\spark\spark-2.3.4-bin-hadoop2.7\bin\spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.3.x-0.12.1.jar poc_SparkApp.exe
...
...
2020-07-07 14:00:21 INFO  TaskSetManager:54 - Finished task 163.0 in stage 1.0 (TID 199) in 215 ms on localhost (executor driver) (199/200)
2020-07-07 14:00:21 INFO  ShuffleBlockFetcherIterator:54 - Getting 1 non-empty blocks out of 1 blocks
2020-07-07 14:00:21 INFO  ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 34 ms
2020-07-07 14:00:21 INFO  Executor:54 - Finished task 165.0 in stage 1.0 (TID 200). 4304 bytes result sent to driver
2020-07-07 14:00:21 INFO  TaskSetManager:54 - Finished task 165.0 in stage 1.0 (TID 200) in 151 ms on localhost (executor driver) (200/200)
2020-07-07 14:00:21 INFO  TaskSchedulerImpl:54 - Removed TaskSet 1.0, whose tasks have all completed, from pool
2020-07-07 14:00:21 INFO  DAGScheduler:54 - ResultStage 1 (showString at NativeMethodAccessorImpl.java:0) finished in 27.785 s
2020-07-07 14:00:21 INFO  DAGScheduler:54 - Job 0 finished: showString at NativeMethodAccessorImpl.java:0, took 29.806009 s
2020-07-07 14:00:21 INFO  CodeGenerator:54 - Code generated in 39.5636 ms
+------+-----+
|  word|count|
+------+-----+
|  .NET|    3|
|Apache|    2|
|   app|    2|
|  This|    2|
| Spark|    2|
| World|    1|
|counts|    1|
|   for|    1|
| words|    1|
|  with|    1|
| Hello|    1|
|  uses|    1|
+------+-----+

2020-07-07 14:00:21 INFO  AbstractConnector:318 - Stopped Spark@6f792804{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-07-07 14:00:21 INFO  SparkUI:54 - Stopped Spark web UI at <redacted>
2020-07-07 14:00:21 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2020-07-07 14:00:21 INFO  MemoryStore:54 - MemoryStore cleared
2020-07-07 14:00:21 INFO  BlockManager:54 - BlockManager stopped
2020-07-07 14:00:21 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2020-07-07 14:00:21 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
Orlamh commented 4 years ago

Spark 2.3.3 (screenshot attached)

Niharikadutta commented 4 years ago

Hi @Orlamh were you able to get this issue resolved?

Niharikadutta commented 4 years ago

Hi, we are going to close this issue as it has been inactive for a while. Please feel free to re-open it if the issue persists and/or there are any new updates. Thank you!

shekharmayank commented 3 years ago

I'm also facing this issue, and I don't know why Spark is picking up the Docker IP.

ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /10.0.75.1:57068

After giving the above error, it terminates.
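
10.0.75.1 is the address Docker Desktop for Windows has historically assigned to its DockerNAT virtual adapter, which suggests Spark bound the driver to that adapter rather than to a reachable interface. A workaround worth trying (an assumption based on the address, not a verified fix for this setup) is to force the bind address before launching:

set SPARK_LOCAL_IP=127.0.0.1

or to pass --conf spark.driver.bindAddress=127.0.0.1 --conf spark.driver.host=localhost to spark-submit.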

rogercallster commented 3 years ago

I am getting the same issue with a Kubernetes (K8s) deployment.

ERROR task-result-getter-0 RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /192.168.6.186:46443
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121)
    at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:143)
    at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:103)
    at org.apache.spark.storage.BlockManager.fetchRemoteManagedBuffer(BlockManager.scala:1010)
    at org.apache.spark.storage.BlockManager.$anonfun$getRemoteBlock$8(BlockManager.scala:954)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.storage.BlockManager.getRemoteBlock(BlockManager.scala:954)
    at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:1092)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3.$anonfun$run$1(TaskResultGetter.scala:88)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:63)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: /192.168.6.186:46443
Caused by: java.net.ConnectException: Connection timed out
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
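
In this trace the driver's task-result-getter fails to fetch a result block from an executor at a pod IP, which suggests the driver and executors are not mutually routable (for example, a driver running outside the cluster typically cannot reach pod IPs). Suggestions worth trying (assumptions, not verified against this cluster): run the driver inside the cluster, or fix and expose the driver ports so a service can route them, e.g. by adding flags along these lines to spark-submit, where driver-svc, my-namespace, and the port numbers are placeholders:

--conf spark.driver.bindAddress=0.0.0.0 --conf spark.driver.host=driver-svc.my-namespace.svc.cluster.local --conf spark.driver.port=7078 --conf spark.blockManager.port=7079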

rogercallster commented 3 years ago

Quoting @shekharmayank above:

I'm also facing this issue, and I don't know why Spark is picking up the Docker IP.

ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /10.0.75.1:57068

After giving the above error, it terminates.

Did you find a solution?