delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
7.6k stars 1.71k forks source link

Delta doesn't work in windows 11 #1059

Closed Gaston-Ortiz closed 2 years ago

Gaston-Ortiz commented 2 years ago

Bug

Describe the problem

Hello, I am using spark 3.2.1 and hadoop 2.7 in windows 11. I am with problems to run the code in the quickstart guide https://docs.delta.io/latest/quick-start.html .

When I run this line in my command line:

pyspark --packages io.delta:delta-core_2.12:1.1.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

I am getting this error:

`Python 3.8.10 (default, May 19 2021, 13:12:57) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32 Type "help", "copyright", "credits" or "license" for more information. :: loading settings :: url = jar:file:/C:/spark-3.2.1-bin-hadoop2.7/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml Ivy Default Cache set to: C:\Users\gasto.ivy2\cache The jars for the packages stored in: C:\Users\gasto.ivy2\jars io.delta#delta-core_2.12 added as a dependency :: resolving dependencies :: org.apache.spark#spark-submit-parent-d5018e10-2bd9-44dd-99a3-979f0ecaa9b4;1.0 confs: [default] found io.delta#delta-core_2.12;1.1.0 in central found org.antlr#antlr4-runtime;4.8 in central found org.codehaus.jackson#jackson-core-asl;1.9.13 in central :: resolution report :: resolve 477ms :: artifacts dl 15ms :: modules in use: io.delta#delta-core_2.12;1.1.0 from central in [default] org.antlr#antlr4-runtime;4.8 from central in [default] org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]

    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
    ---------------------------------------------------------------------

:: retrieving :: org.apache.spark#spark-submit-parent-d5018e10-2bd9-44dd-99a3-979f0ecaa9b4 confs: [default] 0 artifacts copied, 3 already retrieved (0kB/12ms) Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 22/04/07 16:20:40 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 22/04/07 16:20:56 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped 22/04/07 16:20:58 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87) at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:78) at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:626) at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1009) at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:212) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2019) at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:534) at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:117) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 more 22/04/07 16:20:58 ERROR Inbox: Ignoring error java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:534) at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:117) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 22/04/07 16:21:03 ERROR Utils: Aborting task java.io.IOException: Failed to connect to host.docker.internal/192.168.2.10:65421 at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230) at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399) at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496) at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366) at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:762) at org.apache.spark.util.Utils$.fetchFile(Utils.scala:549) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:962) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:954) at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985) at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:954) at org.apache.spark.executor.Executor.(Executor.scala:247) at org.apache.spark.scheduler.local.LocalEndpoint.(LocalSchedulerBackend.scala:64) at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220) at org.apache.spark.SparkContext.(SparkContext.scala:581) at org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:238) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.lang.Thread.run(Thread.java:748) Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: host.docker.internal/192.168.2.10:65421 Caused by: java.net.ConnectException: Connection timed out: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715) at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) 22/04/07 16:21:03 ERROR SparkContext: Error initializing SparkContext. java.io.IOException: Failed to connect to host.docker.internal/192.168.2.10:65421`

Expected results

How can I do it to solve this issue?

Further details

When I run pyspark without the config for delta is working fine.

zsxwing commented 2 years ago

This looks like a Spark issue in windows 11. The exception here doesn't show any Delta code. Could you raise a ticket for Spark instead?

Gaston-Ortiz commented 2 years ago

Hi @zsxwing , but when I use spark without the delta config spark works fine. image

zsxwing commented 2 years ago

@Gaston-Ortiz Could you try pyspark --packages org.apache.kafka:kafka-clients:2.8.1? I think this will trigger the same error. The stack trace looks like some code path in Spark package management doesn't work with Windows 11.

Gaston-Ortiz commented 2 years ago

I run this code and I got a similar error:

22/04/12 12:12:12 ERROR Inbox: Ignoring error java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:404) at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:97) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 22/04/12 12:12:12 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87) at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:66) at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:567) at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:934) at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:200) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934) at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:404) at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:97) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 more 22/04/12 12:12:16 ERROR Utils: Aborting task java.io.IOException: Failed to connect to host.docker.internal/192.168.2.10:57071 at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195) at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392) at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411) at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:359) at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:719) at org.apache.spark.util.Utils$.fetchFile(Utils.scala:535) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:888) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:879) at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877) at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:879) at org.apache.spark.executor.Executor.<init>(Executor.scala:228) at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64) at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201) at org.apache.spark.SparkContext.<init>(SparkContext.scala:562) at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:238) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: host.docker.internal/192.168.2.10:57071 Caused by: java.net.ConnectException: Connection timed out: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715) at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) 22/04/12 12:12:16 ERROR SparkContext: Error initializing SparkContext. java.io.IOException: Failed to connect to host.docker.internal/192.168.2.10:57071 at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195) at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392) at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411) at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:359) at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:719) at org.apache.spark.util.Utils$.fetchFile(Utils.scala:535) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:888) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:879) at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877) at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:879) at org.apache.spark.executor.Executor.<init>(Executor.scala:228) at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64) at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201) at org.apache.spark.SparkContext.<init>(SparkContext.scala:562) at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:238) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: host.docker.internal/192.168.2.10:57071 Caused by: java.net.ConnectException: Connection timed out: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715) at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) 22/04/12 12:12:16 ERROR Utils: Uncaught exception in thread Thread-4 java.lang.NullPointerException at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:168) at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144) at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:734) at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2171) at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1988) at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357) at org.apache.spark.SparkContext.stop(SparkContext.scala:1988) at org.apache.spark.SparkContext.<init>(SparkContext.scala:648) at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:238) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) 22/04/12 12:12:16 WARN MetricsSystem: Stopping a MetricsSystem that is not running 22/04/12 12:12:16 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at: org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) java.lang.reflect.Constructor.newInstance(Constructor.java:423) py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) py4j.Gateway.invoke(Gateway.java:238) py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) py4j.GatewayConnection.run(GatewayConnection.java:238) java.lang.Thread.run(Thread.java:748) 22/04/12 12:12:22 WARN NettyRpcEnv: Ignored failure: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2d1437da rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1e1881c5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] 22/04/12 12:12:22 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103) at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:930) at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:200) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934) at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped. at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167) at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:144) at org.apache.spark.rpc.netty.NettyRpcEnv.askAbortable(NettyRpcEnv.scala:242) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.askAbortable(NettyRpcEnv.scala:548) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:552) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:102) ... 12 more 22/04/12 12:12:32 WARN NettyRpcEnv: Ignored failure: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@61b3f92e rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1e1881c5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] 22/04/12 12:12:32 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103) at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:930) at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:200) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934) at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped. at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167) at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:144) at org.apache.spark.rpc.netty.NettyRpcEnv.askAbortable(NettyRpcEnv.scala:242) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.askAbortable(NettyRpcEnv.scala:548) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:552) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:102) ... 12 more 22/04/12 12:12:35 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87) at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:66) at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:567) at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:934) at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:200) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934) at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:404) at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:97) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 more

Gaston-Ortiz commented 2 years ago

Do you know how to solve it?

zsxwing commented 2 years ago

I don't have a Windows 11 environment to try this and dig further. You can report an issue in https://issues.apache.org/jira/browse/SPARK

Gaston-Ortiz commented 2 years ago

Thank you very much! I will close the comment

saitejagundabathula commented 1 year ago

Do we have any resolution for this, i'm facing the same issue. Thank you!

saitejagundabathula commented 1 year ago

@zsxwing, @Gaston-Ortiz, Can you please share if there is any resolution for this error. Thanks!

zsxwing commented 1 year ago

@saitejagundabathula one workaround is you can download Delta's jars from maven and put them into the Spark's jars directory. Then you don't need to use --packages io.delta:delta-core_2.12:0.8.0 in your command.

zsxwing commented 1 year ago

By the way, I just created a JIRA ticket in Spark: https://issues.apache.org/jira/browse/SPARK-41885

bjornjorgensen commented 1 year ago

"ERROR Utils: Aborting task java.io.IOException: Failed to connect to host.docker.internal/192.168.2.10:65421 at "

I dont have windows 11, but whats running in a docker container her?

saitejagundabathula commented 1 year ago

@saitejagundabathula one workaround is you can download Delta's jars from maven and put them into the Spark's jars directory. Then you don't need to use --packages io.delta:delta-core_2.12:0.8.0 in your command.

Thanks for your suggestion.. Now i'm getting a different error reltated to delta path: Code: https://github.com/delta-io/delta/blob/master/examples/python/quickstart.py. spark version: 3.0.3 is it related to it?

########### Upsert new data #############
Traceback (most recent call last):
  File "quickstart.py", line 32, in <module>
    deltaTable = DeltaTable.forPath(spark, "/tmp/delta-table")
  File "\AppData\Local\Programs\Python\Python37\lib\site-packages\delta\tables.py", line 387, in forPath
    jdt = jvm.io.delta.tables.DeltaTable.forPath(jsparkSession, path, hadoopConf)
  File "spark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1305, in __call__
  File "spark\python\pyspark\sql\utils.py", line 128, in deco
    return f(*a, **kw)
  File "spark\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath. Trace:
py4j.Py4JException: Method forPath([class org.apache.spark.sql.SparkSession, class java.lang.String, class java.util.HashMap]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
        at py4j.Gateway.invoke(Gateway.java:276)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:745)
zsxwing commented 1 year ago

I think you installed delta-spark (the python package) but it's not the version you specify in --packages io.delta:delta-core_2.12:0.8.0. You can uninstall it.

saitejagundabathula commented 1 year ago

I think you installed delta-spark (the python package) but it's not the version you specify in --packages io.delta:delta-core_2.12:0.8.0. You can uninstall it.

Thanks! but it unable to find delta.tables module

Error: Command: spark-submit quickstart.py

Traceback (most recent call last):
  File "quick_start.py", line 3, in <module>
    from delta.tables import DeltaTable
ModuleNotFoundError: No module named 'delta.tables'
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

When I trigger this with command spark-submit --packages io.delta:delta-core_2.12:0.8.0 quickstart.py the error related to BlockManagerMasterEndpoint is thorwn.

Looks with out mentioning the delta package it doesn't seems to work.

zsxwing commented 1 year ago

I'm wondering if you can use a newer Spark version (3.1.x or above) that supports Delta Lake 1.0.0+? There is no delta-spark python package for 0.8.0. If you can use a Delta Lake version X that has delta-spark python package, you can put the delta jar into Spark's jars folder, and install delta-spark using pip install delta-spark==X.

saitejagundabathula commented 1 year ago

Thanks a lot for your support! I could able to run now!

bjtho08 commented 1 year ago

By the way, I just created a JIRA ticket in Spark: https://issues.apache.org/jira/browse/SPARK-41885

I have observed this exact issue on Windows 10, so it appears to be a general Windows bug rather than a specific Windows 11 bug.

zavoraad commented 1 year ago

Also seeing this on a MacOS as well. As long as io.delta:delta-core_2.12 is specified pyspark fails to load with

Main class:
org.apache.spark.api.python.PythonGatewayServer
Arguments:

Spark config:
(spark.app.name,PySparkShell)
(spark.app.submitTime,1679948647848)
(spark.driver.bindAddress,127.0.0.1)
(spark.files,file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar,file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-storage-2.2.0.jar,file:///Users/aaron.zavora/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar)
(spark.jars,file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar,file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-storage-2.2.0.jar,file:///Users/aaron.zavora/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar)
(spark.master,local[*])
(spark.repl.local.jars,file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar,file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-storage-2.2.0.jar,file:///Users/aaron.zavora/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar)
(spark.submit.deployMode,client)
(spark.submit.pyFiles,/Users/aaron.zavora/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar,/Users/aaron.zavora/.ivy2/jars/io.delta_delta-storage-2.2.0.jar,/Users/aaron.zavora/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar)
(spark.ui.showConsoleProgress,true)
Classpath elements:
file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar
file:///Users/aaron.zavora/.ivy2/jars/io.delta_delta-storage-2.2.0.jar
file:///Users/aaron.zavora/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/03/27 16:24:09 ERROR Utils: Aborting task
java.io.IOException: Failed to connect to /198.105.254.26:59638
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
avinash-03 commented 1 year ago

@saitejagundabathula one workaround is you can download Delta's jars from maven and put them into the Spark's jars directory. Then you don't need to use --packages io.delta:delta-core_2.12:0.8.0 in your command.

thanks this worked for me