dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
https://dot.net/spark
MIT License

[BUG]: spark.Dispose() and spark.Stop() throw error #964

Open akshatb1 opened 3 years ago

akshatb1 commented 3 years ago

Describe the bug

spark.Dispose() and spark.Stop() do not work and throw an error.

To Reproduce

Steps to reproduce the behavior:

  1. Go to Azure Synapse Studio
  2. Start Spark .NET Session.
  3. Try to execute spark.Dispose() or spark.Stop()
  4. The session does not stop and the error below is thrown.

Expected behavior

The Spark context should be stopped correctly.

Screenshots

(screenshot attached in the original issue)

Additional context

Error message:

```
[2021-08-27T15:15:18.6083607Z] [07759f195a294a98b5ac3fc4a7e8522800554b46593] [Error] [JvmBridge] JVM method execution failed: Nonstatic method 'addFile' failed for class '50' when called with 2 arguments ([Index=1, Type=String, Value=/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1630075737145_0002/container_1630075737145_0002_01_000001/.sparkdotnet/.dotnetinteractive/mqi1almc.zjx/4744a673-f62f-4fd9-bc6e-5c84ec6d4f97-1-20.dll], [Index=2, Type=Boolean, Value=False], )
[2021-08-27T15:15:18.6092818Z] [07759f195a294a98b5ac3fc4a7e8522800554b46593] [Error] [JvmBridge] java.lang.NullPointerException
	at org.apache.spark.SparkFiles$.getRootDirectory(SparkFiles.scala:37)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1568)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.handleMethodCall(DotnetBackendHandler.scala:159)
	at org.apache.spark.api.dotnet.DotnetBackendHandler$$anonfun$handleBackendRequest$1.apply$mcV$sp(DotnetBackendHandler.scala:99)
	at org.apache.spark.api.dotnet.ThreadPool$$anon$1.run(ThreadPool.scala:34)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

[2021-08-27T15:15:18.6096229Z] [07759f195a294a98b5ac3fc4a7e8522800554b46593] [Exception] [JvmBridge] JVM method execution failed: Nonstatic method 'addFile' failed for class '50' when called with 2 arguments ([Index=1, Type=String, Value=/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1630075737145_0002/container_1630075737145_0002_01_000001/.sparkdotnet/.dotnetinteractive/mqi1almc.zjx/4744a673-f62f-4fd9-bc6e-5c84ec6d4f97-1-20.dll], [Index=2, Type=Boolean, Value=False], )
	at Microsoft.Spark.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] args)
System.Exception: JVM method execution failed: Nonstatic method 'addFile' failed for class '50' when called with 2 arguments ([Index=1, Type=String, Value=/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1630075737145_0002/container_1630075737145_0002_01_000001/.sparkdotnet/.dotnetinteractive/mqi1almc.zjx/4744a673-f62f-4fd9-bc6e-5c84ec6d4f97-1-20.dll], [Index=2, Type=Boolean, Value=False], )
 ---> Microsoft.Spark.JvmException: java.lang.NullPointerException
	at org.apache.spark.SparkFiles$.getRootDirectory(SparkFiles.scala:37)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1568)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.api.dotnet.DotnetBackendHandler.handleMethodCall(DotnetBackendHandler.scala:159)
	at org.apache.spark.api.dotnet.DotnetBackendHandler$$anonfun$handleBackendRequest$1.apply$mcV$sp(DotnetBackendHandler.scala:99)
	at org.apache.spark.api.dotnet.ThreadPool$$anon$1.run(ThreadPool.scala:34)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
   --- End of inner exception stack trace ---
   at Microsoft.Spark.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] args)
   at Microsoft.Spark.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object arg0, Object arg1)
   at Microsoft.Spark.Interop.Ipc.JvmBridge.CallNonStaticJavaMethod(JvmObjectReference objectId, String methodName, Object arg0, Object arg1)
   at Microsoft.Spark.Interop.Ipc.JvmObjectReference.Invoke(String methodName, Object arg0, Object arg1)
   at Microsoft.Spark.SparkContext.AddFile(String path, Boolean recursive)
   at Microsoft.Spark.Extensions.DotNet.Interactive.AssemblyKernelExtension.<>c__DisplayClass2_0.<b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.<>c__DisplayClass6_1.<b__3>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 75
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.Kernel.b__21_0(KernelCommand originalCommand, KernelInvocationContext context, KernelPipelineContinuation next) in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\Kernel.cs:line 126
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.<>c__DisplayClass6_1.<b__3>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 75
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.Kernel.SetKernel(KernelCommand command, KernelInvocationContext context, KernelPipelineContinuation next) in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\Kernel.cs:line 218
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.<>c__DisplayClass6_1.<b__3>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 75
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.Kernel.<>c.<b__20_0>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\Kernel.cs:line 100
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.<>c__DisplayClass6_0.<g__Combine|2>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 76
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.<>c__DisplayClass6_0.<g__Combine|2>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 76
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.<>c__DisplayClass6_0.<g__Combine|2>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 76
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.<>c__DisplayClass6_0.<g__Combine|2>d.MoveNext() in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 76
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.DotNet.Interactive.KernelCommandPipeline.SendAsync(KernelCommand command, KernelInvocationContext context) in F:\workspace_work\1\s\src\Microsoft.DotNet.Interactive\KernelCommandPipeline.cs:line 42
```

imback82 commented 3 years ago

Hi @akshatb1, I am trying to understand your scenario. Is there a reason why you want to close the SparkSession in an interactive session (notebook)?

imback82 commented 3 years ago

In the interactive session, the .NET repl compiles the notebook cell code and ships it to executors (for UDF scenarios). Since the Spark session is closed, shipping the compiled code (addFile) fails. But again, I need to know why a user wants to stop the session.
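The stack trace suggests where this breaks: `SparkFiles.getRootDirectory` (SparkFiles.scala:37) reads the staging directory from the driver-side Spark environment, and once the context is stopped that environment is cleared, so the next `addFile` call dereferences null. A minimal, self-contained Java sketch of that failure mode (the class and field names below are illustrative stand-ins, not the actual Spark sources):

```java
// Stand-in for the driver-side environment that SparkContext.stop() tears down.
class FakeSparkEnv {
    private static FakeSparkEnv active = new FakeSparkEnv();
    final String driverTmpDir = "/tmp/driver";

    static FakeSparkEnv get() { return active; }

    // Models what stop()/Dispose() effectively does on the JVM side: the env is cleared.
    static void clear() { active = null; }
}

// Stand-in for SparkFiles: it dereferences the env unconditionally, so a cleared
// env yields the NullPointerException seen in the reported trace.
class FakeSparkFiles {
    static String getRootDirectory() {
        return FakeSparkEnv.get().driverTmpDir;
    }
}

public class AddFileAfterStop {
    public static void main(String[] args) {
        // While the session is alive, the lookup succeeds.
        System.out.println("root = " + FakeSparkFiles.getRootDirectory());

        FakeSparkEnv.clear(); // session stopped

        try {
            // This is the path the repl's addFile hits on the next cell execution.
            FakeSparkFiles.getRootDirectory();
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the reported trace");
        }
    }
}
```

The sketch only illustrates the ordering problem: the repl still tries to ship compiled cell code through a context whose environment no longer exists.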

akshatb1 commented 3 years ago

Hi @imback82, we are stopping the session to release the YARN resources immediately, before the session timeout kicks in. We can set the timeout to a very small value to work around this. However, in some cases it might be required to stop the session and start it again with configs that cannot be modified at runtime, such as the catalog implementation.
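The catalog implementation mentioned above is one example of a Spark setting that is fixed once the session starts: `spark.sql.catalogImplementation` is read when the SparkSession is created and ignores later changes, so switching it genuinely requires tearing the session down and starting a new one. A sketch of how such a setting is supplied at startup (e.g. in `spark-defaults.conf` or via `--conf` at submit time):

```
# spark-defaults.conf (or --conf on spark-submit); read once at session creation,
# cannot be changed on a running session
spark.sql.catalogImplementation  hive
```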

The spark.Dispose() API was working in v12.0.0.

imback82 commented 3 years ago

Got it. Let me talk to the team and get back to you on this.