dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
https://dot.net/spark
MIT License

Dispose JvmBridge correctly #24

Open imback82 opened 5 years ago

imback82 commented 5 years ago

Currently, a JvmBridge instance is a static member of the SparkEnvironment class. Short of forcing users to call something like SparkEnvironment.JvmBridge.Dispose() in their application, there is no clean way to dispose of the JvmBridge, so for now the Scala side handles the disconnect gracefully (#121).

One approach to address this issue is to have a ref-counted SparkSession where JvmBridge.Dispose() is called when the last SparkSession object is disposed.

    // New JvmBridge should be instantiated with the following.
    using (var spark = SparkSession.Builder().GetOrCreate())
    {
        // do something
    }
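
To make the ref-counting idea concrete, here is a minimal sketch. It does not use the real Microsoft.Spark types: `FakeBridge` and `RefCountedSession` are hypothetical stand-ins for `JvmBridge` and `SparkSession`, and the static `GetOrCreate()` only imitates the shape of `SparkSession.Builder().GetOrCreate()`. The point is only to show the bridge being disposed when the last session goes away.

```csharp
using System;

// Hypothetical stand-in for JvmBridge (the real class talks to the JVM).
public sealed class FakeBridge : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

// Hypothetical session type; NOT the actual Microsoft.Spark SparkSession.
public sealed class RefCountedSession : IDisposable
{
    private static readonly object s_lock = new object();
    private static FakeBridge s_bridge;
    private static int s_refCount;

    private bool _disposed;

    private RefCountedSession() { }

    // The first call creates the shared bridge; later calls reuse it.
    public static RefCountedSession GetOrCreate()
    {
        lock (s_lock)
        {
            if (s_refCount == 0)
            {
                s_bridge = new FakeBridge();
            }
            s_refCount++;
            return new RefCountedSession();
        }
    }

    // Exposed only so the example can observe the bridge state.
    public static FakeBridge CurrentBridge
    {
        get { lock (s_lock) { return s_bridge; } }
    }

    public void Dispose()
    {
        lock (s_lock)
        {
            if (_disposed) return; // repeated Dispose() calls are harmless
            _disposed = true;

            // Dispose the bridge only when the last session is released.
            if (--s_refCount == 0)
            {
                s_bridge.Dispose();
                s_bridge = null;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        FakeBridge bridge;
        using (var first = RefCountedSession.GetOrCreate())
        {
            bridge = RefCountedSession.CurrentBridge;
            using (var second = RefCountedSession.GetOrCreate())
            {
                // Both sessions share the same bridge here.
            }
            Console.WriteLine(bridge.IsDisposed); // False: `first` is still alive
        }
        Console.WriteLine(bridge.IsDisposed); // True: last session was disposed
    }
}
```

The lock keeps the count and the bridge reference consistent if sessions are created or disposed from several threads; a real implementation would also need to decide what happens when code touches the bridge after the count has dropped to zero.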


One issue with relying on `SparkSession` is that there are a few classes, such as `SparkConf` and `Builder`, that access the `JvmBridge` directly from `SparkEnvironment`. These classes do not implement `IDisposable` (to be consistent with the Scala Spark API), so it is harder to enforce cleaning up the `JvmBridge` if a user does the following:

    public static void Main(string[] args)
    {
        var conf = new SparkConf();
        // Exits Main without creating a SparkSession.
    }



cc: @rapoth @stephentoub

Pheewww commented 1 year ago

One possible approach to address this issue is to create a new class that wraps the JvmBridge instance and implements IDisposable. This wrapper class can then be used to manage the lifecycle of the JvmBridge instance and ensure that it is properly disposed of when it is no longer needed.
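
A rough sketch of what such a wrapper might look like follows. The names `FakeBridge` and `BridgeScope` are hypothetical and stand in for the real `JvmBridge` and whatever the wrapper would be called; nothing here is existing Microsoft.Spark API.

```csharp
using System;

// Hypothetical stand-in for the real JvmBridge.
public sealed class FakeBridge : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

// Hypothetical wrapper that owns the bridge's lifetime.
public sealed class BridgeScope : IDisposable
{
    private FakeBridge _bridge = new FakeBridge();

    // Accessing the bridge after disposal fails loudly instead of
    // handing out a dead connection.
    public FakeBridge Bridge =>
        _bridge ?? throw new ObjectDisposedException(nameof(BridgeScope));

    public void Dispose()
    {
        // Clear the field first so a second Dispose() is a no-op.
        var bridge = _bridge;
        _bridge = null;
        bridge?.Dispose();
    }
}

public static class Program
{
    public static void Main()
    {
        FakeBridge bridge;
        using (var scope = new BridgeScope())
        {
            bridge = scope.Bridge;
        }
        Console.WriteLine(bridge.IsDisposed); // True
    }
}
```

Note that a wrapper like this still depends on the user disposing it, so on its own it would not cover the case above where `Main` exits after only creating a `SparkConf`.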

@rapoth Can I work on it?