randerzander opened this issue 2 years ago
@randerzander resetting the Context would not cause another JVM instance to be started. In fact, the JVM instance is created before any Context exists: it is started as a side effect of the first import dask_sql call.
For reference:
>>> import dask_sql
Starting JVM from path /home/jdyer/anaconda3/envs/dask-sql/lib/server/libjvm.so...
...having started JVM
>>> c1 = dask_sql.Context()
>>> c2 = dask_sql.Context()
> the JVM instance is created before any Context exists: it is started as a side effect of the first import dask_sql call

Thanks for confirming.
Sometimes it's not clear, when running a number of SQL scripts in a session, whether all scripts "clean up" after themselves (dropping temp tables, unpersisting tables, de-registering UDFs, etc.).
It would be useful to support Dask-SQL features that:
In benchmarking work, we attempted to drop all tables manually, but (at least in dask-cudf) this didn't appear to successfully remove all references to persisted DataFrames. Memory use remained high, and only dropped after calling Dask's client.restart().
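For illustration, a minimal sketch of this kind of manual cleanup, assuming a recent dask-sql Context API (drop_table and the schema/tables mapping) and a dask.distributed Client; this is not the exact snippet used in the benchmarking work, and the table name is made up:

from dask.distributed import Client
import dask
import dask_sql

client = Client()          # connect to (or start) a Dask cluster
c = dask_sql.Context()

# Hypothetical table, persisted on the workers.
df = dask.datasets.timeseries()
c.create_table("ts", df, persist=True)

# Drop every table registered in the default schema so the Context
# releases its references to the persisted DataFrames.
for table_name in list(c.schema[c.schema_name].tables):
    c.drop_table(table_name)

# If worker memory stays high anyway (as observed with dask-cudf),
# restarting the workers clears all persisted data as a fallback.
client.restart()

# A fresh Context gives a clean slate; as noted earlier in the thread,
# this does not restart the JVM, which is tied to the import of dask_sql.
c = dask_sql.Context()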
For #3, the user can always create a new Context, but it's not clear if this incurs the overhead of restarting the underlying JVM (cc @jdye64 for comment).