Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Apache License 2.0

Question: Is there a way to speed up dependency resolution for jupyter integration? #207

Closed: devongleeson closed this issue 1 year ago

devongleeson commented 1 year ago

I really like this library's Jupyter integration (%use spark). However, dependency resolution is very slow and can sometimes take 10-15 minutes to start up. Am I doing something wrong? Are there any hints on how to make this go faster?

Jolanrensen commented 1 year ago

10-15 minutes is outrageously long. Last time I checked, it was a couple of minutes tops. I'll investigate whether something is taking longer than it has to, using:

%useLatestDescriptors
%trackExecution
%use spark

However, do note that this cell only needs to run once each time you open the notebook. The Spark connection remains open in the background.

Jolanrensen commented 1 year ago

I'd also recommend running

SessionOptions.resolveSources = false

in a cell before the %use statement. A lot of time goes into resolving the sources for all Spark dependencies.
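Put together, the start of the notebook might look roughly like the sketch below. This only combines the two snippets from this thread. The split into two cells follows the "cell before the %use statement" advice, and the cell labels in the comments are purely illustrative.

```kotlin
// Cell 1: run this before loading the Spark integration.
// Disables source resolution for dependencies, which accounts for
// much of the startup time according to the comment above.
SessionOptions.resolveSources = false

// Cell 2 (a separate notebook cell): load the Spark integration.
%use spark
```

As noted earlier, the %use cell only needs to run once each time the notebook is opened.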

devongleeson commented 1 year ago

Wow, setting SessionOptions.resolveSources = false dramatically improved the speed at which my notebook was able to start up. Thanks!

Jolanrensen commented 1 year ago

It does! I've notified the Notebooks team about the issue as well.

ileasile commented 1 year ago

@devongleeson @Jolanrensen You can try version 0.12.0.63.dev1 of the kernel if you are using a Jupyter client other than Kotlin Notebook. Kotlin Spark resolution should be much faster there. In Kotlin Notebook, it will be possible to select the kernel version in the 2023.3 release.