dfdx / Spark.jl

Julia binding for Apache Spark

Do we need an embedded JVM on the worker nodes #28

Closed aviks closed 7 years ago

aviks commented 7 years ago

Scala starts Julia as an external process and connects to it over a socket. Does the Julia process need to start an embedded JVM on the workers?

cc: @dfdx

dfdx commented 7 years ago

As far as I can see, there's no need for an additional JVM: Spark starts Julia to execute a task (just like it starts Python in PySpark), the Julia process executes the task and exits. Is there any reason to start an additional JVM?

Note that this is different from the driver process, where we use Julia to instantiate a JVM and control program flow.
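
For context, here is a minimal sketch of what driver-side JVM instantiation looks like with JavaCall.jl (which Spark.jl builds on). The JVM options, classpath, and the follow-up call are illustrative assumptions, not Spark.jl's actual configuration:

```julia
# Sketch of driver-side setup: Julia embeds a JVM via JavaCall.jl and then
# drives the JVM over JNI. Options/classpath below are placeholders.
using JavaCall

JavaCall.init(["-Xmx2048M", "-Djava.class.path=/path/to/spark/jars/*"])

# Once the JVM is embedded, Java classes can be imported and called from Julia:
JSystem = @jimport java.lang.System
java_version = jcall(JSystem, "getProperty", JString, (JString,), "java.version")
```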

aviks commented 7 years ago

OK, that is what I thought. Currently, `using Spark` starts the embedded JVM, which means it also starts on the workers, since Julia on a worker is launched with `julia -e 'using Spark'`.

I'll do a PR to change this. It will mean that on the driver we will need to make an explicit `init()` call (see the sketch below).
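
A rough sketch of how the module could be restructured so that merely loading the package does not start a JVM; the names and guard flag here are hypothetical, not necessarily what the PR will use:

```julia
# Hypothetical sketch: the JVM is no longer started in __init__; the driver
# calls init() explicitly, while workers (started with `julia -e 'using Spark'`)
# never trigger it. Names are illustrative only.
module Spark

using JavaCall

const jvm_initialized = Ref(false)

"Start the embedded JVM on the driver; worker processes never call this."
function init(opts::Vector{String}=String[])
    jvm_initialized[] && return
    JavaCall.init(opts)
    jvm_initialized[] = true
end

end # module
```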

dfdx commented 7 years ago

Ah, I missed this detail. Thanks!