Hydrospheredata / mist

Serverless proxy for Spark cluster
http://hydrosphere.io/mist/
Apache License 2.0

Reduce jar copying and allow specifying the jar path when the artifact is already somewhere in the cluster #525

Open thetuxedo opened 5 years ago

thetuxedo commented 5 years ago

I have only one function, which takes a JSON parameter and does different things based on it. So I have only one jar, which is quite big.

The current implementation copies the jar from the Mist master to all executors in the cluster on every run of the function, which takes up a lot of disk space, unnecessarily in this case.

It would be good to allow the user to specify the jar path in the cluster and skip the jar-copying step or, even better, to provide a way to publish the artifact jar to the cluster once and then skip jar copying for that particular function.
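
A minimal sketch of the requested behaviour, written in Python purely for illustration. It is not Mist's API: `CLUSTER_SCHEMES` and `needs_copy` are hypothetical names, and the choice of schemes is an assumption based on Spark's own convention that `hdfs:` URIs are fetched from shared storage and `local:` URIs are expected to already exist on each worker node.

```python
from urllib.parse import urlparse

# Assumption: artifacts addressed with these schemes are already reachable
# from every node, so the master does not need to ship them again.
CLUSTER_SCHEMES = {"hdfs", "local"}


def needs_copy(artifact_uri: str) -> bool:
    """Return True if the jar must be copied from the master for this run."""
    scheme = urlparse(artifact_uri).scheme.lower()
    # A bare path such as /tmp/job.jar has an empty scheme and is treated
    # as a file that only exists on the master.
    return scheme not in CLUSTER_SCHEMES


if __name__ == "__main__":
    for uri in ("/tmp/job.jar", "hdfs://namenode/jars/job.jar", "local:/opt/jars/job.jar"):
        action = "copy from master" if needs_copy(uri) else "use in place"
        print(f"{uri}: {action}")
```

With a check like this, a function whose artifact path points at cluster storage would skip the per-run copy, while functions uploaded to the master would keep the current behaviour.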