buggtb closed this pull request 7 years ago
It's working.
Okay cool, at least I wasn't just making up requirements for the sake of it. I'll fix it up tonight or tomorrow and ship you over the changes.
Thanks @buggtb :-)
Thanks for doing this.
I feel we should simply delegate this task to the spark-submit tool: http://spark.apache.org/docs/latest/submitting-applications.html
I suggest this because spark-submit is a sophisticated tool for this task. It provides a complete solution, including allocation of resources (RAM, CPU, etc.) to the submitted job.
@karanjeets @buggtb Let me know your opinions. If you both agree, I will update the Maven build to produce a jar optimized for spark-submit (optimization here: exclude unnecessary libs like Scala, Spark, and Hadoop, since they ought to be picked up from the deployed cluster).
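For illustration, a minimal sketch of what such a delegation could look like; the jar name, main class, and crawl arguments below are assumptions for the example, not taken from this PR:

```bash
# Sketch only: the jar name, main class, and crawl arguments are
# illustrative assumptions. spark-submit itself handles the resource
# allocation (RAM, CPU) mentioned above.
spark-submit \
  --class edu.usc.irds.sparkler.Main \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  --executor-cores 2 \
  sparkler-app-thin.jar crawl -id my-crawl
```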
Personally, I like the fact that I can launch jobs on a remote cluster using the sparkler jar. As someone who is pretty Spark-ignorant, I very much like the convenience, although I can certainly see the benefits of using spark-submit. I would support both: the jar launcher might be quick and dirty, but it provides a level of convenience over spark-submit that we can certainly use for Juju deployments and so on. Supporting Spark jobs from within the jar doesn't appear to add much overhead except size, and you could build "fat" and "thin" jars at build time to trim it down.
I've updated the PR with the changes @karanjeets suggested.
@thammegowda - Thanks for sharing your thoughts on this. I would +1 @buggtb's response. I too am in favor of keeping the Spark libs in Sparkler. This is extremely helpful for people who run small crawls and don't want the hassle of standing up a cluster. I also liked @buggtb's suggestion to handle this at build time and create 'fat' and 'thin' jars. The '--add-jar' option provides a great alternative, and its implementation suggests it should behave identically to spark-submit's '--jars'.
If you don't have any other queries and approve this PR, I will go ahead and merge.
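For reference, a minimal Scala sketch of the idea behind '--add-jar': registering extra jars on the SparkContext via SparkContext.addJar, which is also what spark-submit's '--jars' achieves. The object name and option plumbing below are hypothetical, not the actual Sparkler code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch, not the actual Sparkler implementation:
// jar paths collected from a repeated --add-jar CLI option are
// registered on the SparkContext so executors can load them.
object AddJarSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("add-jar-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Pretend each CLI argument is a path passed via --add-jar
    val jarsToAdd: Seq[String] = args.toSeq

    // Same net effect as listing these paths under spark-submit --jars
    jarsToAdd.foreach(sc.addJar)

    sc.stop()
  }
}
```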
@buggtb Thanks for the changes. 👍
@karanjeets My suggestion was to support both types of builds. Maven has build profiles, which we can use to produce one fat jar including all the libs and another jar optimized for spark-submit.
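As a sketch of how the two builds might be selected; the profile ids "fat" and "thin" are assumptions, not names from the actual pom:

```bash
# Sketch only: profile ids "fat" and "thin" are illustrative assumptions.
# Fat jar: bundles Scala/Spark/Hadoop for the standalone jar launcher.
mvn clean package -Pfat

# Thin jar: leaves Scala/Spark/Hadoop as provided-scope dependencies so
# spark-submit picks them up from the deployed cluster.
mvn clean package -Pthin
```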
👍 Proceed with merging this PR.
We can support spark-submit for advanced use when the need arises; that will not disturb this functionality. We will have to support it eventually, as resource allocation is essential when sharing the cluster with other jobs. For now, we can use the same fat jar with spark-submit; the only worry is the higher chance of class-version mismatches among transitive dependencies.
Thanks @buggtb
To help get my builds and charm aligned, I'll merge this as you two have signed it off. Thanks for accepting it!
Ability to add a jar to the Spark context via a command-line option.