TresAmigosSD / SMV

Spark Modularized View
Apache License 2.0

Make SMV easy to import and use #1489

Open laneb opened 5 years ago

laneb commented 5 years ago

Goal

I should be able to do the following

$ pip install smv
$ python
>>> import smv
>>> app = smvApp.createInstance([], None)
>>> app.runModule("foo")

without having to mess around and add things to my class path or sys.path

jacobdr commented 5 years ago

That library looks like it could use a facelift, but I guess it is useful to look at.

One thing that would make this ticket easier is if we moved all the arg parsing to Python and out of the shell scripts.

Then the smv-blah scripts just become shebanged Python scripts, and importing via import becomes the real first-class citizen, since it’s used internally.

On Nov 7, 2018, 16:30 -0800, Lane Barlow <notifications@github.com> wrote:

> Goal
>
> I should be able to do the following
>
> $ pip install smv
> $ python
> >>> import smv
> >>> app = smvApp.createInstance([], None)
> >>> app.runModule("foo")
>
> without having to mess around and add things to my class path or sys.path
>
> • The smv package should handle injecting the fat jar
> • smv should try to add pyspark and py4j to the sys.path if they're not there (can consider https://github.com/minrk/findspark)

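To make the second bullet in the quoted message concrete, here is a minimal sketch of how a package could put pyspark and py4j on sys.path when they are not already importable. It assumes a standard Spark distribution layout (`$SPARK_HOME/python` plus a versioned py4j zip under `$SPARK_HOME/python/lib`). The function names `pyspark_paths` and `ensure_pyspark_on_path` are hypothetical and not part of SMV. findspark's `findspark.init()` takes a similar approach. Injecting the fat jar is not covered here.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the sys.path entries pyspark needs under a given Spark home.

    Assumes the usual Spark layout: <spark_home>/python for pyspark itself
    and <spark_home>/python/lib/py4j-*-src.zip for py4j.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    # If several py4j zips are present, keep the last one in sorted order.
    return [python_dir] + py4j_zips[-1:]


def ensure_pyspark_on_path(spark_home=None):
    """Add pyspark and py4j to sys.path unless pyspark is already importable.

    Returns the list of entries that were added (empty if nothing was needed).
    """
    try:
        import pyspark  # noqa: F401
        return []
    except ImportError:
        pass

    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("pyspark is not importable and SPARK_HOME is not set")

    added = [p for p in pyspark_paths(spark_home) if p not in sys.path]
    sys.path[:0] = added  # prepend so these entries take precedence
    return added
```

A package could call something like `ensure_pyspark_on_path()` at import time, so that `import smv` works without the user editing sys.path by hand.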

laneb commented 5 years ago

Needs a small whiteboard session, I think. These are definitely interrelated goals.

ninjapapa commented 5 years ago

@laneb is the target to still use spark-submit or not?

laneb commented 5 years ago

Not necessarily. I think jobs that need to run in YARN cluster mode will still have to be submitted via spark-submit, but we may be able to shift the burden to users.

laneb commented 5 years ago

@jacobdr What's the goal that motivates removing the bash scripts? Just to simplify arg parsing?
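For context on the arg-parsing question, here is a rough sketch of what a "shebanged Python script" replacing one of the shell wrappers might look like. Everything in it is an assumption for illustration only: the script name `smv-run`, the `-m/--run-module` flag, and the placeholder body are hypothetical and are not SMV's actual CLI. The calls into SMV are left as comments, using the names from the goal at the top of the issue.

```python
#!/usr/bin/env python
import argparse
import sys


def parse_args(argv):
    """Parse the script's own flags; return (known_args, passthrough_args)."""
    parser = argparse.ArgumentParser(prog="smv-run")  # hypothetical script name
    parser.add_argument(
        "-m", "--run-module",  # hypothetical flag
        dest="modules",
        action="append",
        help="name of a module to run (may be given more than once)",
    )
    # parse_known_args leaves unrecognized arguments untouched so they could
    # be forwarded elsewhere (for example, to Spark).
    return parser.parse_known_args(argv)


def main(argv=None):
    args, passthrough = parse_args(sys.argv[1:] if argv is None else argv)
    for name in args.modules or []:
        # Placeholder: a real script would do the equivalent of
        #   app = smvApp.createInstance(passthrough, None)
        #   app.runModule(name)
        print("would run module: %s" % name)


if __name__ == "__main__":
    main()
```

With this shape, the command-line script and a plain import share the same code path, and the shell layer no longer needs its own argument handling.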