Open laneb opened 5 years ago
That library looks like it could use a facelift, but I guess it's useful to look at.
One thing that would make this ticket easier is moving all the arg parsing into Python and out of the shell scripts.
Then the smv-blah scripts just become shebanged Python scripts, and importing via import becomes the real first-class citizen since it’s used internally.
On Nov 7, 2018, 16:30 -0800, Lane Barlow notifications@github.com, wrote:
Goal
I should be able to do the following
    $ pip install smv
    $ python
    import smv
    app = smvApp.createInstance([], None)
    app.runModule("foo")
without having to mess around and add things to my class path or sys.path
• The smv package should handle injecting the fat jar
• smv should try to add pyspark and py4j to the sys.path if they're not there (can consider https://github.com/minrk/findspark)
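For the suggestion above about moving arg parsing out of the shell scripts, here is a rough sketch of what a shebanged smv-* script could look like. The script name smv-run and the flags -m/--run-module and --data-dir are made up for illustration and are not SMV's actual CLI; only argparse from the standard library is assumed.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for a shebanged smv-* entry point."""
import argparse
import sys


def build_parser():
    # Flag names are illustrative assumptions, not SMV's real options.
    parser = argparse.ArgumentParser(prog="smv-run")
    parser.add_argument("-m", "--run-module", dest="modules", action="append",
                        default=[], help="module to run (repeatable)")
    parser.add_argument("--data-dir", default=None,
                        help="optional data directory")
    return parser


def main(argv=None):
    # parse_known_args returns unrecognized options separately instead of
    # failing, so they can be handed on to Spark by the caller.
    args, passthrough = build_parser().parse_known_args(argv)
    return args, passthrough


if __name__ == "__main__":
    parsed, extra = main(sys.argv[1:])
    print(parsed.modules, extra)
```

Because main takes an argument list, the same parsing can be reached from an import as well as from the command line, which is the "first-class citizen" point made above. Called with -m foo --master yarn, it would collect foo as a module and leave --master yarn untouched for whatever launches Spark.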
Needs a small whiteboard session I think. These are definitely interrelated goals.
@laneb is the target to still use spark-submit or not?
Not necessarily. I think jobs that need to run in yarn cluster mode will still have to be submitted via spark-submit, but we may be able to shift the burden to users.
@jacobdr What's the goal that motivates removing the bash scripts? Just to simplify arg parsing?
Goal
I should be able to do the following
    $ pip install smv
    $ python
    import smv
    app = smvApp.createInstance([], None)
    app.runModule("foo")
without having to mess around and add things to my class path or sys.path
• The smv package should handle injecting the fat jar
• smv should try to add pyspark and py4j to the sys.path if they're not there (can consider https://github.com/minrk/findspark)
• sys.dont_write_bytecode instead of via PYTHONDONTWRITEBYTECODE env var
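A minimal sketch of the sys.path and bytecode items, assuming a standard Spark distribution layout where the Python sources live under $SPARK_HOME/python and py4j ships as a zip under $SPARK_HOME/python/lib (the same layout findspark looks for). The function name prepare_python_env is hypothetical and not part of smv, and fat jar injection is left out.

```python
import glob
import os
import sys


def prepare_python_env(spark_home=None):
    """Make pyspark and py4j importable; return the paths that were added."""
    # Set from inside the process rather than through the
    # PYTHONDONTWRITEBYTECODE env var. Only affects later imports.
    sys.dont_write_bytecode = True

    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        # Nothing to locate; leave sys.path alone.
        return []

    python_dir = os.path.join(spark_home, "python")
    # Assumed layout: py4j is bundled as python/lib/py4j-*.zip.
    candidates = [python_dir] + sorted(
        glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip")))

    added = []
    for path in candidates:
        # Only touch sys.path for entries that are not there already.
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

If something like this ran when smv is imported, the session above could go straight from import smv to creating the app without manual sys.path edits. A second call is a no-op because the paths are already present.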