aws-samples / amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples
MIT No Attribution

question: Pyspark Processing Jobs in Local Mode? #19

Open dcompgriff opened 2 years ago

dcompgriff commented 2 years ago

Hello. I was wondering whether there is a tutorial, or current support, for 1) running a PySpark processing job locally and 2) doing so with a custom base Docker (EMR) image? I see a tutorial for Dask using a ScriptProcessor, and also some code for an SKLearn-based processor. My goal is to set up a local testing/dev environment that uses the SageMaker Spark processor code. I'm guessing this is more complicated than the other use cases, since this processor is usually backed by an EMR cluster.

eitansela commented 2 years ago

Hi @dcompgriff, PySparkProcessor will not work in local mode. It runs on a SageMaker Docker image and has nothing to do with EMR. You can build your own Spark Docker image and use ScriptProcessor with it, the same way as in the Dask example, and run it locally.
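
A minimal sketch of what that could look like, assuming the SageMaker Python SDK is installed, Docker is running, and you have already built a Spark image locally. The image tag `my-local-spark:latest`, the script name `preprocess.py`, and the role ARN are hypothetical placeholders, not names from this repo. The sketch also assumes the image has `python3` and `pyspark` installed and that the script creates its own `SparkSession` with a local master.

```python
# Hypothetical placeholders: replace with your own locally built image and script.
LOCAL_SPARK_IMAGE = "my-local-spark:latest"
# Placeholder role ARN; assumed to be sufficient because nothing runs in AWS.
DUMMY_ROLE_ARN = "arn:aws:iam::111111111111:role/service-role/DummyLocalRole"


def build_processor_kwargs(image_uri=LOCAL_SPARK_IMAGE, role=DUMMY_ROLE_ARN):
    """Collect the ScriptProcessor arguments for a local-mode run."""
    return {
        "image_uri": image_uri,
        # Assumption: the script is launched with python3 and starts Spark itself.
        "command": ["python3"],
        "role": role,
        "instance_count": 1,
        # "local" makes the SDK run the container on this machine via Docker.
        "instance_type": "local",
    }


def run_local(script="preprocess.py", output_dir="/opt/ml/processing/output"):
    """Run the processing script inside the local Spark container."""
    # Imported lazily so the configuration above can be inspected without the SDK.
    from sagemaker.processing import ProcessingOutput, ScriptProcessor

    processor = ScriptProcessor(**build_processor_kwargs())
    processor.run(
        code=script,
        outputs=[ProcessingOutput(output_name="result", source=output_dir)],
    )
    return processor
```

Calling `run_local()` should start the container through Docker on your machine. Whether `python3` or `spark-submit` is the right `command` depends on how your image is built.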