amplab / docker-scripts

Dockerfiles and scripts for Spark and Shark Docker images

Running application on the cluster #35

Open bharath12345 opened 10 years ago

bharath12345 commented 10 years ago

I'm a Spark and Docker noob, so this is actually a question rather than an issue.

I followed your instructions and was able to set up the cluster and run the example. This is what I see as my cluster status:

vagrant@packer-virtualbox-iso:/vagrant/sparkling$ sudo docker ps
CONTAINER ID        IMAGE                           COMMAND                CREATED             STATUS              PORTS                NAMES
8f5d44eefa65        amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour    8888/tcp             prickly_lumiere     
33c48ef9d17e        amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour    8888/tcp             stoic_feynman       
d91e47ed0b90        amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour    8888/tcp             ecstatic_babbage    
e173ecd4f4c0        amplab/spark-master:0.9.0       /root/spark_master_f   About an hour ago   Up About an hour    7077/tcp, 8080/tcp   berserk_nobel       
d67f979d70fe        amplab/dnsmasq-precise:latest   /root/dnsmasq_files/   About an hour ago   Up About an hour                      

I have written a Spark program for linear regression which runs perfectly in local mode. It is a very small program and is on GitHub here.

Now I want to run this program on my Spark cluster. The instructions in the Spark programming guide leave me scratching my head about what to do next. I would like your help in understanding the right way to run the application:

  1. I get the Scala prompt when I do `docker attach`. Should I run my application from this prompt?
  2. I have a Vagrant setup on which I am running Docker. On my Vagrant Ubuntu box I have the application code, which I compile and assemble using sbt. Can I somehow deploy the assembled application from sbt to the cluster?

If this has been explained elsewhere, please point me to it; I could not find any example of how to run an application program on a Spark cluster.

Thank you very much.

AndreSchumacher commented 10 years ago

Hi @bharath12345, you're right, that's not actually covered in the docs. Have you tried to scp your jar into the master container (see the instructions on ssh login) and run it from there? I believe Spark should be installed inside /opt.
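
A rough sketch of that workflow, assuming the ssh setup from the README and a Spark installation under /opt inside the master container. The jar name, main class, and paths below are placeholders for your own application, not something the scripts create for you:

```bash
# On the Vagrant box: copy the assembled jar into the master container
# (use the master's hostname/IP reported when the cluster was started).
scp target/scala-2.10/linear-regression-assembly-0.1.jar root@master:/root/

# Log in to the master and launch the driver there. Spark 0.9.0 pre-dates
# spark-submit, so the driver runs as a plain JVM process; your SparkContext
# should point at spark://master:7077 instead of "local".
ssh root@master
java -cp /root/linear-regression-assembly-0.1.jar your.main.Class

# If your fat jar does not bundle the Spark classes (e.g. Spark is marked
# "provided" in sbt), also add the Spark assembly jar from the installation
# under /opt to the -cp entry.
```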

I'm afraid there is no way to directly deploy with sbt. However, you could use the data dir option when you start the cluster to attach a directory that you then deploy your jar to. You would still need to start it from the command line by ssh-ing into the master, I guess.
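
Something along these lines for the data-dir route; the flag name and the mount point inside the containers are assumptions taken from the deploy script's usage output, so please verify them against your checkout:

```bash
# On the Vagrant box: start the cluster with a shared host directory attached
# (check `deploy/deploy.sh` usage for the exact data-directory flag).
sudo ./deploy/deploy.sh -i amplab/spark:0.9.0 -w 3 -v /vagrant/sparkling/data

# Drop each new build of the jar into that shared directory...
cp target/scala-2.10/linear-regression-assembly-0.1.jar /vagrant/sparkling/data/

# ...then ssh into the master and start the driver from the mounted path
# (the in-container mount point below is a guess; verify it with `mount`).
ssh root@master
java -cp /data/linear-regression-assembly-0.1.jar your.main.Class
```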