NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

How to run Scoobi jobs in parallel? #307

Closed lyjgeorge closed 10 years ago

lyjgeorge commented 10 years ago

Hello, is there any way to run Scoobi jobs in parallel? For example, suppose I have three jobs: A, B and C. Jobs A and B read the same input data, and job C can only start once both A and B have finished. Is there any mechanism in Scoobi that supports this?

Thank you very much!

etorreborre commented 10 years ago

This can be done. If you have

val list1 = fromTextFile("test").map(_+"a")
val list2 = fromTextFile("test").map(_+"b")

val result = (list1 ++ list2).map(_+"c")
result.run

Then you should get 3 jobs: one for list1, one for list2 and, once those two have finished, a third one for result. By default, however, the "independent" jobs are not submitted at the same time but sequentially. If you want them submitted at the same time, set the variable scoobi.concurrentjobs to true in your configuration. Be aware that this feature has not been heavily tested!
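
For illustration, here is a minimal sketch of how the snippet above might look inside a complete application with the flag switched on. Only scoobi.concurrentjobs and the DList pipeline come from this thread. The object name ParallelJobs is hypothetical, and setting the property through the Hadoop Configuration wrapped by ScoobiApp's `configuration` is an assumption that should be checked against your Scoobi version.

```scala
import com.nicta.scoobi.Scoobi._

// Hypothetical application object; a sketch, not a tested implementation.
object ParallelJobs extends ScoobiApp {
  def run() {
    // Assumption: the property is read from the underlying Hadoop
    // Configuration, so it is set there before anything is run.
    // setBoolean is the standard Hadoop Configuration method.
    configuration.configuration.setBoolean("scoobi.concurrentjobs", true)

    // Two independent computations over the same input
    val list1 = fromTextFile("test").map(_ + "a")
    val list2 = fromTextFile("test").map(_ + "b")

    // Depends on both list1 and list2, so it can only run after them
    val result = (list1 ++ list2).map(_ + "c")
    result.run
  }
}
```

If ScoobiApp forwards Hadoop-style command-line options in your version, passing the property as -Dscoobi.concurrentjobs=true may work as well, but that is also an assumption.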

etorreborre commented 10 years ago

Closing this issue until further questions.