IBM-Cloud / BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
https://console.ng.bluemix.net/catalog/services/biginsights-for-apache-hadoop
Apache License 2.0

Speed up Oozie Spark example #22

Open pregazzoni opened 8 years ago

pregazzoni commented 8 years ago

For an Oozie Spark job to run on YARN, spark-assembly.jar needs to be on the job path. Right now we get the jar from the cluster (webhdfs) and then put it (webhdfs) into the $jobDir/lib directory. This takes more than a few minutes.

Another way would be to have the jar in the Oozie shared lib directory by default.

As the oozie user, you can run:

# Copy spark-assembly jar to Oozie shared lib directory
hdfs dfs -put /usr/iop/current/spark-client/lib/spark-assembly.jar /user/oozie/share/lib/lib_20160805191701/spark/.

# Set oozie environment
source /usr/iop/current/oozie-client/bin/oozie-env.sh
export OOZIE_URL=http://<replace with oozie node>:11000/oozie

# Update shared lib
oozie admin -sharelibupdate

Once this is done, there is no need to put the jar under $jobDir/lib, as it will be picked up automatically from the Oozie shared lib.
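The timestamped directory name (lib_20160805191701) differs from cluster to cluster, so the copy step above could be sketched without hardcoding it. This is only a sketch under assumptions: the function names `latest_sharelib_dir` and `install_spark_assembly` are hypothetical, the paths are taken from the commands above, and it assumes OOZIE_URL is already exported and that you are running as the oozie user.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, paths mirror the commands above.

# Read HDFS paths on stdin and print the newest lib_<timestamp> directory.
# The timestamps are fixed-width digits, so a lexical sort orders them by time.
latest_sharelib_dir() {
  grep -E '/lib_[0-9]+$' | sort | tail -n 1
}

# Copy spark-assembly.jar into the newest sharelib's spark directory,
# then ask the Oozie server to reload the shared lib.
install_spark_assembly() {
  local sharelib_root="${1:-/user/oozie/share/lib}"
  local jar="${2:-/usr/iop/current/spark-client/lib/spark-assembly.jar}"
  local target

  # The last column of `hdfs dfs -ls` output is the path; the "Found N items"
  # header line is dropped by the grep inside latest_sharelib_dir.
  target="$(hdfs dfs -ls "$sharelib_root" | awk '{print $NF}' | latest_sharelib_dir)"
  if [ -z "$target" ]; then
    echo "no lib_<timestamp> directory found under $sharelib_root" >&2
    return 1
  fi

  # -f overwrites an existing copy of the jar (assumption: overwriting is acceptable).
  hdfs dfs -put -f "$jar" "$target/spark/" || return 1
  oozie admin -sharelibupdate
}
```

After sourcing the file, calling `install_spark_assembly` with no arguments uses the default paths shown above. For the workflow to use the shared lib, its job.properties also needs `oozie.use.system.libpath=true`. On Oozie versions that support it, `oozie admin -shareliblist spark` should then list the jar.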

snowch commented 8 years ago

This looks good, Pierre. Would these steps go into a new task, called something like Setup, that the user would run just once with gradle?

Will it also work on basic clusters?

pregazzoni commented 8 years ago

@snowch I need to look into this more closely, as I believe you need to become the oozie user to do this (so you need root). The same is true for basic clusters.

I am also asking whether the jar could be included in the shared lib by default, so that it is there to start with.

snowch commented 8 years ago

Ah, cool. Thanks @pregazzoni