Open · zenonlpc opened this issue 8 years ago
It looks to me that Hadoop is failing to start.
Are you running Zeppelin with spark external to the Zeppelin home or with spark installed under Zeppelin? The difference is whether the SPARK_HOME env variable is set.
Can you try installing Zeppelin with simply mvn clean package install -Pr -DskipTests, running with external Spark, and see if that fixes it?
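For concreteness, that sequence might look like the following sketch (the zeppelin-daemon.sh script is standard Zeppelin; the /usr/lib/spark path is an assumption about the EMR layout):

# build Zeppelin with only the R profile enabled
mvn clean package install -Pr -DskipTests
# point Zeppelin at the external Spark install, then start the daemon
export SPARK_HOME=/usr/lib/spark
bin/zeppelin-daemon.sh start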
On May 6, 2016, at 9:32 AM, zenonlpc notifications@github.com wrote:
I created an EMR cluster with Zeppelin on AWS using the instructions at the link below:
https://gist.github.com/andershammar/224e1077021d0ea376dd
After some modification of the installZeppelin.sh script, I was able to build Zeppelin with the R interpreter successfully. I added some R packages before building Zeppelin in Maven and changed the mvn compile command to include the R profile:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests
However, when I start to write R commands in a Zeppelin notebook, I get this error:
java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
    at java.lang.Class.getConstructor0(Class.java:2895)
    at java.lang.Class.newInstance(Class.java:354)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
Here is the configuration of the AWS EMR cluster:
Hadoop: Amazon Hadoop 2.7.2; applications: Spark 1.6.0, Ganglia 3.7.2; release label: emr-4.4.0
It seems like an AWS issue, but I don't know what I did wrong.
Thanks Elbamos
I will try installing Zeppelin with just -Pr.
Spark is installed on the EMR cluster by default and is external to Zeppelin.
How can I set SPARK_HOME to use the external Spark? In zeppelin-env.sh?
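A minimal sketch of the relevant lines in conf/zeppelin-env.sh, assuming the stock EMR install paths:

# conf/zeppelin-env.sh -- point the interpreter at the external Spark
export SPARK_HOME=/usr/lib/spark        # assumed EMR location
export HADOOP_CONF_DIR=/etc/hadoop/conf # so Spark can find the cluster config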
Hello Elbamos
I tried installing Zeppelin with just -Pr:
mvn clean package -Pr -DskipTests
And set SPARK_HOME in zeppelin-env.sh as below:
export MASTER=
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=1 -Dspark.executor.cores=8 -Dspark.executor.memory=9193M -Dspark.default.parallelism=16"
export PYTHONPATH=:/usr/lib/spark/python
But I still couldn't get R working in Zeppelin; when I tried an R command it gave me this error:
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:76)
    at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:70)
    at org.apache.zeppelin.rinterpreter.RInterpreter.open(RInterpreter.scala:50)
    at org.apache.zeppelin.rinterpreter.RRepl.open(RRepl.java:56)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
It looks like YARN is not properly configured with Spark. Any idea what I did wrong?
The EMR cluster is created with two applications, Spark and Ganglia.
Thanks Zenon
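When the application master dies like this, the underlying cause usually only shows up in the YARN application logs; a sketch for pulling them, where the application id is hypothetical and would come from the ResourceManager UI:

# fetch the logs of the failed YARN application (id below is made up)
yarn logs -applicationId application_1462000000000_0001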
I'm not sure, but I do know that setting Zeppelin up to use yarn can be tricky. I would try to get it working with the regular spark interpreter first and confirm that yarn is working.
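One way to confirm that YARN itself is healthy, independent of Zeppelin, is to submit one of Spark's bundled examples from the master node; a sketch, where the examples-jar path is an assumption about the EMR 4.x Spark 1.6 layout:

# smoke-test Spark-on-YARN outside of Zeppelin
/usr/lib/spark/bin/spark-submit \
  --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  /usr/lib/spark/lib/spark-examples*.jar 10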
Hello Elbamos
I already got the regular spark interpreter working before; here is the maven compile command I used:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests
And the configuration for Zeppelin:
export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
But when I tried to add R in Zeppelin, I used this maven compile command:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests
And the configuration for Zeppelin:
export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
Now nothing is working; everything gives this error:
java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
    at java.lang.Class.getConstructor0(Class.java:2895)
    at java.lang.Class.newInstance(Class.java:354)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
    at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:166)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:152)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
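com.amazonaws.event.ProgressListener lives in the AWS Java SDK, which EMR ships outside the Zeppelin build, so the interpreter JVM is apparently starting without those jars. One hedged workaround sketch, where the SDK directory is an assumption about the EMR 4.x layout and SPARK_SUBMIT_OPTIONS is the zeppelin-env.sh hook for extra spark-submit flags:

# conf/zeppelin-env.sh -- sketch: put EMR's AWS SDK jars on the driver classpath
# (the /usr/share/aws path is assumed; adjust to wherever the SDK jars live)
export SPARK_SUBMIT_OPTIONS="--driver-class-path /usr/share/aws/aws-java-sdk/*"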
Hello Elbamos
I misunderstood you; I will try to get R working with Spark first, not with YARN.
Should I compile Zeppelin using this command:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pr -DskipTests
and set the Zeppelin configuration as follows:
export MASTER=spark://<spark-master-node>:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
Thanks
Before you try that, how about:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests
I think you should be able to remove all the profiles except R and yarn, but let's try this first.
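The stripped-down build being alluded to would presumably be just the two remaining profiles:

# hypothetical reduced build: only the R and yarn profiles
mvn clean package -Pyarn -Pr -DskipTests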
Hello Elbamos
I tried your suggestion again with the following command and configuration:
mvn clean package -Pyarn -Pr -DskipTests
export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
Then when I ran any command in Zeppelin I got this error:
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    ...
After that I tried removing YARN from Zeppelin and using the existing Spark on the EMR cluster. The maven command I used is:
mvn clean package -Pr -DskipTests
Configuration for Zeppelin:
export MASTER=spark://sparkmasternode:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
When I ran any Zeppelin command I got this error:
org.apache.spark.SparkException: Could not parse Spark Master URL: ''
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2735)
    at org.apache.spark.SparkContext.<init> ...
Thanks
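A sanity check for that last error: an empty master URL means the interpreter never saw the MASTER value, so it is worth confirming what Zeppelin will actually source and restarting the daemon afterwards; note the spark interpreter also has its own master property in the interpreter settings, which may override the environment if set.

# confirm the exported value, then restart so zeppelin-env.sh is re-read
grep '^export MASTER' conf/zeppelin-env.sh
bin/zeppelin-daemon.sh restart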
On Tue, May 10, 2016 at 4:52 PM, elbamos notifications@github.com wrote:
Before you try that how about
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests
I think you should be able to remove all the profiles except R and yarn but let's try this first.
On May 10, 2016, at 3:22 PM, zenonlpc notifications@github.com wrote:
Hello Elbamos
I misunderstood you, I will try make R working with spark first not with Yarn.
Should I compile zeppelin using this command:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pr -DskipTests
and set zeppelin configuration:
export MASTER=spark_maste node export HADOOP_HOME=/usr/lib/hadoop export SPARK_HOME=/usr/lib/spark export HADOOP_CONF_DIR=/etc/hadoop/conf export ZEPPELIN_SPARK_USEHIVECONTEXT=false export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}" export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
Thanks
On Tue, May 10, 2016 at 3:17 PM, Pengcheng Liu zenonlpc@gmail.com wrote:
Hello Elbamos
I already get the regular spark interpreter working before here is the maven complie command I used :
And the configuration for Zeppelin:
export MASTER=yarn-client export HADOOP_HOME=/usr/lib/hadoop export HADOOP_CONF_DIR=/etc/hadoop/conf export ZEPPELIN_SPARK_USEHIVECONTEXT=false export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}" export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
But when I try to add R in zeppelin, I did maven complie command:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests
And the configuration for Zeppelin:
export MASTER=yarn-client export HADOOP_HOME=/usr/lib/hadoop export SPARK_HOME=/usr/lib/spark export HADOOP_CONF_DIR=/etc/hadoop/conf export ZEPPELIN_SPARK_USEHIVECONTEXT=false export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}" export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
Now everything is not working it all give this error:
java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595) at java.lang.Class.getConstructor0(Class.java:2895) at java.lang.Class.newInstance(Class.java:354) at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373) at java.util.ServiceLoader$1.next(ServiceLoader.java:445) at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169) at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:166) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:152) at
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) at org.apache.spark.SparkContext.
(SparkContext.scala:530) at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356) at
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150) at
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525) at
org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74) at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68) at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92) at
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345) at org.apache.zeppelin.scheduler.Job.run(Job.java:176) at
org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
On Tue, May 10, 2016 at 1:23 PM, elbamos notifications@github.com wrote:
I'm not sure, but I do know that setting Zeppelin up to use yarn can be tricky. I would try to get it working with the regular spark interpreter first and confirm that yarn is working.
On May 10, 2016, at 11:46 AM, zenonlpc notifications@github.com wrote:
Hello Elbamos
I tried install zeppeln with just -Pr
mvn clean package -Pr -DskipTests
And set the spark_home in zeppelin-env.sh as below:
export MASTER= export HADOOP_HOME=/usr/lib/hadoop export SPARK_HOME=/usr/lib/spark export HADOOP_CONF_DIR=/etc/hadoop/conf export ZEPPELIN_SPARK_USEHIVECONTEXT=false export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=1 -Dspark.executor.cores=8 -Dspark.executor.memory=9193M -Dspark.default.parallelism=16" export PYTHONPATH=:/usr/lib/spark/python
But I still could get R working in zeppelin, when I tried the R commmand it gives me this error:
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:76)
at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:70)
at org.apache.zeppelin.rinterpreter.RInterpreter.open(RInterpreter.scala:50)
at org.apache.zeppelin.rinterpreter.RRepl.open(RRepl.java:56)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Looks like YARN is not properly configured with Spark. Any idea what I did wrong?
The EMR cluster is created with two applications, Spark and Ganglia.
Thanks Zenon
I think your spark master should be set to, for example:
export MASTER=spark://ip-111-11-11-11.us-west-2.compute.internal
Anyway, it worked well for me. I am running an 8-node cluster on EMR 4.5.0 with the Zeppelin-R notebook.
Thanks Akshay
I will try that.
Hello Akshay
I ran a list-instances command on my EMR cluster master node; here is the result:
{ "Instances": [
    { "Status": { "Timeline": { "ReadyDateTime": 1463419008.136, "CreationDateTime": 1463418705.629 }, "State": "RUNNING", "StateChangeReason": {} },
      "Ec2InstanceId": "i-07cb76b585791dc13",
      "PublicDnsName": "ec2-52-87-213-254.compute-1.amazonaws.com",
      "PrivateDnsName": "ip-172-31-59-226.ec2.internal",
      "PublicIpAddress": "52.87.213.254",
      "Id": "ci-3P2QMFSOSKF2S",
      "PrivateIpAddress": "172.31.59.226" },
    { "Status": { "Timeline": { "ReadyDateTime": 1463419008.136, "CreationDateTime": 1463418719.445 }, "State": "RUNNING", "StateChangeReason": {} },
      "Ec2InstanceId": "i-0cd3184eb7788816a",
      "PublicDnsName": "ec2-52-90-79-148.compute-1.amazonaws.com",
      "PrivateDnsName": "ip-172-31-58-205.ec2.internal",
      "PublicIpAddress": "52.90.79.148",
      "Id": "ci-13EARLMDOU64L",
      "PrivateIpAddress": "172.31.58.205" } ] }
I only have 1 master node and 1 slave node. Based on your previous reply, I should use the private DNS name as the spark master hostname, is this correct?
For this cluster the spark master should be set to spark://ip-172-31-59-226.ec2.internal:7077
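For what it's worth, the private DNS name can also be pulled straight from the CLI rather than read out of the JSON by eye. A sketch, with a placeholder cluster id, assuming an AWS CLI recent enough to support --query:
# Print the standalone-style master URL for the cluster's MASTER instance.
MASTER_DNS=$(aws emr list-instances --cluster-id j-XXXXXXXXXXXXX \
  --instance-group-types MASTER \
  --query 'Instances[0].PrivateDnsName' --output text)
echo "spark://${MASTER_DNS}:7077"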
Thanks
Yes... go to the Interpreter section of the Zeppelin notebook and set the master (spark) to spark://ip-172-31-59-226.ec2.internal:7077.
Or, if you prefer using a Linux editor on CentOS:
$ cd /etc/spark/conf
$ vi spark-env.sh
export SPARK_HOME=/usr/lib/spark
export MASTER=spark://ip-172-31-59-226.ec2.internal:7077
(Esc) :wq
If the above doesn't work, try it without the listening port.
Hello guys
I tried Akshay's suggestion, but it still didn't work for me.
Compiled just R in zeppelin: mvn clean package -Pr -DskipTests
Configuration for zeppelin:
export MASTER=spark://ip-172-31-59-226.ec2.internal
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
Now, when I run an R command in the notebook, it gives this error:
org.apache.spark.SparkException: Invalid master URL: spark://ip-172-31-59-226.ec2.internal
at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2121)
at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47)
at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48)
at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.deploy.client.AppClient.<init>
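For reference, the top frame of that trace, Utils.extractHostPortFromSparkUrl, throws exactly this exception when a spark:// URL carries no port, so the standalone form needs the port spelled out, e.g.:
export MASTER=spark://ip-172-31-59-226.ec2.internal:7077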
Looks like all I need is the correct spark master URL, but I couldn't find it easily, so I googled and found this link:
http://stackoverflow.com/questions/30760792/how-to-find-spark-master-url-on-amazon-emr
From this link, my understanding is that an EMR spark cluster is created with YARN installed by default, so if I want to use the external spark distribution installed by EMR, I am stuck with YARN.
Can anyone help me with this battle? I have been struggling with this issue for almost two weeks.
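Since the Spark that EMR installs is YARN-managed, one alternative is to skip the standalone URL entirely and point the interpreter at YARN. A minimal zeppelin-env.sh sketch under that assumption (yarn-client was the usual client-side deploy mode for Spark 1.6; this is not a fix confirmed in the thread):
# Run the Spark interpreter against YARN instead of a standalone master;
# HADOOP_CONF_DIR lets Spark find the YARN ResourceManager.
export MASTER=yarn-client
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf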
Thanks in Advance
Hello Akshay
After adding the port to the spark master URL and restarting the zeppelin server, I got this error when running an R command:
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:232)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:216)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:259)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:262)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
This looks like one step closer to getting it working.
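For what it's worth, a TTransportException at this layer usually just means the remote interpreter process died, so the underlying error tends to be in the interpreter's own log rather than in this trace. A quick way to look, as a sketch (exact log file names vary by Zeppelin version, user, and host):
cd /home/hadoop/zeppelin
ls logs/
tail -n 100 logs/zeppelin-interpreter-*.log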
Thanks Zenon
Hello guys
Thanks for helping me on this issue, really appreciated your time and effort.
Instead of giving pieces of information, I want to give all of it, so you might be able to help resolve this quickly.
The EMR cluster I created is release label emr-4.4.0 with Spark (1.6.0) and Ganglia (3.7.2) as applications.
I used the following script to install zeppelin 0.6.0 as a bootstrap action when the cluster started:
#!/bin/bash -ex
if [ "$(cat /mnt/var/lib/info/instance.json | jq -r .isMaster)" == "true" ]; then

# Install Git
sudo yum -y install git

# Install Maven
wget -P /tmp http://apache.mirrors.spacedump.net/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo mkdir /opt/apache-maven
sudo tar -xvzf /tmp/apache-maven-3.3.3-bin.tar.gz -C /opt/apache-maven

cat <<EOF >> /home/hadoop/.bashrc
export MAVEN_HOME=/opt/apache-maven/apache-maven-3.3.3
export PATH=\$MAVEN_HOME/bin:\$PATH
EOF
source /home/hadoop/.bashrc

# Install Zeppelin
git clone https://github.com/apache/incubator-zeppelin.git /home/hadoop/zeppelin
cd /home/hadoop/zeppelin

# Install some R packages before building Zeppelin
sudo mkdir /tmp/rjars/
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/stringi_1.0-1.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/magrittr_1.5.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/stringr_1.0.0.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/evaluate_0.9.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/mime_0.4.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/digest_0.6.9.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/formatR_1.4.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/highr_0.6.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/markdown_0.7.7.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/yaml_2.1.13.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/knitr_1.13.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/Rcpp_0.12.5.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/htmltools_0.3.5.tar.gz
sudo wget -P /tmp/rjars/ https://rweb.crmda.ku.edu/cran/src/contrib/base64enc_0.1-3.tar.gz
sudo R CMD INSTALL /tmp/rjars/stringi_1.0-1.tar.gz
sudo R CMD INSTALL /tmp/rjars/magrittr_1.5.tar.gz
sudo R CMD INSTALL /tmp/rjars/stringr_1.0.0.tar.gz
sudo R CMD INSTALL /tmp/rjars/evaluate_0.9.tar.gz
sudo R CMD INSTALL /tmp/rjars/mime_0.4.tar.gz
sudo R CMD INSTALL /tmp/rjars/digest_0.6.9.tar.gz
sudo R CMD INSTALL /tmp/rjars/formatR_1.4.tar.gz
sudo R CMD INSTALL /tmp/rjars/highr_0.6.tar.gz
sudo R CMD INSTALL /tmp/rjars/markdown_0.7.7.tar.gz
sudo R CMD INSTALL /tmp/rjars/yaml_2.1.13.tar.gz
sudo R CMD INSTALL /tmp/rjars/knitr_1.13.tar.gz
sudo R CMD INSTALL /tmp/rjars/Rcpp_0.12.5.tar.gz
sudo R CMD INSTALL /tmp/rjars/htmltools_0.3.5.tar.gz
sudo R CMD INSTALL /tmp/rjars/base64enc_0.1-3.tar.gz

# Build Zeppelin with R
mvn clean package -Pr -DskipTests

# Configure Zeppelin: copy Spark's executor settings into ZEPPELIN_JAVA_OPTS
SPARK_DEFAULTS=/usr/lib/spark/conf/spark-defaults.conf
echo ${SPARK_DEFAULTS}
declare -a ZEPPELIN_JAVA_OPTS
if [ -f $SPARK_DEFAULTS ]; then
  ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
    $(grep spark.executor.instances $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
  ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
    $(grep spark.executor.cores $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
  ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
    $(grep spark.executor.memory $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
  ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
    $(grep spark.default.parallelism $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
  ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}" \
    $(grep spark.yarn.executor.memoryOverhead $SPARK_DEFAULTS | awk '{print "-D" $1 "=" $2}'))
fi
echo ${SPARK_DEFAULTS}
echo "${ZEPPELIN_JAVA_OPTS[@]}"

# Get the cluster ID
CLUSTER_ID=$(aws emr list-clusters --active | grep -i id | awk -F '"' '{print $4}')
echo $CLUSTER_ID

# Get the Spark master host from aws emr list-instances
SPARK_MASTER_URL=$(aws emr list-instances --cluster-id $CLUSTER_ID --instance-group-types MASTER | grep -i PrivateDnsName | awk -F '"' '{print $4}')
echo $SPARK_MASTER_URL

# Put values in zeppelin-env.sh
cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
cat <<EOF >> conf/zeppelin-env.sh
export MASTER=spark://${SPARK_MASTER_URL}:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
EOF

# Change the Zeppelin port to 7002
cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
sed -i -e 's/8080/7002/g' conf/zeppelin-site.xml

# Start the Zeppelin daemon
bin/zeppelin-daemon.sh start
fi
Previously, when I built zeppelin with this command:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests
It worked fine. After I removed -Pspark, -Phadoop, and -Pyarn, I noticed that ZEPPELIN_JAVA_OPTS is empty.
When I ran the script after the cluster was created, the variable ZEPPELIN_JAVA_OPTS was populated correctly. But when it runs as a bootstrap action, ZEPPELIN_JAVA_OPTS is empty.
Does this mean spark is not installed yet when I try to install zeppelin as a bootstrap action during cluster creation?
If that is the case, why did the previous command work fine?
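That is consistent with how EMR sequences things: bootstrap actions run before EMR installs the applications, so /usr/lib/spark/conf/spark-defaults.conf would not exist yet when the script greps it. One workaround sketch (untested here) is to poll for the file before reading it:
# Bootstrap actions run before EMR installs applications such as Spark,
# so wait (up to ~10 minutes) for spark-defaults.conf before grepping it.
SPARK_DEFAULTS=/usr/lib/spark/conf/spark-defaults.conf
for i in $(seq 1 60); do
  [ -f "$SPARK_DEFAULTS" ] && break
  sleep 10
done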
This original script is coming from this post:
https://gist.github.com/andershammar/224e1077021d0ea376dd
Thanks Zenon
Hello Everyone
We finally figured out the issue: instead of using %r, we should use %knitr to run R code in zeppelin.
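For anyone who lands here later, a minimal working paragraph in this build looks like the following (it assumes the knitr package installed by the bootstrap script above):
%knitr
summary(cars)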
Thanks Zenon
Both of those should work without a problem. If you are using the latest Zeppelin from master, though, there are a lot of recently introduced bugs that could cause this. You may be happier using the version from my repo.
Hey guys, I bumped into a similar error while running the zeppelin daemon provided by the EMR instance. I followed the steps mentioned here and was able to launch the sparkR shell, but I am getting an 'r' interpreter not found error. The version of zeppelin running on EMR is 0.6.1. I tried following the conversation on the mailing list, and from my understanding the r interpreter should be part of the build, right?
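One way to check whether the r interpreter shipped with EMR's packaged Zeppelin is to list the interpreter directory, since each interpreter group gets its own subdirectory. A sketch; the /usr/lib/zeppelin path is an assumption based on how EMR lays out Spark and Hadoop elsewhere in this thread:
ls /usr/lib/zeppelin/interpreter/
If no r entry appears there, that build was compiled without the -Pr profile.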