aalkilani / spark-kafka-cassandra-applying-lambda-architecture


Chapter 3 - "Saving to HDFS and Executing...." location 6:20mins #27

Closed robbie70 closed 6 years ago

robbie70 commented 6 years ago

Hi Ahmad, I am not sure if this is the right place to raise issues. I have been following your Lambda Spark course on Pluralsight and I am stuck at the point shown in the title. When I try to execute the statement mentioned,

cd /pluralsight/spark/
./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

I get an exception and the job fails at start-up:

18/02/09 09:16:44 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: c:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: c:
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.fs.Path.<init>(Path.java:93)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:912)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
    at batch.BatchJob$.main(BatchJob.scala:27)
    at batch.BatchJob.main(BatchJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 2: c:
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.failExpecting(URI.java:2854)
    at java.net.URI$Parser.parse(URI.java:3057)
    at java.net.URI.<init>(URI.java:746)
    at org.apache.hadoop.fs.Path.initialize(Path.java:202)
    ... 31 more
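From what I can tell, java.net.URI is rejecting a bare Windows drive prefix ("c:") because it parses as a URI scheme with nothing after the colon. The parse failure itself is easy to reproduce in isolation; here is a minimal sketch (the hdfs path is just an example, not from my job):

import java.net.URI

object UriSchemeCheck {
  def main(args: Array[String]): Unit = {
    // A bare Windows drive prefix parses as scheme "c" with nothing after
    // the colon, which is exactly the failure in the trace above:
    try {
      new URI("c:")
    } catch {
      case e: java.net.URISyntaxException =>
        println(s"Parse fails: ${e.getMessage}") // Expected scheme-specific part at index 2: c:
    }
    // A fully qualified URI parses cleanly (example path only):
    println(new URI("hdfs:///some/input/path"))
  }
}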

I have spent quite a bit of time trying to get to the bottom of it, but so far no luck. I have managed to pull the attached logs from the Hadoop web service running in my VM, at this URL:

http://lambda-pluralsight:8042/node/containerlogs/container_1518176371217_0003_01_000001/vagrant/stderr/?start=0

Logs for container_1518176371217_0003_01_000001.html.pdf

I have also tried starting the application in debug mode (on port 5005 or 7777, as suggested in some online examples), but when I start IntelliJ in remote debug mode I get a Connection Refused error.

Any help or pointers would be much appreciated. My email is robbie70@hotmail.com. Kind regards, Robert.

robbie70 commented 6 years ago

Hi Ahmad, I put this tutorial to one side for a few days because I was stuck (and was maybe hoping to hear from someone on this site! ;-) ), but today I came back to it with fresh eyes, and that helped: I spotted my mistake immediately. I had renamed my Scala program from the name you show in the tutorial, "BatchJob", to "BatchJobEx4", but when I tried to run it I was still using the old program name from your example, i.e.

./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

instead of updating it to my new name:

./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJobEx4 /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

When I executed it with the renamed class it worked perfectly :) so I will now continue with the tutorial, and this issue can be closed.
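For anyone else who hits this: the --class argument has to name the fully qualified Scala object containing the main method, and because the old batch.BatchJob class was still packaged in the jar, spark-submit launched the stale version instead of failing fast. A minimal sketch of how the renamed entry point lines up with the submit command (the body is a placeholder, not the actual course code):

package batch

import org.apache.spark.{SparkConf, SparkContext}

// spark-submit's --class flag must reference this object's fully qualified
// name, batch.BatchJobEx4. Renaming the object without updating --class
// launches whatever stale class of the old name is still in the jar.
object BatchJobEx4 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Batch Job") // placeholder app name
    val sc = new SparkContext(conf)
    // ... batch job logic goes here ...
    sc.stop()
  }
}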

aalkilani commented 6 years ago

@robbie70, thanks for the feedback and glad you were able to move forward.