aalkilani / spark-kafka-cassandra-applying-lambda-architecture


Error launching spark-submit #4

Closed dmcarba closed 7 years ago

dmcarba commented 7 years ago

Hi,

I just downloaded the latest version of the box (with the fixes for the NodeManager service and the environment variables for the Java and Hadoop homes), but I hit the exception below when launching the job with spark-submit.

I tried copying the 1.6.1 jar manually, but that produced a different exception (ClassNotFound, seemingly related to the Scala version), so I assume the correct one is spark-assembly-1.6.3-hadoop2.7.0.jar.

I'm using macOS 10.12.1.

Thanks

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://lambda-pluralsight:9000/spark/spark-assembly-1.6.1-hadoop2.6.0.jar
    at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:134)
    at org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:467)
    at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2193)
    at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2189)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2195)
    at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:601)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:327)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:407)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:446)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:444)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:444)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:727)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
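
For reference: I believe Spark 1.6 on YARN lets you override which assembly jar it looks for via the spark.yarn.jar setting, so a hypothetical workaround (assuming the 1.6.3 assembly really is the one shipped on the box) would be something like:

./bin/spark-submit --conf spark.yarn.jar=hdfs://lambda-pluralsight:9000/spark/spark-assembly-1.6.3-hadoop2.7.0.jar ...

I have not confirmed this against the course setup, though.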

aalkilani commented 7 years ago

Can you please provide the following so we can further troubleshoot:

  1. Which user account (vagrant or root) was the command executed as?
  2. Can you provide the exact command that was used?
  3. Which version of the Vagrant image do you have? You can tell whether it's the latest when you run vagrant up, or by running "vagrant box list". Do you have 0.0.5 or something else?

Thanks

dmcarba commented 7 years ago

Hi,

I just checked the version of the box, and it is not the latest; it is 0.0.5:

==> default: A newer version of the box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' is available! You currently
==> default: have version '0.0.5'. The latest is version '0.0.6'. Run
==> default: `vagrant box update` to update.

Sorry, I did the git clone just today and assumed the Vagrant box was the latest. I will run vagrant box update and try again.

The user is vagrant.

The command issued is the spark-submit for the first YARN job example:

./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

Thanks!

aalkilani commented 7 years ago

Great. Give 0.0.6 a try; however, even if you're on 0.0.5, everything should still work. The point of the updated scripts is to make the transition seamless even on an older box, so we don't constantly ask you to download a large image. That said, this looks like a legitimate problem when updating from 0.0.5. I'm looking into it.

Having said that, there are advantages to moving to the newer image. Everything should be streamlined, and any issues reported earlier will have been addressed. Please let me know how 0.0.6 works for you.

Thanks

dmcarba commented 7 years ago

I upgraded to 0.0.6 successfully, but when running vagrant up, the box gets stuck booting with these messages:

==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Authentication failure. Retrying...
    default: Warning: Authentication failure. Retrying...
    default: Warning: Authentication failure. Retrying...
    default: Warning: Authentication failure. Retrying...
    ...

At the end it shows the message:

Timed out while waiting for the machine to boot. This means that Vagrant was unable to communicate with the guest machine within the configured ("config.vm.boot_timeout" value) time period.

Afterwards I am able to connect with vagrant ssh (password: vagrant), but spark-submit then fails because the ResourceManager is not up.
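
In case it helps others: the timeout the error message refers to can be raised in the Vagrantfile; a one-line sketch (assuming the default SSH communicator):

config.vm.boot_timeout = 600

That only gives the VM more time to boot, though; it would not fix the authentication failures themselves.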

Thanks

dmcarba commented 7 years ago

Hi,

I removed the box and the git folder to start from scratch, ran git clone again, then vagrant up, which downloaded the box, this time 0.0.6.

Then I ran vagrant ssh, moved to the /vagrant directory, and executed fixes.sh.

After that I tried the spark-submit command again. The job seems to be running now, but I got an exception:

User class threw exception: java.lang.IllegalArgumentException: Pathname /lambda-pluralsight:9000/lambda/batch1 from hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1 is not a valid DFS filename.
java.lang.IllegalArgumentException: Pathname /lambda-pluralsight:9000/lambda/batch1 from hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1 is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)

The offending line is:

activityByProduct.write.partitionBy("timestamp_hour").mode(SaveMode.Append).parquet("hdfs:///lambda-pluralsight:9000/lambda/batch1")

First I removed the namenode URL, but then I got multiple exceptions with containers exiting with error -1. I assumed it was due to HDFS permissions, so I ended up using the relative path "lambda/batch1". The job created the files under the /user/vagrant directory in HDFS. However, the ResourceManager keeps crashing during job executions.
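
That location makes sense, by the way: HDFS resolves a relative path against the current user's home directory, /user/<username>, so "lambda/batch1" lands under /user/vagrant. A quick way to verify (assuming the vagrant user):

hadoop fs -ls /user/vagrant/lambda/batch1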

aalkilani commented 7 years ago

@dmcarba, would you mind pointing out exactly which clip from the course you're trying to run, so I can use the same code and setup? Are you running spark-submit, or going through the IDE? Perhaps Zeppelin?

Note from the exception that there seems to be something off with the path used: hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1. Because "hdfs:///..." (three slashes) has an empty authority, Hadoop treats "lambda-pluralsight:9000" as the first path component and then prepends the default filesystem, which is how the namenode address ends up doubled.
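
To illustrate the parsing, here is a minimal Scala sketch using plain java.net.URI (nothing course-specific):

import java.net.URI

// Three slashes: the authority is empty, so "lambda-pluralsight:9000"
// becomes the first path component and is later resolved against the
// default filesystem, doubling the namenode address.
val bad = new URI("hdfs:///lambda-pluralsight:9000/lambda/batch1")
println(bad.getAuthority) // null
println(bad.getPath)      // /lambda-pluralsight:9000/lambda/batch1

// Two slashes: host:port is parsed as the authority, as intended.
val good = new URI("hdfs://lambda-pluralsight:9000/lambda/batch1")
println(good.getAuthority) // lambda-pluralsight:9000
println(good.getPath)      // /lambda/batch1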

Also, regarding the fixes script: did you run it as the root user? The script is executed correctly if you run vagrant provision.

You don't need to run it yourself; vagrant provision will handle it for you.

Thanks

dmcarba commented 7 years ago

I used the vagrant provision command as you suggested and increased the VM memory to 8192 MB in the Vagrantfile.
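
For anyone making the same change, the memory bump is the standard VirtualBox provider setting in the Vagrantfile; a sketch (the exact block in this repo's Vagrantfile may differ):

config.vm.provider "virtualbox" do |vb|
  vb.memory = 8192
end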

I am testing BatchJob, the first YARN example in lesson 2, using the spark-submit command.

I also had to change the Parquet destination path in the code, removing the namenode URL: from "hdfs:///lambda-pluralsight:9000/lambda/batch1" to "hdfs:///lambda/batch1".
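
For reference, the corrected write call (the original line from above, with the fixed path) looks like:

activityByProduct.write.partitionBy("timestamp_hour").mode(SaveMode.Append).parquet("hdfs:///lambda/batch1")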

Now the Spark job finishes correctly, and all the Parquet files are created in the expected path.

I think this issue can be closed.

Thank you for your support!

aalkilani commented 7 years ago

I have set sensible defaults for Spark now, so it should work even with the constrained 4 GB image, but if you have the luxury of going up to 8 GB by updating the Vagrantfile, that's also great. Closing this. Thanks for the feedback.