kitgary closed this issue 7 years ago.
Can you please make sure you're running the latest version of the image and the git scripts? Both have changed significantly in the last few days and weeks, so it makes sense to just start there.
See which version of the vagrant image you have:
vagrant box list
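The installed version shows up in parentheses next to each box name; the output looks something like this (the box name and provider here are placeholders for whatever your setup reports):

```
<box-name> (virtualbox, 0.0.6)
```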
Do you have 0.0.6 or something else?
If you have something older than 0.0.5, let's get you on the latest box. Here's what to do (it will require some network bandwidth to download the new image). Note that this will permanently destroy the VM you currently have, so be sure to back up or take a copy of anything on it first.
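If you want to pull anything out of the VM before destroying it, one simple option is the default /vagrant synced folder, which maps back to the project directory on the host (the guest path below is just a placeholder for your own files):

```bash
# Copy files from the guest into the synced folder so they survive the destroy.
# Replace ~/my-work with whatever you actually want to keep.
vagrant ssh -c "cp -r ~/my-work /vagrant/backup-my-work"
```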
git pull origin master
vagrant destroy
vagrant box update
vagrant up
Thanks
Thanks! I got it working!
I checked that I already had the latest box (0.0.6), but after destroying the old box and starting a new one, everything worked fine. It's kind of weird...
Thanks again.
Hi,
The batch job fails when I run it on YARN. I'm already using box v0.0.6, and the error log is:
```
16/12/06 06:02:51 INFO yarn.Client: Application report for application_1480993486657_0008 (state: FAILED)
16/12/06 06:02:51 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1480993486657_0008 failed 2 times due to AM Container for appattempt_1480993486657_0008_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://lambda-pluralsight:8088/cluster/app/application_1480993486657_0008Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1480993486657_0008_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1481004161092
     final status: FAILED
     tracking URL: http://lambda-pluralsight:8088/cluster/app/application_1480993486657_0008
     user: vagrant
Exception in thread "main" org.apache.spark.SparkException: Application application_1480993486657_0008 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/06 06:02:51 INFO util.ShutdownHookManager: Shutdown hook called
16/12/06 06:02:51 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-08705c9f-0994-4da8-8b48-82eb9282313f
```
There isn't enough information in the logs provided here to nail down the cause of the problem. Would you mind checking the fixes.sh file under the vagrant directory? In fixes.sh, look for a section called: # spark-defaults
If you don't see it, then you simply need to update the project from git and do a vagrant reload --provision like so:
git pull origin master
vagrant reload --provision
That should take care of it. The problem this fixes is that the Spark defaults were too high for the very limited resources the VM is working with. That section adds some defaults that should work for everyone, and it does so by editing the file /pluralsight/spark/conf/spark-defaults.conf.
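If you're curious what actually changed, you can inspect the resulting file after re-provisioning. The values below are only typical conservative settings for a small single-node VM, not necessarily the exact ones fixes.sh writes:

```bash
# Inside the VM, after the reload/provision above:
cat /pluralsight/spark/conf/spark-defaults.conf
# Expect small memory/core limits, something along the lines of:
#   spark.driver.memory    512m
#   spark.executor.memory  512m
#   spark.executor.cores   1
```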
Closing this issue as the original poster has this resolved now. @azzam-krya, if you're still having problems, ensure you're running the code as the root user. To get root, run the following:
sudo su -
If you have further problems, please open another ticket and kindly provide the following:
It works after I restart the VM. Thanks.
I am having a similar problem to this one and raised this ticket for it today, as I didn't see this ticket previously:
https://github.com/aalkilani/spark-kafka-cassandra-applying-lambda-architecture/issues/27
Hi,
I failed to run the batch job on YARN in the "Saving to HDFS and Executing on YARN" demo; here's the error log.
```
16/12/01 11:56:16 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1480592911767_0001 failed 2 times due to AM Container for appattempt_1480592911767_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://lambda-pluralsight:8088/cluster/app/application_1480592911767_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1480592911767_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1480593276668
     final status: FAILED
     tracking URL: http://lambda-pluralsight:8088/cluster/app/application_1480592911767_0001
     user: vagrant
16/12/01 11:56:16 WARN yarn.Client: Failed to cleanup staging dir .sparkStaging/application_1480592911767_0001
java.net.ConnectException: Call From lambda-pluralsight/127.0.0.1 to lambda-pluralsight:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2113)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:167)
    at org.apache.spark.deploy.yarn.Client.monitorApplication(Client.scala:977)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1031)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1446)
    ... 31 more
Exception in thread "main" org.apache.spark.SparkException: Application application_1480592911767_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/01 11:56:17 INFO util.ShutdownHookManager: Shutdown hook called
16/12/01 11:56:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9fe8fe45-851a-41ad-8409-3daf17e08a5d
```
And the log shows:

```
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
```
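The "Connection refused" to lambda-pluralsight:9000 in the trace above suggests the HDFS NameNode wasn't up when the job was submitted. A quick way to check from inside the VM (assuming the JDK's jps tool and netstat are available there):

```bash
# List the running Hadoop/Java daemons; a healthy setup should show
# NameNode, DataNode, ResourceManager and NodeManager.
sudo jps
# Or check whether anything is listening on the HDFS port from the error above:
sudo netstat -tlnp | grep 9000
```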
Thanks, Gary