aalkilani / spark-kafka-cassandra-applying-lambda-architecture


Job remains in ACCEPTED state after submitting in spark #11

Closed ashishmishraw closed 7 years ago

ashishmishraw commented 7 years ago

Hi,

I ran the following command on the VirtualBox VM to submit the batch job in Spark, as explained in the video.

I did the following steps on my Mac OS X host:

  1. vagrant ssh
  2. cd /pluralsight/spark ; HADOOP_CONF_DIR is set to "/pluralsight/hadoop_conf"
  3. Ran the command "./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar"
  4. The state of the submitted job remains ACCEPTED

./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar
16/12/13 14:36:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/13 14:36:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/12/13 14:36:55 INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers
16/12/13 14:36:55 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/12/13 14:36:55 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/12/13 14:36:55 INFO yarn.Client: Setting up container launch context for our AM
16/12/13 14:36:55 INFO yarn.Client: Setting up the launch environment for our AM container
16/12/13 14:36:56 INFO yarn.Client: Preparing resources for our AM container
16/12/13 14:36:56 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/spark/spark-assembly-1.6.1-hadoop2.6.0.jar
16/12/13 14:36:56 INFO yarn.Client: Uploading resource file:/vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar -> hdfs://lambda-pluralsight:9000/user/vagrant/.sparkStaging/application_1481639716980_0001/spark-lambda-1.0-SNAPSHOT-shaded.jar
16/12/13 14:37:00 INFO yarn.Client: Uploading resource file:/tmp/spark-28c1f1e5-d3df-4afc-8311-f966b490c8a0/spark_conf75654060557689293.zip -> hdfs://lambda-pluralsight:9000/user/vagrant/.sparkStaging/application_1481639716980_0001/spark_conf75654060557689293.zip
16/12/13 14:37:00 INFO spark.SecurityManager: Changing view acls to: vagrant
16/12/13 14:37:00 INFO spark.SecurityManager: Changing modify acls to: vagrant
16/12/13 14:37:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); users with modify permissions: Set(vagrant)
16/12/13 14:37:00 INFO yarn.Client: Submitting application 1 to ResourceManager
16/12/13 14:37:01 INFO impl.YarnClientImpl: Submitted application application_1481639716980_0001
16/12/13 14:37:02 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)
16/12/13 14:37:02 INFO yarn.Client:
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: default
    start time: 1481639821142
    final status: UNDEFINED
    tracking URL: http://lambda-pluralsight:8088/proxy/application_1481639716980_0001/
    user: vagrant
16/12/13 14:37:03 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)
16/12/13 14:37:04 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)
16/12/13 14:37:05 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)

The job remains in the ACCEPTED state indefinitely; I waited up to 15 minutes.
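The tell-tale line in the log above is "Requesting a new application from cluster with 0 NodeManagers": with no NodeManagers registered, YARN accepts the application but can never allocate a container for it. A minimal sketch of checking for that symptom, operating on a line copied from the log (the variable names and the grep-it-from-a-saved-log approach are my own, not from the course):

```shell
# Sketch: extract the NodeManager count from the yarn.Client log line.
# A job stuck in ACCEPTED with 0 NodeManagers will never be scheduled.
log='16/12/13 14:36:55 INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers'
nodes=$(printf '%s\n' "$log" | sed -n 's/.*cluster with \([0-9]*\) NodeManagers.*/\1/p')
if [ "$nodes" -eq 0 ]; then
  echo "No NodeManagers registered - containers can never be allocated"
fi
```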

Also, when I opened the following link -> http://127.0.0.1:8088/cluster/scheduler?openQueues=default, I saw the following information:

Queue State:    RUNNING
Used Capacity:  0.0%
Absolute Used Capacity:     0.0%
Absolute Capacity:  100.0%
Absolute Max Capacity:  100.0%
Used Resources:     <memory:0, vCores:0>
Num Schedulable Applications:   1
Num Non-Schedulable Applications:   0
Num Containers:     0
Max Applications:   10000
Max Applications Per User:  10000
Max Application Master Resources:   <memory:0, vCores:0>
Used Application Master Resources:  <memory:2048, vCores:1>
Max Application Master Resources Per User:  <memory:0, vCores:0>
Configured Capacity:    100.0%
Configured Max Capacity:    100.0%
Configured Minimum User Limit Percent:  100%
Configured User Limit Factor:   1.0
Accessible Node Labels:     *
Preemption:     disabled

I'm not sure what is wrong. Please help me proceed.

~Ashish

aalkilani commented 7 years ago

If you go to the YARN UI at http://lambda-pluralsight:8088/, there's a section up top called Cluster Metrics. How many Active Nodes is it reporting? If it's reporting 0 active nodes, then you have an old version of the VM; this has been addressed in a prior release.
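The same Active Nodes number shown on the YARN UI is exposed by the ResourceManager REST API at /ws/v1/cluster/metrics, so you can check it from a script. A sketch, assuming the sample JSON below (trimmed to the relevant field) stands in for what `curl -s http://lambda-pluralsight:8088/ws/v1/cluster/metrics` would return on a broken VM:

```shell
# Sketch: read activeNodes out of a ResourceManager metrics response.
# On the live VM you would fetch this with:
#   curl -s http://lambda-pluralsight:8088/ws/v1/cluster/metrics
metrics='{"clusterMetrics":{"activeNodes":0,"lostNodes":0}}'
active=$(printf '%s' "$metrics" | sed -n 's/.*"activeNodes":\([0-9]*\).*/\1/p')
echo "activeNodes=$active"   # 0 means no NodeManager ever registered
```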

You can either get the latest Vagrant image, which has all the updates but requires more bandwidth to download (be careful: you will lose any data you have saved outside of the /vagrant directory if you do this):

vagrant box update
vagrant up

Or, you can choose to simply get the latest code from git; it will handle the upgrade process for you and requires less network bandwidth:

  1. Use Cygwin or whatever other shell you're using to get the latest from git. First navigate to the directory spark-kafka-cassandra-applying-lambda-architecture/vagrant

This is on your own host machine (not inside the VM)

git pull origin

You should see some updates.

  2. Re-provision the vagrant image

vagrant provision
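Once provisioning finishes, it's worth confirming a NodeManager has actually registered before resubmitting the job. Inside the VM (via vagrant ssh) you would run `yarn node -list`; as a sketch, the sample summary line below is my assumption of what the healthy output's footer looks like:

```shell
# Sketch: parse the "Total Nodes" summary from `yarn node -list` output.
# A healthy single-node setup should report at least one node.
sample='Total Nodes:1'
total=${sample#Total Nodes:}
echo "Active NodeManagers: $total"
```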

Let me know how it goes.

aalkilani commented 7 years ago

Closing issue due to inactivity. I am assuming the latest version has resolved this for you. Please re-open if this is still a problem. Thanks