Closed: prashanttct07 closed this issue 9 years ago.
This is the repo for examples from Learning Spark; your question is most likely best suited to the Apache Spark users mailing list (very few people look at the issues here). Best of luck :)
Although at first glance this looks like a performance degradation, cluster mode is useful when Spark jobs are submitted from a gateway node, since your driver program consumes a good amount of resources there, which becomes a bottleneck down the line.
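For illustration only, a minimal sketch of the two deploy modes using the class, jar, and arguments from the commands below (the --driver-memory value is an assumption, not taken from the thread):

# yarn-client: the driver stays on the gateway node and consumes its memory and CPU
sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-client" --driver-memory 2g target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555

# yarn-cluster: the driver runs inside a YARN container on a worker node,
# so the gateway node is free once spark-submit has handed the job off
sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-cluster" --driver-memory 2g target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555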
Please consider this an inquiry rather than an issue.
Hi, I am evaluating Spark for use here at my work.
We have an existing Hortonworks HDP 2.3 install.
I am trying to work out whether I should use local, yarn-client, or yarn-cluster mode to submit a job in Spark.
Consider that I am running my job as: sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "local[*]" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt
With this, the task completes in 14 seconds.
When I run the same job as: sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-client" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt
it takes 16 seconds.
And this one: sudo -u hdfs spark-submit --class "org.xyz.Spark_ES_Java_V4" --master "yarn-cluster" target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt
takes 18 seconds.
In the first case I am running it locally, meaning it runs on one machine and takes less time, whereas in the later cases I am submitting the job to a cluster with 4 nodes.
So can anyone tell me what the point of running the same job on the cluster is, given that I see a performance degradation there? Or is there any way to improve the performance on the cluster?
I would really love to hear from someone about this, as it is quite urgent.
~Prashant
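For reference, a 4-node cluster is not automatically used in full: by default Spark on YARN requests only 2 executors, so most of the cluster can sit idle, and on a job this short the YARN container startup time dominates the runtime anyway. A minimal sketch of requesting explicit executor resources for the same submission (the numbers are assumptions for a small 4-node cluster, not measured values):

# Resource numbers below are assumed for illustration; tune them to the actual node sizes.
sudo -u hdfs spark-submit \
  --class "org.xyz.Spark_ES_Java_V4" \
  --master "yarn-cluster" \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 2g \
  target/xyz-1.1-jar-with-dependencies.jar 192.168.0.185 55555 > prashant.txt

The cluster generally only pays off once the input is large enough to be processed in parallel across those executors; for a 14-second job the fixed scheduling overhead will usually outweigh any speedup.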