ChaiBapchya opened this issue 8 years ago
Aim - make an API call from any machine that submits a Spark job to a Spark EC2 cluster. The job (a Python file) runs perfectly well on localhost with Apache Spark; however, I am unable to run it on Apache Spark on EC2.
API call
curl -X POST http://ec2-54-209-108-127.compute-1.amazonaws.com:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
    "action" : "CreateSubmissionRequest",
    "appArgs" : [ "" ],
    "appResource" : "wordcount.py",
    "clientSparkVersion" : "1.5.0",
    "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
    "mainClass" : "",
    "sparkProperties" : {
      "spark.jars" : "wordcount.py",
      "spark.driver.supervise" : "true",
      "spark.app.name" : "MyJob",
      "spark.eventLog.enabled" : "true",
      "spark.submit.deployMode" : "cluster",
      "spark.master" : "spark://ec2-54-209-108-127.compute-1.amazonaws.com:6066"
    }
  }'

Response:

{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160712145703-0003",
  "serverSparkVersion" : "1.6.1",
  "submissionId" : "driver-20160712145703-0003",
  "success" : true
}
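For reference, a minimal sketch of the same submission request from Python using the requests library (assuming requests is installed); the payload simply mirrors the curl call above:

import requests

SUBMIT_URL = "http://ec2-54-209-108-127.compute-1.amazonaws.com:6066/v1/submissions/create"

# Payload taken verbatim from the curl call above.
payload = {
    "action": "CreateSubmissionRequest",
    "appArgs": [""],
    "appResource": "wordcount.py",
    "clientSparkVersion": "1.5.0",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "mainClass": "",
    "sparkProperties": {
        "spark.jars": "wordcount.py",
        "spark.driver.supervise": "true",
        "spark.app.name": "MyJob",
        "spark.eventLog.enabled": "true",
        "spark.submit.deployMode": "cluster",
        "spark.master": "spark://ec2-54-209-108-127.compute-1.amazonaws.com:6066",
    },
}

resp = requests.post(
    SUBMIT_URL,
    json=payload,
    headers={"Content-Type": "application/json;charset=UTF-8"},
)
print(resp.json())  # expect a CreateSubmissionResponse with a submissionId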
To get the status of the submission, the following API call returns an error - File not found:
curl http://ec2-54-209-108-127.compute-1.amazonaws.com:6066/v1/submissions/status/driver-20160712145703-0003

Response:

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "ERROR",
  "message" : "Exception from the cluster:\njava.io.FileNotFoundException: wordcount.py (No such file or directory)\n\tjava.io.FileInputStream.open(Native Method)\n\tjava.io.FileInputStream.<init>(FileInputStream.java:146)\n\torg.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124)\n\torg.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114)\n\torg.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:202)\n\torg.spark-project.guava.io.Files.copy(Files.java:436)\n\torg.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:539)\n\torg.apache.spark.util.Utils$.copyFile(Utils.scala:510)\n\torg.apache.spark.util.Utils$.doFetchFile(Utils.scala:595)\n\torg.apache.spark.util.Utils$.fetchFile(Utils.scala:394)\n\torg.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)\n\torg.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)",
  "serverSparkVersion" : "1.6.1",
  "submissionId" : "driver-20160712145703-0003",
  "success" : true,
  "workerHostPort" : "172.31.17.189:59433",
  "workerId" : "worker-20160712083825-172.31.17.189-59433"
}
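The stack trace points at DriverRunner.downloadUserJar on the worker: in cluster deploy mode, the worker chosen to launch the driver fetches appResource itself, so a bare relative path like "wordcount.py" is resolved against that worker's local filesystem, where the file does not exist. (Note also that the response reports serverSparkVersion 1.6.1 while the request declared clientSparkVersion 1.5.0.)

For monitoring, here is a minimal sketch of polling the status endpoint from Python (assuming the requests library; the driver-state names are the standard Spark standalone ones):

import time
import requests

STATUS_URL = ("http://ec2-54-209-108-127.compute-1.amazonaws.com:6066"
              "/v1/submissions/status/{}")

def wait_for_driver(submission_id, interval=5):
    # Poll the status endpoint until the driver leaves the
    # SUBMITTED/RUNNING states, then return the final response.
    while True:
        state = requests.get(STATUS_URL.format(submission_id)).json()
        if state.get("driverState") not in ("SUBMITTED", "RUNNING"):
            return state
        time.sleep(interval)

print(wait_for_driver("driver-20160712145703-0003"))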
Awaiting suggestions and improvements. P.S. - I'm a newbie to Apache Spark.
Update - revised API call (set mainClass, appArgs, appResource, and clientSparkVersion to updated values):
curl -X POST http://ec2-54-209-108-127.compute-1.amazonaws.com:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
    "action" : "CreateSubmissionRequest",
    "appArgs" : [ "/wordcount.py" ],
    "appResource" : "file:/wordcount.py",
    "clientSparkVersion" : "1.6.1",
    "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
    "mainClass" : "org.apache.spark.deploy.SparkSubmit",
    "sparkProperties" : {
      "spark.driver.supervise" : "false",
      "spark.app.name" : "Simple App",
      "spark.eventLog.enabled" : "true",
      "spark.submit.deployMode" : "cluster",
      "spark.master" : "spark://ec2-54-209-108-127.compute-1.amazonaws.com:6066"
    }
  }'
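Even with the updated call, the same caveat applies: with spark.submit.deployMode set to cluster, the worker that launches the driver downloads appResource, so "file:/wordcount.py" must exist at that exact path on every worker node (or be replaced with an hdfs:// or http:// URI the workers can fetch). A hypothetical payload fragment, assuming the script has been staged at /home/ec2-user/wordcount.py on each node:

# Hypothetical fragment: "payload" is the CreateSubmissionRequest dict from
# the Python sketch above. "/home/ec2-user/wordcount.py" is an assumed
# staging path; replace it with wherever the script actually lives on the
# worker nodes.
payload.update({
    "appResource": "file:/home/ec2-user/wordcount.py",
    "appArgs": ["/home/ec2-user/wordcount.py"],
    "clientSparkVersion": "1.6.1",
})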