Closed: hsrasheed closed this issue 6 years ago
@hr00 Thanks for the feedback! We are currently investigating and will update you shortly.
@hr00 What message does the job fail with?
@hr00 I see references to the Livy batches API and python files here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/livy-api-batch.html
@nitinme could you consider this a request to add an example of how to make a Livy API call to execute a python file?
@jason-j-MSFT We have used that API to create the requests and have successfully done so for session requests and statements. But when running batch jobs we see exceptions in the application log like the following:
Application application_1526496305873_0051 failed 5 times due to AM Container for appattempt_1526496305873_0051_000005 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://[URL].cx.internal.cloudapp.net:8088/cluster/app/application_1526496305873_0051 Then click on links to logs of each attempt.
Diagnostics: File/Folder does not exist: /clusters/[clustername]/user/livy/.sparkStaging/application_1526496305873_0051/pyspark.zip [0460e181-84d4-45a7-903e-31258cf7946d][2018-05-31T09:06:24.2370618-07:00] [ServerRequestId:0460e181-84d4-45a7-903e-31258cf7946d]
java.io.FileNotFoundException: File/Folder does not exist: /clusters/[clustername]/user/livy/.sparkStaging/application_1526496305873_0051/pyspark.zip [0460e181-84d4-45a7-903e-31258cf7946d][2018-05-31T09:06:24.2370618-07:00] [ServerRequestId:0460e181-84d4-45a7-903e-31258cf7946d]
at sun.reflect.GeneratedConstructorAccessor81.newInstance(Unknown Source)
@hr00 you might want to check it here: https://issues.apache.org/jira/browse/SPARK-10795
@hr00 Please let us know if you need further assistance, or if the tips provided led to a solution. Thanks! Jason
We haven't heard back from you recently, and we're not sure whether this issue is still active. Let us know if you'd like to continue the discussion.
Azure support could also assist if you are stuck.
Summary: The doc recommends using the POST /batches endpoint with the pyFiles element in the request body:
pyFiles | Python files to be used in this session | list of strings
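As a sketch of what the doc describes (the storage account, container, and file names below are placeholders, not values from this issue), the batch request body with pyFiles could be built like this:

```python
import json

# Hypothetical paths -- substitute your own storage account, container,
# and script names before submitting.
batch_body = {
    "file": "wasb://container@account.blob.core.windows.net/main_job.py",
    # Per the doc table, pyFiles is a list of strings.
    "pyFiles": ["wasb://container@account.blob.core.windows.net/helpers.py"],
}

# json.dumps produces a correctly quoted JSON body for POST /livy/batches,
# avoiding shell-escaping mistakes in hand-written curl commands.
payload = json.dumps(batch_body)
print(payload)
```

The resulting string is what would go in the request body of a POST to https://[clustername].azurehdinsight.net/livy/batches with Content-Type: application/json.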
Looks like there are a number of quirks with pyspark.zip, per the JIRA noted above.
It's unclear whether this scenario is hitting a bug, a cluster misconfiguration, or a problem in the body of the REST call.
Is it possible to submit a Livy Spark batch job that references a Python file instead of a jar file? I have tried something like the following, but the job fails:
curl -k --user "user:pwd" -v -H "Content-Type: application/json" -X POST --data '{"file": "wasb://test@mystorage.blob.core.windows.net/livybatchtest.py"}' https://mycluster.azurehdinsight.net/livy/batches | python -m json.tool
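One thing worth checking in a command like the above is the quoting: if the JSON body's double quotes are not protected from the shell, Livy receives malformed JSON. A way to sidestep that (a sketch only; the cluster URL, credentials, and file path are the placeholders from the command above, and the request is built but not actually sent) is to construct the POST in Python:

```python
import json
import urllib.request

# Placeholder endpoint from the curl command above -- not a real cluster.
LIVY_URL = "https://mycluster.azurehdinsight.net/livy/batches"

def build_batch_request(url: str = LIVY_URL) -> urllib.request.Request:
    """Build the POST /batches request; sending it needs a reachable cluster."""
    body = json.dumps(
        {"file": "wasb://test@mystorage.blob.core.windows.net/livybatchtest.py"}
    ).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    # urllib.request.urlopen(req) would actually submit the batch; it is
    # omitted here because the endpoint above is a placeholder. Basic-auth
    # credentials would also need to be attached for a real HDInsight cluster.
    return req

print(build_batch_request().get_full_url())
```

Since the body is serialized with json.dumps, there is no inner-quote escaping to get wrong, which removes one variable when debugging the failing job.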
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.