airscholar / SparkingFlow

This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster, with Python, Scala, and Java jobs as examples.
https://www.datamasterylab.com/home/course/apache-airflow-on-steriods-for-data-engineers/9

Getting an error in the Python job #1

Open JyotinP opened 10 months ago

JyotinP commented 10 months ago

{base.py:73} INFO - Using connection ID 'spark-conn' for task execution.
{spark_submit.py:351} INFO - Spark-Submit cmd: spark-submit --master spark://spark-master-1:7077 --name arrow-spark jobs/python/wordcountjob.py
{spark_submit.py:521} INFO - /home/***/.local/lib/python3.11/site-packages/pyspark/bin/load-spark-env.sh: line 68: ps: command not found
{spark_submit.py:521} INFO - /home/***/.local/lib/python3.11/site-packages/pyspark/bin/spark-class: line 71: /usr/lib/jvm/java-11-openjdk-arm64/bin/java: No such file or directory
{spark_submit.py:521} INFO - /home/***/.local/lib/python3.11/site-packages/pyspark/bin/spark-class: line 97: CMD: bad array subscript
{taskinstance.py:1935} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/apache/spark/operators/spark_submit.py", line 160, in execute
    self._hook.submit(self._application)
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/apache/spark/hooks/spark_submit.py", line 452, in submit
    raise AirflowException(
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master spark://spark-master-1:7077 --name arrow-spark jobs/python/wordcountjob.py. Error code is: 1.
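For context, the log shows two distinct problems before the final failure: the ps utility is missing from the container (pyspark's load-spark-env.sh shells out to it), and the JVM path being used (/usr/lib/jvm/java-11-openjdk-arm64) does not exist in the image. Below is a minimal Dockerfile sketch addressing both; the base image tag, the JDK package name (which depends on the base image's Debian release), and the JAVA_HOME path are assumptions, not something confirmed in this thread.

# Sketch only: extend the official Airflow image with a JDK and procps.
# The base tag is an assumption; use whatever your docker-compose.yml references.
FROM apache/airflow:2.7.1-python3.11

USER root
# procps provides the ps binary that load-spark-env.sh calls;
# the JDK gives spark-class a JVM to launch.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-11-jdk procps && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# The directory name is architecture-specific: -amd64 on Intel/AMD hosts,
# -arm64 on Apple Silicon. Match it to the platform you build for.
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
USER airflow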

franceZa commented 9 months ago

In the Dockerfile, add "RUN export JAVA_HOME". It works for me:

# Set JAVA_HOME environment variable
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME
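An editorial aside, not something stated in the thread: each RUN instruction executes in its own shell, so RUN export JAVA_HOME has no effect on the final image; the ENV line is what actually persists into the container where Airflow launches spark-submit. A minimal equivalent simply drops the no-op:

# ENV persists into the running container; a RUN export only affects
# the single shell that runs that one build step.
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64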

weldermartins commented 8 months ago

In the Dockerfile, add "RUN export JAVA_HOME". It works for me:

# Set JAVA_HOME environment variable
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME

I did as you advised, but in my case I still got the same error.

[2024-01-16, 20:01:50 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 7 for task python_job (Cannot execute: spark-submit --master spark://spark-master:7077 --name arrow-spark --deploy-mode client jobs/python/wordcountjob.py. Error code is: 1.; 239)

[image]

Python- and Java-compatible image: [image]

Versions: [image]

Variables: [image]

docker-compose.yml: [image]

Airflow connection: [image]

Airflow job error: [image]

On my local machine, spark-submit ran without errors: [image]
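One possible explanation, offered here as an editorial guess rather than anything confirmed in the thread: the original traceback looked for /usr/lib/jvm/java-11-openjdk-arm64, while the suggested ENV hardcodes the -amd64 path. Debian names the JDK directory after the CPU architecture, so a JAVA_HOME that is correct on an Intel host is wrong in an image built on Apple Silicon, and vice versa. When building with BuildKit, the predefined TARGETARCH build argument can keep the two in sync:

# TARGETARCH is a build argument BuildKit predefines (amd64, arm64, ...).
# It matches the suffix Debian uses for the JDK directory, so JAVA_HOME
# stays correct regardless of the host the image is built on.
ARG TARGETARCH
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-${TARGETARCH}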

yashraizb commented 1 month ago

@weldermartins Please check this issue [link], since it worked for me. And let me know if this works for you.

yashraizb commented 1 month ago

@JyotinP It looks like the ps command is not found. Try adding procps in the Dockerfile where the Java SDK and other packages are installed. Refer to the code below:

RUN apt-get update && \
    apt-get install -y gcc python3-dev openjdk-11-jdk procps && \
    apt-get clean