Closed: Evgeny-Larin closed this issue 5 months ago
The problem was a version mismatch between the Spark in the Dockerfile and the Spark in the docker compose file. It turns out that with this type of deployment, the Spark we installed in Airflow via the Dockerfile is used as a master for the Spark cluster from the docker compose file. I found information about this here: link
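A minimal sketch of the fix described above, assuming the Airflow image installs PySpark via pip and the cluster uses Bitnami Spark images (the version number 3.4.1 and the service names are illustrative, not taken from the repo):

```dockerfile
# Dockerfile (Airflow image): pin the same Spark version the cluster runs
RUN pip install pyspark==3.4.1
```

```yaml
# docker-compose.yml: pin the cluster images to the matching version
services:
  spark-master:
    image: bitnami/spark:3.4.1
  spark-worker:
    image: bitnami/spark:3.4.1
```

The point is simply that both sides reference the same Spark release, whatever that release is.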
Hello! I configured Spark in a Docker compose file according to the instructions from the video, and edited the Dockerfile as weldermartins suggested in the previous issue.
![image](https://github.com/airscholar/SparkingFlow/assets/118234834/9f6c9d08-9ed7-44b6-83f2-7e2e8147037b)
![image](https://github.com/airscholar/SparkingFlow/assets/118234834/16b0e350-2e5b-4f9a-b3d4-6ae322b65b57)
![image](https://github.com/airscholar/SparkingFlow/assets/118234834/55d2a030-0026-4183-b451-fe01485fa212)
When I launch SparkSubmitOperator (or BashOperator with the
spark-submit --master spark://spark-master:7077 test_spark_job.py
command) in Airflow, I see a job being created on the Spark master, but the execution gets stuck. If you open the Airflow logs, you can see the following text:
If I open stderr in the worker web UI, I see the following logs:
I also noticed that if we leave only the lines that create the Spark session configuration in the executed code, it works fine.
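This fits the version-mismatch explanation above: building the session only involves the driver talking to the master, and the failure surfaces once tasks are shipped to the executors. One way to fail fast instead of hanging is to compare the driver and cluster versions before submitting; a minimal sketch (the helper name and the version strings are hypothetical, not from the repo):

```python
def versions_compatible(driver_version: str, cluster_version: str) -> bool:
    """Spark drivers and clusters generally need to agree on the
    major.minor release line; compare just those two components."""
    return driver_version.split(".")[:2] == cluster_version.split(".")[:2]

# In a real job, driver_version would come from spark.version and the
# cluster version from the master's web UI or the image tag.
print(versions_compatible("3.4.1", "3.4.0"))  # same major.minor -> True
print(versions_compatible("3.4.1", "3.5.0"))  # mismatch -> False
```

Running a check like this at the top of the job turns a silent hang into an immediate, explainable error.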
![image](https://github.com/airscholar/SparkingFlow/assets/118234834/3d00adf2-4233-4544-a67a-786aeb46d882)
Also, if I go directly into the Spark container and run the command
spark-submit --master spark://spark-master:7077 test_spark_job.py
the task completes successfully.
![image](https://github.com/airscholar/SparkingFlow/assets/118234834/e753673f-c2ba-4c44-8ea2-6432ef8a5ef1)
Has anyone encountered this problem? I have searched all over the internet and could not find an answer.
Here is my code: link