dogukannulu / kafka_spark_structured_streaming

Get data from an API, run a scheduled script with Airflow, send the data to Kafka and consume it with Spark, then write to Cassandra

Issue while running the DAG #5

Abhi3linku commented 3 weeks ago

I am following the kafka_spark_structured_streaming repo and trying to play around with the details. However, I am getting a "Kafka not found" error. I have checked the Docker image of Apache Airflow, and kafka is part of the requirements.txt file, so I am not sure why I am seeing this error. Please advise in case I am missing anything on my end.

`Broken DAG: [/usr/local/airflow/dags/stream_to_kafka_dag.py] No module named 'kafka'`

@dogukannulu Please advise.

elaiken3 commented 3 weeks ago

@dogukannulu I am also having this same issue. Please advise on how to fix. Thanks.

hkaanengin commented 2 weeks ago

Hey @Abhi3linku and @elaiken3,

I encountered a similar issue with the requirements. I don't know if there is a fix without changing the current structure, but I can propose an alternative approach.
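
As a quick sanity check first, you can confirm whether the module is actually present inside the running container. The `kafka` module is provided by the kafka-python package; the container name below is an assumption, so check `docker ps` for yours:

```sh
# Check whether kafka-python (the package that provides the 'kafka'
# module) is installed inside the Airflow container.
docker exec -it airflow-webserver pip show kafka-python

# Quick, non-persistent workaround: install it into the running
# container (this is lost when the container is rebuilt).
docker exec -it airflow-webserver pip install kafka-python
```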

You can initialize Airflow with an entrypoint, a .sh script given to the Airflow image. Please take a look at my code here: docker-compose-infra.

I created my own basic Airflow Docker image and started it with an entrypoint, which looks like this: entrypoint.sh. It initializes Airflow with the requirements you need for the DAG.
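
For reference, here is a minimal sketch of what such an entrypoint can look like (the requirements path is an assumption; see the linked entrypoint.sh for the actual script):

```sh
#!/usr/bin/env bash
# Minimal entrypoint sketch; the requirements path is an assumption
# and should match wherever your image copies or mounts the file.
set -e

# Install the DAG's Python dependencies before Airflow starts, so that
# imports like `from kafka import KafkaProducer` resolve in the scheduler.
pip install --no-cache-dir -r /opt/airflow/requirements.txt

# Hand off to whatever Airflow subcommand the container was started
# with (e.g. `webserver` or `scheduler`).
exec airflow "$@"
```

You would point the image's ENTRYPOINT (or the service's entrypoint in docker-compose) at this script, so the install runs on every container start.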

Sorry again, @dogukannulu, for promoting my work on your repository. The project's credit is all yours :)

Hope you find it helpful.