An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
I am at the last step of the project, and when I run spark-submit I get a "cassandra module not found" error. I have checked all the JARs and the cassandra-driver version, and they all look correct. I am using Python 3.9, Spark 3.5.1, and Scala 2.2.
Can anyone please help me?
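One way to narrow this down is a small check, sketched here under two assumptions: that the error is Python's ModuleNotFoundError for the `cassandra` package (the one cassandra-driver installs), and that the check is run with the same interpreter spark-submit uses inside the container. The helper name `module_available` is hypothetical and only for illustration.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Show which interpreter is running; inside Docker it may differ from
    # the one where cassandra-driver was installed with pip.
    print("Interpreter:", sys.executable)
    # Assumption: the missing module is the top-level `cassandra` package.
    print("cassandra importable:", module_available("cassandra"))
```

If this reports that `cassandra` is not importable, the driver is probably installed for a different Python than the one Spark runs. PySpark reads the `PYSPARK_PYTHON` environment variable to choose its interpreter, so that would be one place to look.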