VinhDevNguyen / end2end_datapipeline_project


Resolving PySpark Python Version Mismatch: Aligning Driver and Worker Environments #9

Closed by VinhDevNguyen 2 months ago

VinhDevNguyen commented 2 months ago


"I encountered this error message when attempting to run PySpark code on Python 3.10, while the Python version on the worker node is 3.11:

pyspark.errors.exceptions.base.PySparkRuntimeError: [PYTHON_VERSION_MISMATCH] Python in worker has a different version (3, 11) than that in the driver (3.10). PySpark cannot run with different minor versions.
Please ensure that the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly configured.
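
The error message itself points at the two environment variables to check. Below is a minimal sketch (not the project's actual code) of what aligning them looks like from the driver side; the interpreter path is an assumption and must match whatever is actually installed on the worker node:

```python
# Minimal sketch: make the executors use the same interpreter version as the
# driver. The path /usr/bin/python3.11 is an assumption -- substitute the
# interpreter that actually exists on the workers.
import os
import sys

# Must be set before the SparkSession (and its JVM) is created.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.11"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()

# The driver itself must also run Python 3.11 (e.g. via the Dockerfile change
# discussed below); otherwise the (3, 11) vs (3, 10) mismatch persists.
print("driver python:", sys.version_info[:2])

# Trivial job to confirm the executors can start Python workers at all.
print(spark.sparkContext.parallelize(range(4)).sum())
```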

To resolve this issue, we need to change the driver's Python version to 3.11 to match the version used on the worker node. Refer to the Dockerfile on GitHub for details: https://github.com/VinhDevNguyen/end2end_datapipeline_project/blob/6145676ec4ac3e43e83ffb2d628b382393c96ece/python-application/dockerfile#L1
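
For illustration only, a driver image pinned to the same minor version as the worker might look roughly like this; the base image tag, dependency file, and entry point are assumptions, not the repository's actual Dockerfile:

```dockerfile
# Sketch: pin the driver's Python to 3.11 so it matches the worker node.
FROM python:3.11-slim

WORKDIR /app

# Install PySpark plus whatever else the application needs.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Hypothetical entry point -- adjust to the project's real script.
CMD ["python", "main.py"]
```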

ShinVu commented 2 months ago

This Python application service was originally created to test scripts that interact with PostgreSQL (for simulating CRUD actions). It should not be used to interact with Spark.