Open melin opened 1 month ago
Hi, to minimize the size of the Spark image while adding delta-spark, I suggest we consider:

- Using a lighter base image.
- Installing only the necessary dependencies instead of the entire pyspark package.
- Implementing multi-stage builds so that only essential files are kept.
- Cleaning up temporary files and caches after installation.
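The multi-stage and cleanup points above could be combined into something like the following sketch (image tags, paths, and the `--no-deps` choice are all illustrative assumptions, not a tested recipe):

```dockerfile
# Stage 1: install the Python package in a throwaway layer.
# --no-deps is assumed here so the large pyspark dependency is not pulled in;
# --no-cache-dir keeps pip's download cache out of the layer.
FROM python:3.10-slim AS builder
RUN pip install --no-cache-dir --no-deps --prefix=/install delta-spark

# Stage 2: copy only the installed files into the runtime image.
# apache/spark-py is an assumed base that already ships Spark/PySpark;
# nothing from the builder stage (pip caches, build tooling) carries over.
FROM apache/spark-py:latest
COPY --from=builder /install /usr/local
```

Without `--no-deps`, the `COPY --from=builder` step would still carry the full pyspark install into the final image, so the multi-stage build alone does not solve the size problem.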
To install the delta-spark Python package on the Spark image, pip also downloads pyspark.zip as a dependency, and pyspark.zip is more than 370 MB. How can I avoid increasing the size of the Spark image?
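One way to sidestep the pyspark download, assuming the base image already ships Spark and PySpark (as the official Spark images do), is to install delta-spark with pip's `--no-deps` flag so the declared pyspark dependency is skipped. A minimal sketch — the base image tag and user names are assumptions, and skipping dependencies means any other declared dependencies of delta-spark must already be present:

```dockerfile
# Assumption: the base image already contains Spark + PySpark under $SPARK_HOME.
FROM apache/spark-py:latest

USER root
# Install delta-spark without pulling in the >370 MB pyspark dependency;
# --no-cache-dir avoids baking pip's cache into the layer.
RUN pip install --no-cache-dir --no-deps delta-spark
USER spark
```

Note that the Delta Lake jars still need to reach the JVM side separately, e.g. via `spark.jars.packages` at runtime, since the Python package alone does not provide them.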