aws-samples / emr-serverless-samples

Example code for running Spark and Hive jobs on EMR Serverless.
https://aws.amazon.com/emr/serverless/
MIT No Attribution

providing extra jar file using --jars is not working for pyspark jobs #64

Open shaleena opened 6 months ago

shaleena commented 6 months ago

Passing a JAR dependency via `--jars` in the Spark submit parameters, as shown in the PySpark examples, is not working:

```
"sparkSubmitParameters": "--jars=s3://<data-integration-artifacts>/spark-credentials-provider/idealo-spark-credentials-provider-1.3.0.jar",
```
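For context, that parameter is the `sparkSubmitParameters` field of the `sparkSubmit` job driver passed to `start_job_run`. A minimal sketch of how such a job is submitted (the entry point, bucket names, application ID, and role ARN below are placeholders, not values from this issue):

```python
def build_job_driver(entry_point: str, jars_uri: str) -> dict:
    """Build the sparkSubmit jobDriver payload for EMR Serverless start_job_run."""
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": f"--jars={jars_uri}",
        }
    }

# Submitting the job with boto3 (requires AWS credentials; IDs are placeholders):
# import boto3
# emr = boto3.client("emr-serverless")
# emr.start_job_run(
#     applicationId="<application-id>",
#     executionRoleArn="<execution-role-arn>",
#     jobDriver=build_job_driver(
#         "s3://<bucket>/scripts/main.py",
#         "s3://<bucket>/spark-credentials-provider/provider.jar",
#     ),
# )
```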

Also noticed that the jar doesn't end up on `spark.driver.extraClassPath` or `spark.executor.extraClassPath`, even though it is downloaded from S3 into a temp folder:

```
Files s3://<data-integration-artifacts>/spark-credentials-provider/idealo-spark-credentials-provider-1.3.0.jar from /tmp/spark-112c49ee-7811-43bf-82ee-587a2d188f19/idealo-spark-credentials-provider-1.3.0.jar to /home/hadoop/./idealo-spark-credentials-provider-1.3.0.jar
```

Tried the above with both EMR releases 7.0.0 and 6.14.0.

Copying the jar to /usr/share/aws/emr/emrfs/auxlib/ in a custom Docker image build worked.

Is this a known issue, and is there any solution that does not require a custom Docker image?
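One sketch of a possible Docker-free workaround, based only on the log above: since the jar is staged to `/home/hadoop/`, the staged path could be added explicitly to both classpaths via `--conf` in the same `sparkSubmitParameters`. Whether this loads the credential provider early enough on EMR Serverless is an assumption and is untested here:

```python
def submit_params_with_classpath(jar_uri: str) -> str:
    """Build sparkSubmitParameters that ship a jar via --jars and also add its
    staged location under /home/hadoop/ to the driver and executor classpaths."""
    jar_name = jar_uri.rsplit("/", 1)[-1]
    staged_path = f"/home/hadoop/{jar_name}"  # staging dir taken from the log above
    return (
        f"--jars={jar_uri}"
        f" --conf spark.driver.extraClassPath={staged_path}"
        f" --conf spark.executor.extraClassPath={staged_path}"
    )
```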