Closed kremrikpatel closed 1 year ago
@kremrikpatel Thanks for the detailed explanation. I am currently working on it; you can expect a new release in a few weeks.
Please check the new [release V2.0](https://github.com/awslabs/amazon-s3-tagging-spark-util/releases/tag/v2.0) page and README.md. Thanks for your patience!
Closing this issue.
Hi,
I am running Spark jobs on Glue 3.0 with PySpark, using the Spark tagging util jar amazon-s3-tagging-spark-util-assembly_2.12-1.0.jar downloaded from the release page https://github.com/awslabs/amazon-s3-tagging-spark-util/releases. I pass the jar to the Glue job as an external argument: "--extra-jars" : "s3://$BUCKET/$PREFIX/amazon-s3-tagging-spark-util-assembly_2.12-1.0.jar".
Glue start job command: $ aws glue start-job-run --job-name "CSV to CSV" --arguments='--extra-jars="s3://$BUCKET/$PREFIX/amazon-s3-tagging-spark-util-assembly_2.12-1.0.jar"'
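For comparison, the same job run could be started from Python with boto3 instead of the CLI. This is only a sketch: `build_job_arguments` and `start_job` are hypothetical helper names, and the bucket and prefix passed in are placeholders. It assumes boto3 is installed and AWS credentials are configured.

```python
# Hypothetical helpers; only boto3's Glue client and start_job_run are real APIs here.
JAR_NAME = "amazon-s3-tagging-spark-util-assembly_2.12-1.0.jar"


def build_job_arguments(bucket, prefix, jar_name=JAR_NAME):
    """Build the Glue job arguments that point --extra-jars at the jar in S3."""
    return {"--extra-jars": f"s3://{bucket}/{prefix}/{jar_name}"}


def start_job(job_name, arguments):
    """Start a Glue job run and return its run id (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]
```

With real values this would be called as `start_job("CSV to CSV", build_job_arguments(bucket, prefix))`.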
The jar registers successfully in the Glue job; I can see it in the Spark config: ('spark.glue.extra-jars', 's3://$BUCKET/$PREFIX/amazon-s3-tagging-spark-util-assembly_2.12-1.0.jar')
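One way to list such entries from inside the job is to filter the configuration pairs returned by `spark.sparkContext.getConf().getAll()`. The sketch below assumes an active `spark` session in the job; `jar_settings` is a hypothetical helper name that works on any list of `(key, value)` pairs.

```python
def jar_settings(conf_pairs):
    """Return the (key, value) config pairs whose key mentions 'jars'."""
    return [(key, value) for key, value in conf_pairs if "jars" in key.lower()]


def print_jar_settings(spark):
    """Print jar-related settings for an active SparkSession (assumed to exist)."""
    for key, value in jar_settings(spark.sparkContext.getConf().getAll()):
        print(key, "=", value)
```

Including this output together with `spark.version` in a report makes it easier to see which jar and which Spark build the job actually ran with.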
First, I read a file from the S3 bucket, and the read succeeds:
df = spark.read.csv('s3://file', header=True, inferSchema=True)
Then I write the file back to S3:
df.write.format("s3.csv").option("tag", "{\"ProjectTeam\": \"Team-A\", \"FileType\":\"parquet\"}").save("s3://$DATA_BUCKET/$TABLE_NAME")
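The value of the "tag" option is a JSON string, so it can be generated with `json.dumps` rather than escaped by hand. This is a minimal sketch: `build_tag_option` and `write_tagged_csv` are hypothetical helper names, and the tag keys and values are the ones from the write call above.

```python
import json


def build_tag_option(tags):
    """Serialize a dict of tag names and values into the JSON string form."""
    return json.dumps(tags)


def write_tagged_csv(df, path, tags):
    """Write a DataFrame with the s3.csv format and the given tags.

    Assumes the tagging util jar is on the job's classpath.
    """
    df.write.format("s3.csv").option("tag", build_tag_option(tags)).save(path)


tag_option = build_tag_option({"ProjectTeam": "Team-A", "FileType": "parquet"})
print(tag_option)
```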
But I get an error while writing the file:
File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328 in get_return_value format(target_id,".",name),value)
py4j.protocol.Py4JJavaError: An error occurred while calling o165.save : java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/csv/CSVOptions
Can you please help me with this?
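A possible reading of the error, offered as an assumption rather than something confirmed in this thread: Glue 3.0 runs Spark 3.1, and in Spark 3.x `CSVOptions` lives under `org.apache.spark.sql.catalyst.csv`, so a jar compiled against the Spark 2.x location would fail to find the class. The sketch below extracts the missing class name from the error text and compares it with the location expected for a given Spark version; both function names are hypothetical.

```python
OLD_CLASS = "org.apache.spark.sql.execution.datasources.csv.CSVOptions"  # Spark 2.x
NEW_CLASS = "org.apache.spark.sql.catalyst.csv.CSVOptions"  # Spark 3.x


def expected_csv_options_class(spark_version):
    """Return the CSVOptions class name expected for a Spark version string."""
    major = int(spark_version.split(".")[0])
    return NEW_CLASS if major >= 3 else OLD_CLASS


def missing_class_from_error(message):
    """Extract the dotted class name from a NoClassDefFoundError message."""
    marker = "java.lang.NoClassDefFoundError: "
    if marker not in message:
        return None
    slash_path = message.split(marker, 1)[1].split()[0]
    return slash_path.replace("/", ".")


error = (
    "py4j.protocol.Py4JJavaError: An error occurred while calling o165.save : "
    "java.lang.NoClassDefFoundError: "
    "org/apache/spark/sql/execution/datasources/csv/CSVOptions"
)
missing = missing_class_from_error(error)
# A mismatch suggests the jar targets a different Spark major version.
print(missing != expected_csv_options_class("3.1.1"))
```

Inside a job, `spark.version` can be passed in place of the hard-coded version string.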