lewis262626 closed this issue 2 years ago
@lewis262626 While I try to reproduce the steps, could you try specifying the confs before the main class?
spark-submit --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false" --jars /usr/lib/hudi/hudi-spark-bundle.jar hudi.py
@rmahindra123 That fixed it, cheers.
Closing
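For context, the reordering above matters because spark-submit treats everything that appears after the application script as arguments to that script, so --conf and --jars flags placed after hudi.py never reach Spark. As an alternative, the two confs can also be set inside the script itself; a minimal sketch (the app name is illustrative, and the Hudi bundle still has to be supplied via --jars before the session starts):

# Minimal sketch: setting the two confs in code instead of on the command line.
# The Hudi bundle jar must still be passed to spark-submit via --jars, since it
# has to be on the classpath before the session is created.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-example")  # illustrative app name
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    .getOrCreate()
)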
Please help, @rmahindra123. I am getting the same error while running in a Glue notebook. I have specified the confs using the %spark_conf magic.
I am also facing the same issue on AWS EMR Serverless with:
--conf spark.jars=/usr/lib/hudi/hudi-utilities-bundle.jar,/usr/lib/hudi/hudi-spark-bundle.jar
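On EMR Serverless the confs and jars are passed through the job driver's sparkSubmitParameters rather than as arguments to the script. A minimal sketch using boto3, with the application ID, role ARN, and S3 entry point as placeholders (not taken from this report):

import boto3

# Minimal sketch of submitting the job to EMR Serverless with the serializer
# conf and the Hudi bundle supplied via sparkSubmitParameters.
# All identifiers below (application id, role ARN, S3 path) are placeholders.
emr = boto3.client("emr-serverless")
emr.start_job_run(
    applicationId="<application-id>",
    executionRoleArn="<execution-role-arn>",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://<bucket>/hudi.py",
            "sparkSubmitParameters": (
                "--conf spark.serializer=org.apache.spark.serializer.KryoSerializer "
                "--conf spark.sql.hive.convertMetastoreParquet=false "
                "--conf spark.jars=/usr/lib/hudi/hudi-spark-bundle.jar"
            ),
        }
    },
)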
To Reproduce
Steps to reproduce the behavior:
spark-submit hudi.py --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false" --jars /usr/lib/hudi/hudi-spark-bundle.jar
I have also tried saving with inputDF.write.format('hudi'), but I still get the same error; a fuller write example is sketched below.
Expected behavior
Spark saves data as Hudi in S3
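For reference, a minimal sketch of the write path described above; the table name, key/precombine/partition fields, and the target S3 path are illustrative placeholders, not taken from the original script:

# Minimal sketch of writing a DataFrame as a Hudi table to S3.
# Table name, field names, and the target path are placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.partitionpath.field": "dt",
    "hoodie.datasource.write.operation": "upsert",
}

(
    inputDF.write.format("hudi")
    .options(**hudi_options)
    .mode("overwrite")
    .save("s3://<bucket>/my_table/")
)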
Environment Description
Hudi version : 0.10.1-amzn-0
Spark version : 3.2.0
Hive version : 3.1.2
Hadoop version : 3.2.1
Storage (HDFS/S3/GCS..) : S3
Running on Docker? no
Stack trace: