Closed AkshayChan closed 3 years ago
@AkshayChan From the message it seems pretty clear that some of the nodes are running out of disk space:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 4 times, most recent failure: Lost task 0.3 in stage 77.0 (TID 22943, 172.34.88.19, executor 32): com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
I would recommend checking the EMR instances you are provisioning and logging into the boxes while the job is running to see when they run out of space. To give you an idea of how this can happen: whenever Hudi performs an upsert, it shuffles some data around. A Spark shuffle has two phases, map and reduce. The map phase spills data to the local disk, using the KryoSerializer to do so. That's where you are running into this exception.
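To see the spill happening live, you can watch local disk usage on a worker node while the job runs. This is a rough diagnostic sketch; the mount points below are EMR defaults and may differ on your cluster (on YARN, the spill location is governed by `yarn.nodemanager.local-dirs`, not `spark.local.dir`):

```shell
# On a worker node while the job is running.
# EMR mounts instance storage under /mnt by default (verify on your cluster):
df -h /mnt

# Shuffle spill files accumulate under the YARN application cache
# (path pattern is an assumption based on EMR/YARN defaults):
du -sh /mnt/yarn/usercache/*/appcache/*
```

If `/mnt` fills up during the map phase, you need either larger instance storage (or attached EBS volumes) or less spill per task, e.g. by raising the shuffle parallelism.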
Not much I can do here. Let me know if you need anything.
@AkshayChan Haven't heard from you in a while, so I'm closing this ticket since the resolution is outside of Hudi. Please re-open if you need more help.
I am trying to insert/update 15GB of data on a 165GB table; however, I keep getting the following error.
Here is my upsert config:
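For reference, a typical upsert configuration for the Hudi 0.5.x Spark datasource looks roughly like the sketch below. All table, key, and field names are illustrative placeholders, not the actual settings from this job:

```python
# Illustrative only -- NOT the actual config from this job.
# Typical Hudi 0.5.x upsert options for a Spark DataFrame write.
hudi_options = {
    "hoodie.table.name": "my_table",                           # hypothetical table name
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",           # hypothetical key column
    "hoodie.datasource.write.precombine.field": "updated_at",  # hypothetical ordering column
    "hoodie.datasource.write.partitionpath.field": "dt",       # hypothetical partition column
    # Upsert shuffle parallelism controls how much each map task spills to local disk;
    # raising it spreads the spill across more, smaller tasks:
    "hoodie.upsert.shuffle.parallelism": "200",
}

# Usage (requires a SparkSession; Hudi 0.5.x uses the long-form datasource name):
# df.write.format("org.apache.hudi") \
#     .options(**hudi_options) \
#     .mode("append") \
#     .save("s3://my-bucket/my_table")  # hypothetical S3 path
```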
We are using the AWS Glue Connector for Apache Hudi through the AWS Glue Studio Marketplace
Hudi version : 0.5.3
Spark version : 2.4
AWS Glue version : 2.0
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Stacktrace