apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Failed to merge old record into new file for key xxx from old file 123.parquet to new file 456.parquet #1641

Closed HariprasadAllaka1612 closed 4 years ago

HariprasadAllaka1612 commented 4 years ago

Parquet schema changes across successive writes to Hudi.

With continuous writes to S3 in Hudi format, there are instances where the schema of the Parquet files changes, and when writing/upserting to the same partition we get a merge error. I am using the COW storage type.

To Reproduce

Steps to reproduce the behavior:

  1. Write the dataframe multiple times to the same partition.

Expected behavior

  1. All Parquet files in the partition share the same schema.
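
The root cause in the trace below (`java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number`) is schema drift: a column that one batch wrote as a numeric type arrives in a later batch inferred as string, so the merge handle feeds a `Utf8` value to a Parquet column the Avro writer schema declares numeric. One way to guard against this is to cast every incoming DataFrame to a fixed schema before handing it to Hudi. A minimal sketch, assuming a Spark session is available; the column names and types here are hypothetical, not taken from this issue:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Hypothetical target schema; in practice derive it from the table's
// last committed schema so every batch writes identical Parquet types.
val targetSchema = StructType(Seq(
  StructField("message_id", StringType, nullable = false),
  StructField("player_id", LongType, nullable = true),
  StructField("ts", TimestampType, nullable = true)
))

// Cast each batch to the fixed schema before the Hudi upsert, so
// Spark's per-batch type inference cannot flip a column's type.
def conform(df: DataFrame): DataFrame =
  df.select(targetSchema.fields.map(f => col(f.name).cast(f.dataType)).toSeq: _*)
```

With this in place, a batch where e.g. `player_id` happens to arrive as strings is cast back to `LongType` instead of silently changing the Parquet schema of the partition and breaking later merges.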

Stacktrace

2020-05-19 21:06:56 ERROR BoundedInMemoryExecutor:130 - error consuming records
org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:06:59 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :1
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:06:59 ERROR Executor:91 - Exception in task 1.0 in stage 118.0 (TID 299)
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :1
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:273)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    ... 30 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:06:59 ERROR TaskSetManager:70 - Task 1 in stage 118.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 118.0 failed 1 times, most recent failure: Lost task 1.0 in stage 118.0 (TID 299, localhost, executor driver): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :1
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:273)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    ... 30 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:145)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
    at com.playngodataengg.dao.DataAccessS3.writeDataToRefinedHudiS3(DataAccessS3.scala:149)
    at com.playngodataengg.controller.LoginDataTransform.processData(LoginDataTransform.scala:368)
    at com.playngodataengg.action.LoginData$.main(LoginData.scala:16)
    at com.playngodataengg.action.LoginData.main(LoginData.scala)
Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :1
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:273)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    ... 30 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49605000968507055240906198141791265848347950317232455682454c3447-193b-46df-9582-715f0ec61e4d from old file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_3-213-8449_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/18/f0f3b2c4-1864-4ee3-a16c-58cf1fd929f1-0_1-118-299_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:06:59 ERROR DataEngineering:12 - (writeDataToRefinedHudiS3) - There is an exception writing the data into data lake for login
2020-05-19 21:06:59 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :3
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446357933851667324507459552012140546f6643052-e862-41dc-a4cc-22150ef7a240 from old file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_0-213-8446_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/19/4300efe2-6ae2-4474-b5a5-ad758a93afd6-0_3-118-301_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:07:00 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :2
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:49607126967768344261446359161842247114459510822055968770b4bc494b-e50a-4118-86d6-efe500d13270 from old file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-213-8448_20200519162625.parquet to new file s3a://XYZ/reports/login/png/2020/05/19/67705781-2f8a-4a39-969d-2256cacc2b20-0_2-118-300_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more
2020-05-19 21:07:00 ERROR HoodieCopyOnWriteTable:272 - Error upserting bucketType UPDATE for partition :0
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:208)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:183)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:265)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:457)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:148)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
    ... 32 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    ... 33 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key message_id:496071269677683442614463247120938275415648800229692538900aa07220-d2a1-4f87-82ed-1348bf6df155 from old file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_1-213-8447_20200519162625.parquet to new file s3a://XYZ/reports/login/dat/2020/05/18/e9bc50d6-2720-46a6-8e3a-6b72e998be1e-0_0-118-298_20200519210555.parquet
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:299)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:452)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:442)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:126)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.Number
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:248)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:167)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:294)
    ... 8 more

Process finished with exit code 0
vinothchandar commented 4 years ago

Looks like a schema mismatch. Did you change a field from a number to a string, e.g.?
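
The root-cause frame in the traces above (`org.apache.avro.util.Utf8 cannot be cast to java.lang.Number`) indicates exactly this: a field written with a numeric Avro type in an earlier commit is arriving as a string on a later write, so the merge of old and new records fails. A minimal sketch in plain Python (the field names here are hypothetical, not taken from the reporter's data) of checking per-field type drift between two batches before writing:

```python
# Minimal sketch (plain Python, hypothetical field names): detect per-field
# type drift between two record batches before writing. This is the kind of
# mismatch that surfaces as "Utf8 cannot be cast to Number" only at merge time.

def field_types(records):
    """Map each field name to the set of Python type names observed for it."""
    types = {}
    for rec in records:
        for name, value in rec.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types

def drifted_fields(old_batch, new_batch):
    """Return fields whose observed types differ between the two batches."""
    old_t, new_t = field_types(old_batch), field_types(new_batch)
    return {
        name: (old_t[name], new_t[name])
        for name in old_t.keys() & new_t.keys()
        if old_t[name] != new_t[name]
    }

old = [{"message_id": "a", "amount": 10}]    # earlier commit: amount was numeric
new = [{"message_id": "b", "amount": "10"}]  # later write: amount became a string
print(drifted_fields(old, new))  # {'amount': ({'int'}, {'str'})}
```

Running a check like this on incoming data fails fast in the ingestion job, instead of deep inside Hudi's merge path.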

vinothchandar commented 4 years ago

cc @lamber-ken @leesf any of you , interested in helping here? :)

HariprasadAllaka1612 commented 4 years ago

We can close this issue. The problem was that the parquet files and the Hive table synced from them ended up with two different schemas. It was fixed by forcing the parquet schema to always match the Hive metastore.

Thank you.
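
The fix described above can be sketched as follows, in plain Python rather than Spark (the canonical schema and field names are hypothetical, standing in for the Hive metastore schema): coerce every outgoing record to one canonical set of types before writing, so all parquet files in the table stay type-compatible.

```python
# Sketch of the reporter's fix (plain Python, hypothetical schema): coerce each
# outgoing record to a single canonical schema before writing, so every file in
# the table shares identical field types and upsert merges never hit a cast error.

CANONICAL_SCHEMA = {"message_id": str, "amount": float}  # assumed target types

def coerce(record, schema=CANONICAL_SCHEMA):
    """Cast each field to its canonical type; fail at ingest time, not merge time."""
    out = {}
    for name, target in schema.items():
        try:
            out[name] = target(record[name])
        except (KeyError, ValueError, TypeError) as exc:
            raise ValueError(
                f"field {name!r} cannot be coerced to {target.__name__}: {exc}"
            )
    return out

print(coerce({"message_id": "abc", "amount": "12.5"}))
# {'message_id': 'abc', 'amount': 12.5}
```

In a Spark job the equivalent step is casting the DataFrame's columns to the metastore schema before the Hudi write, which guarantees every commit produces files with the same types.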