apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] hudi 0.12 spark batch ingestion throw out archive format validation error #8382

Open qingyuan18 opened 1 year ago

qingyuan18 commented 1 year ago

Describe the problem you faced

With Hudi 0.12 and Spark 3.2, writeToHudiByNoPartition throws an exception: [image]

It seems that it doesn't recognize Hudi's archive metadata Avro format.

To Reproduce

Steps to reproduce the behavior:

  1. Read the data source into a Spark DataFrame.

  2. Configure the Hudi write parameters as follows: [image]

  3. Run the Spark app with the writeHudi function: writeToHudiByPartition( df2, sinkTable, sink_alliances_table_key, sink_alliances_distinct_field, "date_part", hiveDB, save_path)

  4. After 30 commits, which triggers the archive process, it throws the exception shown above.
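For readers trying to reproduce step 4, the sketch below shows how the archival-related write options relate to the "after 30 commits" observation. The reporter's actual configuration was posted as an image and is not available here, so every value below is an assumption. The option keys are standard Hudi write configs, the helper functions are hypothetical, and the trigger rule (archive once the active timeline exceeds `hoodie.keep.max.commits`, trimming back to `hoodie.keep.min.commits`) is a simplification.

```python
# Assumed values only: the reporter's real settings were shared as a screenshot.
hudi_options = {
    "hoodie.table.name": "sink_table",            # placeholder table name
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.keep.min.commits": 20,                # commits left on the active timeline after archival
    "hoodie.keep.max.commits": 30,                # archival starts once this count is exceeded
    "hoodie.cleaner.commits.retained": 10,        # should stay below keep.min.commits
}


def archival_triggered(active_commits: int, options: dict) -> bool:
    """Hypothetical helper: True when the active timeline exceeds keep.max.commits."""
    return active_commits > int(options["hoodie.keep.max.commits"])


def commits_to_archive(active_commits: int, options: dict) -> int:
    """Hypothetical helper: how many commits would move to the archived timeline."""
    if not archival_triggered(active_commits, options):
        return 0
    return active_commits - int(options["hoodie.keep.min.commits"])


if __name__ == "__main__":
    for n in (29, 30, 31):
        print(n, archival_triggered(n, hudi_options), commits_to_archive(n, hudi_options))
```

Lowering `hoodie.keep.max.commits` and `hoodie.keep.min.commits` on a test table (keeping the cleaner retention below the minimum) should make the archive step, and therefore the error, show up after far fewer commits.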

danny0405 commented 1 year ago

Did you ever try to write to a legacy table? It looks like a version compatibility issue.

qingyuan18 commented 1 year ago

No, I have cleaned up the table data and re-run the job. The error still reproduces.

danny0405 commented 1 year ago

Did you also clean the .hoodie/archive folder?

qingyuan18 commented 1 year ago

yes, indeed

danny0405 commented 1 year ago

It looks like a version compatibility issue: in older versions of Hudi, the archived entry does not have the field operationType.
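To illustrate the kind of mismatch described here, the following is a simplified stand-in for Avro-style record validation, not Hudi or Avro library code. The schemas are hypothetical and heavily trimmed; only the field name operationType comes from this thread. It shows that a record written without a field fails validation against a schema that requires it, unless that schema declares a default.

```python
# Simplified stand-in for Avro-style validation; NOT the Hudi or Avro implementation.
MISSING = object()  # sentinel meaning "this field has no default"

# Hypothetical, trimmed schemas: field name -> default value (MISSING = required).
OLD_ENTRY_FIELDS = {"commitTime": MISSING, "actionType": MISSING}
NEW_ENTRY_FIELDS = {"commitTime": MISSING, "actionType": MISSING, "operationType": MISSING}
NEW_ENTRY_FIELDS_WITH_DEFAULT = {**NEW_ENTRY_FIELDS, "operationType": None}


def validate(record: dict, fields: dict) -> list:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for name, default in fields.items():
        if name not in record and default is MISSING:
            problems.append(f"missing required field: {name}")
    return problems


# An entry shaped like one written by an older writer (illustrative values).
old_record = {"commitTime": "20230405134800", "actionType": "commit"}

if __name__ == "__main__":
    print(validate(old_record, OLD_ENTRY_FIELDS))
    print(validate(old_record, NEW_ENTRY_FIELDS))
    print(validate(old_record, NEW_ENTRY_FIELDS_WITH_DEFAULT))
```

If this is what is happening, the usual suspects would be mixed Hudi jar versions on the classpath or leftover archived files from an older writer, though the thread does not confirm either.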

ad1happy2go commented 1 year ago

@qingyuan18 Were you able to resolve this issue? If so, could you please share the resolution?