apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Error when selecting RT table on AWS Athena (0.14.1) - with custom Payload class #11725

Open Hfal91 opened 3 months ago

Hfal91 commented 3 months ago

Env: AWS EMR on EKS 7.1
Hudi version: 0.14.1
Athena engine: v3
Table mode: MOR

This happens ONLY when I use a custom payload class. When the RT and RO tables aren't in sync, selecting from the RT table in Athena returns the error below:

GENERIC_INTERNAL_ERROR: Exception when constructing record reader This query ran against the "xxx" database, unless qualified by the query. Please post the error message on our forum 

Initially, I thought I was doing something wrong in my custom code. I then made the class empty, so that it only extends the default OverwriteWithLatestAvroPayload, and I still get the error. At this point it seems that the mere fact of using a custom payload class is what triggers the error (?)
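For readers without the attachments, here is a minimal sketch of how a custom payload class is usually selected on a MOR write. It is not the attached code: the class name `com.example.EmptyCustomPayload` and the table name `my_table` are hypothetical placeholders, and only the option keys are standard Hudi writer configs.

```python
# Hypothetical fully qualified name of the "empty" payload class that
# extends OverwriteWithLatestAvroPayload (the real name is in custom_java.txt).
CUSTOM_PAYLOAD_CLASS = "com.example.EmptyCustomPayload"


def mor_write_options(table_name, payload_class):
    """Build Hudi writer options for a MERGE_ON_READ table with a given payload class."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        # This is the setting that switches from the default payload to the custom one.
        "hoodie.datasource.write.payload.class": payload_class,
    }


options = mor_write_options("my_table", CUSTOM_PAYLOAD_CLASS)

# With PySpark the dict would typically be passed like this (not executed here):
# df.write.format("hudi").options(**options).mode("append").save(base_path)
```

Removing the `hoodie.datasource.write.payload.class` entry falls back to the default payload, which is the case where the RT table reportedly queries fine.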

I'm attaching the main code I'm using, as well as the "empty" custom payload class: main_code_attach.txt custom_java.txt

To reproduce, run once with BATCH 1 (commented in the main file), then run a second time with BATCH 2. After that, querying the RT table in Athena should produce the error GENERIC_INTERNAL_ERROR: Exception when constructing record reader

Hfal91 commented 3 months ago

Additionally, my current workaround is to run compaction at each commit: 'hoodie.compact.inline.max.delta.commits':'1'

Although this makes the RT table available for SELECT, it defeats the purpose of the RT table, since in this case it is always in sync with the RO table.
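As a rough sketch, the workaround amounts to adding the compaction settings on top of the existing writer options. The `base_options` dict below is a hypothetical stand-in for the real options in main_code_attach.txt. Setting `hoodie.compact.inline` to `true` is an assumption on my side: it is normally needed for inline compaction to run at all, but it is not quoted in the comment above.

```python
def with_inline_compaction(options, max_delta_commits=1):
    """Return a copy of the writer options with inline compaction enabled."""
    merged = dict(options)  # copy so the caller's dict is left untouched
    # Assumption: inline compaction has to be switched on explicitly.
    merged["hoodie.compact.inline"] = "true"
    # The setting from the workaround: compact after every delta commit.
    merged["hoodie.compact.inline.max.delta.commits"] = str(max_delta_commits)
    return merged


# Hypothetical base options standing in for the ones in the attached main code.
base_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
}

workaround_options = with_inline_compaction(base_options)
```

With a value of 1, every commit is compacted into base files straight away, so the log files that the RT view would otherwise have to merge never accumulate.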