Closed soumilshah1995 closed 1 year ago
@soumilshah1995 Your steps look good to me. Do you have the full stacktrace? org/objenesis/strategy/InstantiatorStrategy
is used during Kryo (de)serialization. It will happen in the path of reading/writing a delete block in MOR table. But, let's confirm through stacktrace. If it's usual ClassNotFound issue then it isn't what I am suspecting.
Well i am not sure exactly what logs you need deletes and job was successful but something in breaking due to which i cannot query the data
<html>
<body>
<!--StartFragment-->
# WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
--
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed.]
<!--EndFragment-->
</body>
</html>
@lokeshj1703 let's try to reproduce this. @soumilshah1995 is this observed on master version? or 0.12.1?
Hi Version of Glue used is 4.0 glue 4.0 natively support HUDI i am not aware behind the scene which version Glue uses for Apache HUDI Here are steps in PDF https://drive.google.com/file/d/1mkED3AUZBARsgeRCzk0K0XMvSyajo7mQ/view https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html
Any Updates on this issue ?
Any updates @nsivabalan
@soumilshah1995 I am looking into the issue. Will be able to provide an update this week.
@soumilshah1995 I am looking into the issue. Will be able to provide an update this week.
Copy that :D
@soumilshah1995 I tried out the steps posted by you and ran into same issue with Athena after hard deletes.
hard_delete_df = spark.sql("SELECT * FROM $DB.$TABLE_rt")
print(hard_delete_df.show())
But on executing above query in the script, I was able to verify the results after hard delete. The result below is from the Output logs available after script is run.
+-------------------+--------------------+------------------+----------------------+--------------------+------+--------------------+-----------+-----+------+---+-----+----------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path| _hoodie_file_name|emp_id| employee_name| department|state|salary|age|bonus| ts|
+-------------------+--------------------+------------------+----------------------+--------------------+------+--------------------+-----------+-----+------+---+-----+----------+
| 20221220072524704|20221220072524704...| 1| |1440cbb6-2a3d-44d...| 1| Justin Potts| HR| FL| 29051| 50|23794| 182319089|
| 20221220072524704|20221220072524704...| 9| |1440cbb6-2a3d-44d...| 9| Jonathan Curry| Sales| CA| 30139| 38|32531| 553278391|
| 20221220072524704|20221220072524704...| 5| |1440cbb6-2a3d-44d...| 5| Crystal Bailey| HR| CA| 67670| 33|73686| 979521992|
| 20221220072524704|20221220072524704...| 0| |1440cbb6-2a3d-44d...| 0| Ronald Knight| IT| CA|144358| 40|37355| 638291101|
| 20221220072524704|20221220072524704...| 8| |1440cbb6-2a3d-44d...| 8| Stephen Hayes| IT| RJ| 94273| 60|24935| 187328660|
| 20221220072524704|20221220072524704...| 7| |1440cbb6-2a3d-44d...| 7| Victor Delgado| HR| CA| 53309| 56|16874| 629223298|
| 20221220072630012|20221220072630012...| 3| |1440cbb6-2a3d-44d...| 3|this is update on...| Sales| RJ| 81000| 30|23000| 827307999|
| 20221220072524704|20221220072524704...| 2| |1440cbb6-2a3d-44d...| 2| Linda Hughes| Sales| RJ| 93822| 36|67099| 715415893|
| 20221220072524704|20221220072524704...| 6| |1440cbb6-2a3d-44d...| 6| Jerry Thomas| Marketing| TX| 72604| 42|61980| 211331663|
| 20221220072614391|20221220072614391...| 11| |1440cbb6-2a3d-44d...| 11| xxx| Sales| RJ| 81000| 30|23000| 827307999|
| 20221220072614391|20221220072614391...| 12| |1440cbb6-2a3d-44d...| 12| x change|Engineering| RJ| 79000| 53|15000|1627694678|
| 20221220072630012|20221220072630012...| 3| |1440cbb6-2a3d-44d...| 3|this is update on...| Sales| RJ| 81000| 30|23000| 827307999|
+-------------------+--------------------+------------------+----------------------+--------------------+------+--------------------+-----------+-----+------+---+-----+----------+
It seems like the issue is with query run using Athena.
@soumilshah1995 Can you check the Athena query issue with AWS support?
@soumilshah1995 Can you check the Athena query issue with AWS support?
Now that we now its issue can we have someone resolve it this is really important stuff if people want to do hard delete after performing hard delete it breaks on MOR tables
@nsivabalan @bhasudha looks like other have also confirmed about this issue
@soumilshah1995 AWS uses its own fork of Hudi and Presto for Athena.
jar tf hudi-presto-bundle-0.12.1.jar | grep -i InstantiatorStrategy
org/apache/hudi/com/esotericsoftware/kryo/Kryo$DefaultInstantiatorStrategy$1.class
org/apache/hudi/com/esotericsoftware/kryo/Kryo$DefaultInstantiatorStrategy$2.class
org/apache/hudi/com/esotericsoftware/kryo/Kryo$DefaultInstantiatorStrategy.class
org/apache/hudi/org/objenesis/strategy/InstantiatorStrategy.class
org/apache/hudi/org/objenesis/strategy/SingleInstantiatorStrategy.class
org/apache/hudi/org/objenesis/strategy/StdInstantiatorStrategy.class
org/apache/hudi/org/objenesis/strategy/BaseInstantiatorStrategy.class
org/apache/hudi/org/objenesis/strategy/SerializingInstantiatorStrategy.class
As you can see, the InstantiatorStrategy classes are shaded in the Hudi Presto bundle. But the error shown by Athena relates to an unshaded class GENERIC_INTERNAL_ERROR: org/objenesis/strategy/InstantiatorStrategy
.
So what would be solution here ?
Any Updates here
@soumilshah1995 We are still trying to root cause this w.r.t AWS. Will update here.
Roger that captain :D by the way i am big time hudi Lover ;D
Any Updates
Hi do we have any updates for this Task ?
Have check it with hudi-spark3.3-bundle_2.12-0.12.2.jar the same problem appears, so RT table is not usable anymore, and the row can be still readed in RO table.
Any updates on this issue ? @nsivabalan
@soumilshah1995 @Witekkq We are discussing with the AWS team and it can take some time to get to a resolution here.
Hi @soumilshah1995 would you mind creating an AWS support issue for this? That will accelerate the resolution from AWS Athena.
Sure i will tell my company sysops to create support ticket :D
Sure i will tell my company sysops to create support ticket :D
Appreciate that! Let us know the AWS support ticket number once it's filed. cc @umehrot2
Sure i will tell my company sysops to create support ticket :D
Appreciate that! Let us know the AWS support ticket number once it's filed. cc @umehrot2
@soumilshah1995 do you have the support ticket number? cc @umehrot2
Hi i think we should @yihua he has already a ticket i assume i did send email to AWS Rep they said they will look into it i assume if many people file tickets then they can prioritize this Issues
@soumilshah1995 What was the version of the hudi table?
@soumilshah1995 what version of the athena query engine are you using and what version of hudi is the table you wrote too. It seems that the latest version of hudi that athena is using is 0.10.1 for query engine v3. Can you try creating a hudi table with 0.10.1 and make sure that the query engine athena uses is v3, you can a add new workgroup with a v3 engine in athena console. If you need help feel free to ping me on hudi slack.
i will retry this weekend and let you know if that works :D stay tuned
Hello All Hope yuou are doing well and I wanted to report this as a BUG on MOR with Apache HUDI. Here is what happened when I selected MOR and created apache HUDI table performed append and updates things worked perfectly alright. I saw two tables creates ro(read optimized tables ) and RT(Real time ) which has all latest commits. Now I was able to query both the table using Athena and things break when I started perfoming hard delete. I inserted some sample data into data lake emp 0 ,1,3,4,5,6,7 and now I deleted the emp_4 delete was successful but then I am no longer able to query my RT tables using Athena and shows following error
Test 1 (NO DELETES DONE SO FAR) APPEND AND UPDATES
Deletes
Code
if i am doing something wrong happy to change it i think with hard deletes when i performed i cannot query the data now on athena
Error GENERIC_INTERNAL_ERROR: org/objenesis/strategy/InstantiatorStrategy
Glue Version 4
Steps how i configured Glue Job
https://drive.google.com/file/d/1mkED3AUZBARsgeRCzk0K0XMvSyajo7mQ/view