Open Sheth-G opened 3 weeks ago
Multiple commit times exist in the Hoodie table, and duplicate records appear when using insert overwrite into the target table. The query joins about 10 tables.
Relevant Sources:
https://apache-hudi.slack.com/archives/C4D716NPQ/p1713562820086629
https://medium.com/@simpsons/different-query-types-with-apache-hudi-e14c2064cfd6
https://www.onehouse.ai/blog/hudi-metafields-demystified
https://api.github.com/repos/apache/hudi/issues/10780
Could be fixed by xyz
No, fix by http://yzx.com
Describe the problem you faced
There are multiple commit times existing in the Hoodie table, and there are duplicated records when using insert overwrite into the target table. The query involves joining around 10 tables.
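One way to confirm the symptom might be to count rows per commit time and look for record keys that occur more than once. The sketch below is only an illustration under assumptions: it relies on the Hudi metadata columns `_hoodie_commit_time` and `_hoodie_record_key`, and the sample rows, key values, and the `summarize` helper are hypothetical stand-ins for rows selected from the target table, not output from the actual job.

```python
from collections import Counter

# Hudi metadata columns; everything else in this sketch is hypothetical.
COMMIT_TIME = "_hoodie_commit_time"
RECORD_KEY = "_hoodie_record_key"


def summarize(rows):
    """Return (row count per commit time, record keys seen more than once)."""
    commits = Counter(row[COMMIT_TIME] for row in rows)
    key_counts = Counter(row[RECORD_KEY] for row in rows)
    duplicates = {key: n for key, n in key_counts.items() if n > 1}
    return dict(commits), duplicates


# Invented sample standing in for rows read from the target table.
sample_rows = [
    {COMMIT_TIME: "20240419120000000", RECORD_KEY: "id:1"},
    {COMMIT_TIME: "20240419120000000", RECORD_KEY: "id:2"},
    {COMMIT_TIME: "20240419130000000", RECORD_KEY: "id:1"},
]

commits, duplicates = summarize(sample_rows)
print(commits)     # row count for each commit time
print(duplicates)  # keys that appear more than once, with their counts
```

If more than one commit time shows up after an insert overwrite, or any key appears in `duplicates`, that would match the behavior described above.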
To Reproduce
Steps to reproduce the behavior:
1.
2.
3.
4.
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
Hudi version :
Spark version :
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.