Sheth-G / test-repo-app

MIT License

[SUPPORT] Issue with multiple commit times and duplicated records during an insert overwrite operation with a multi-table join #31

Open Sheth-G opened 3 weeks ago

Sheth-G commented 3 weeks ago

Describe the problem you faced

The Hoodie table has multiple commit times, and duplicated records appear when using insert overwrite into the target table. The query joins approximately 10 tables.
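To make both symptoms concrete, here is a minimal diagnostic sketch, not code from this issue. It assumes the Hudi meta columns `_hoodie_commit_time` and `_hoodie_record_key` have already been read from the target table (for example with a snapshot query) and collected into rows. It lists the distinct commit times and every record key that appears more than once. The `HoodieRow` type, the `summarize` helper, and the sample values are hypothetical. If only some partitions were overwritten, rows in untouched partitions may legitimately keep older commit times, so several commit times are not a problem by themselves. Duplicated record keys are the clearer signal.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class HoodieRow(NamedTuple):
    # Hypothetical projection of two Hudi meta columns from the target table.
    commit_time: str  # value of _hoodie_commit_time
    record_key: str   # value of _hoodie_record_key


def summarize(rows: Iterable[HoodieRow]) -> Tuple[Set[str], Dict[str, int]]:
    """Return the distinct commit times and each record key seen more than once."""
    rows = list(rows)
    commit_times = {r.commit_time for r in rows}
    key_counts = Counter(r.record_key for r in rows)
    # Keep only keys that occur more than once, with their occurrence counts.
    duplicates = {key: n for key, n in key_counts.items() if n > 1}
    return commit_times, duplicates


if __name__ == "__main__":
    # Made-up sample rows for illustration, not data from this issue.
    sample = [
        HoodieRow("20240401120000000", "id-1"),
        HoodieRow("20240401120000000", "id-2"),
        HoodieRow("20240402090000000", "id-2"),
    ]
    times, dups = summarize(sample)
    print(sorted(times))  # ['20240401120000000', '20240402090000000']
    print(dups)           # {'id-2': 2}
```

The check is kept in plain Python so it runs without a Spark session. On a real table, the same grouping would more likely be done in Spark SQL on the meta columns.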

To Reproduce

Steps to reproduce the behavior:

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

torvalds-dev-testbot[bot] commented 3 weeks ago

There are multiple commit times in the Hoodie table, and duplicated records exist when using insert overwrite into the target table. The query joins about 10 tables.

torvalds-dev-testbot[bot] commented 3 weeks ago

Relevant Sources:

https://apache-hudi.slack.com/archives/C4D716NPQ/p1713562820086629

https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java

https://medium.com/@simpsons/different-query-types-with-apache-hudi-e14c2064cfd6

https://www.onehouse.ai/blog/hudi-metafields-demystified

https://api.github.com/repos/apache/hudi/issues/10780

Sheth-G commented 3 weeks ago

This can be solved by xyz

torvalds-dev-testbot[bot] commented 3 weeks ago

No, it should instead follow http://apache.com/documentation