apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0
874 stars, 290 forks

[Spark][Improvement]: Support conflict check for Mixed-Hive format UnkeyedTable #1041

Open baiyangtx opened 1 year ago

baiyangtx commented 1 year ago

Search before asking

What would you like to be improved?

For mixed-format unkeyed tables, there is no conflict check during commit, which may cause data inconsistency.

How should we improve?

Add a conflict check when committing to the base store.
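As a rough illustration of what a base-store conflict check could look like, here is a minimal optimistic-concurrency sketch. It assumes each writer remembers the snapshot it read from and the set of partitions it rewrites. At commit time, the commit is rejected if any snapshot committed after that point rewrote an overlapping partition. `BaseStore`, `Snapshot`, `commit` and `CommitConflictError` are hypothetical names, not Amoro or Iceberg APIs. Partition-level granularity is also an assumption; a real check might compare data files or row filters instead.

```python
from dataclasses import dataclass, field


class CommitConflictError(Exception):
    """Raised when a concurrent commit rewrote an overlapping partition."""


@dataclass
class Snapshot:
    snapshot_id: int
    partitions: frozenset  # partitions rewritten by this commit


@dataclass
class BaseStore:
    """Hypothetical in-memory stand-in for a table's base-store history."""
    snapshots: list = field(default_factory=list)

    def current_snapshot_id(self) -> int:
        # Snapshot ids are assumed to increase monotonically; 0 means "empty table".
        return self.snapshots[-1].snapshot_id if self.snapshots else 0

    def commit(self, read_snapshot_id: int, partitions: set) -> int:
        # Optimistic check: only snapshots committed after the one this
        # writer started from can conflict with it.
        for snap in self.snapshots:
            overlap = snap.partitions & partitions
            if snap.snapshot_id > read_snapshot_id and overlap:
                raise CommitConflictError(
                    f"snapshot {snap.snapshot_id} also rewrote {sorted(overlap)}"
                )
        new_id = self.current_snapshot_id() + 1
        self.snapshots.append(Snapshot(new_id, frozenset(partitions)))
        return new_id


# Two writers start from the same snapshot and overwrite the same partition.
store = BaseStore()
start = store.current_snapshot_id()
store.commit(start, {"dt=2023-01-01"})      # writer A succeeds -> snapshot 1
try:
    store.commit(start, {"dt=2023-01-01"})  # writer B is rejected
except CommitConflictError as e:
    print("rejected:", e)  # rejected: snapshot 1 also rewrote ['dt=2023-01-01']
store.commit(start, {"dt=2023-01-02"})      # disjoint partition -> snapshot 2
```

Without a check like this, writer B's commit would silently replace writer A's data for the same partition. Rejecting it instead lets B retry against the latest snapshot.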

Are you willing to submit a PR?

Subtasks

No response

Code of Conduct

zhoujinsong commented 1 year ago

Does this improvement only affect the mixed-hive format unkeyed table? Do the mixed-iceberg format table and mixed-streaming keyed table need this improvement too?

baiyangtx commented 1 year ago


@zhoujinsong

Currently, the mixed-streaming keyed table writes to the change store, so no conflict check is needed there. For the mixed-iceberg format, the unkeyed table is implemented with native Iceberg copy-on-write (COW), so we don't need to handle conflict checking ourselves.

In the future, the mixed-streaming keyed table will write to the base store, and the mixed-iceberg table will no longer use Iceberg's native row-level operation implementation. At that point, write conflict checking will need to be implemented for them as well.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.