apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement]: Optimize the upsert mode in the stream ingestion scenario and reduce redundant deleted records in data files #964

Open YesOrNo828 opened 1 year ago

YesOrNo828 commented 1 year ago


What would you like to be improved?

Arctic already supports upsert tables in streaming pipelines. In upsert mode, the Flink writer writes a delete record to the delete files before writing each inserted record to the insert files. This produces many redundant delete records in the delete files, which slows down OLAP queries.
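To make the redundancy concrete, here is a minimal toy model of the write pattern described above. It is not Amoro's Flink writer: `ToyUpsertWriter`, `simulate`, and all field names are hypothetical, and the sketch assumes the table starts empty so that a delete for a never-seen key matches no existing row.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUpsertWriter:
    """Toy model of the upsert write pattern; NOT Amoro's actual writer."""
    insert_file: list = field(default_factory=list)  # (key, value) rows
    delete_file: list = field(default_factory=list)  # keys marked as deleted
    seen_keys: set = field(default_factory=set)      # bookkeeping for this demo only
    redundant_deletes: int = 0

    def upsert(self, key, value):
        # The delete is written unconditionally before every insert.
        if key not in self.seen_keys:
            # Assumption: empty table at start, so no earlier row exists
            # for this key and the delete record matches nothing.
            self.redundant_deletes += 1
        self.delete_file.append(key)
        self.insert_file.append((key, value))
        self.seen_keys.add(key)


def simulate(records):
    """Feed (key, value) pairs through the toy writer and return it."""
    writer = ToyUpsertWriter()
    for key, value in records:
        writer.upsert(key, value)
    return writer


if __name__ == "__main__":
    w = simulate([(1, "a"), (2, "b"), (1, "c"), (3, "d")])
    print(len(w.delete_file), "delete records,", w.redundant_deletes, "redundant")
```

In this toy run only the second write for key `1` replaces an existing row; the other delete records are pure overhead that a reader would still have to apply at query time.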

How should we improve?

No response


Subtasks

No response


majin1102 commented 1 year ago

Can you explain your idea for the improvement?

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.