apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement]: Mixed Format KeyedTable Self-Optimizing supports bin-packing tasks after splitting tasks by nodes #2438

Closed: wangtaohz closed this issue 1 month ago

wangtaohz commented 10 months ago

Search before asking

What would you like to be improved?

Self-optimizing for Mixed Format KeyedTables splits tasks by node (hash). When this produces too few tasks, the optimizer cannot use all of its available parallelism, so the optimizing process takes longer to complete.
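To make the problem concrete, here is a minimal Java sketch of node-based splitting. It assumes one task per node and treats optimizer parallelism as a plain thread count. `FileEntry`, `NodeSplit`, `splitByNode`, and `idleSlots` are hypothetical names for illustration only, not Amoro's planner API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a data file; not Amoro's actual file class.
final class FileEntry {
  final String path;
  final int node;        // hash node (bucket) the file belongs to
  final long sizeBytes;

  FileEntry(String path, int node, long sizeBytes) {
    this.path = path;
    this.node = node;
    this.sizeBytes = sizeBytes;
  }
}

final class NodeSplit {
  // One task per node: every file in the same hash node goes to the same task.
  static List<List<FileEntry>> splitByNode(List<FileEntry> files) {
    Map<Integer, List<FileEntry>> byNode = new TreeMap<>();
    for (FileEntry f : files) {
      byNode.computeIfAbsent(f.node, k -> new ArrayList<>()).add(f);
    }
    return new ArrayList<>(byNode.values());
  }

  // Number of optimizer threads left without work when tasks < parallelism.
  static int idleSlots(int taskCount, int parallelism) {
    return Math.max(0, parallelism - taskCount);
  }

  public static void main(String[] args) {
    List<FileEntry> files = List.of(
        new FileEntry("a", 0, 512L << 20), new FileEntry("b", 0, 512L << 20),
        new FileEntry("c", 1, 256L << 20), new FileEntry("d", 2, 128L << 20));
    List<List<FileEntry>> tasks = splitByNode(files);
    System.out.println(tasks.size() + " tasks, " + idleSlots(tasks.size(), 8) + " idle slots");
  }
}
```

With three nodes and a parallelism of 8, `main` prints `3 tasks, 5 idle slots`: five optimizer threads have nothing to do, even though node 0 alone holds 1 GiB of data.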


How should we improve?

We should use bin-packing to split the files of a single node into several tasks when necessary, i.e., when splitting by node alone yields too few tasks.
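A minimal sketch of the proposed fallback, assuming each file carries a node id and a size. Tasks are still formed per node, but if that yields fewer tasks than the optimizer parallelism, each node's files are bin-packed with first-fit decreasing against a target size of `ceil(total bytes / parallelism)`. All names are hypothetical, the target-size rule is an assumption, and the sketch ignores delete files and any other per-node constraints a real keyed-table planner must respect.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a data file; not Amoro's actual file class.
final class FileEntry {
  final String path;
  final int node;        // hash node (bucket) the file belongs to
  final long sizeBytes;

  FileEntry(String path, int node, long sizeBytes) {
    this.path = path;
    this.node = node;
    this.sizeBytes = sizeBytes;
  }
}

final class NodeBinPacking {
  // First-fit decreasing: place each file (largest first) into the first bin with room.
  static List<List<FileEntry>> packNode(List<FileEntry> files, long targetBytes) {
    List<FileEntry> sorted = new ArrayList<>(files);
    sorted.sort(Comparator.comparingLong((FileEntry f) -> f.sizeBytes).reversed());
    List<List<FileEntry>> bins = new ArrayList<>();
    List<Long> binSizes = new ArrayList<>();
    for (FileEntry f : sorted) {
      int target = -1;
      for (int i = 0; i < bins.size(); i++) {
        if (binSizes.get(i) + f.sizeBytes <= targetBytes) {
          target = i;
          break;
        }
      }
      if (target < 0) { // no bin has room: open a new one (oversized files get their own bin)
        bins.add(new ArrayList<>());
        binSizes.add(0L);
        target = bins.size() - 1;
      }
      bins.get(target).add(f);
      binSizes.set(target, binSizes.get(target) + f.sizeBytes);
    }
    return bins;
  }

  // Split by node first; bin-pack inside each node only if there are too few node tasks.
  static List<List<FileEntry>> plan(List<FileEntry> files, int parallelism) {
    Map<Integer, List<FileEntry>> byNode = new TreeMap<>();
    long total = 0;
    for (FileEntry f : files) {
      byNode.computeIfAbsent(f.node, k -> new ArrayList<>()).add(f);
      total += f.sizeBytes;
    }
    if (byNode.size() >= parallelism) {
      return new ArrayList<>(byNode.values()); // enough node tasks already
    }
    long target = Math.max(1, (total + parallelism - 1) / parallelism);
    List<List<FileEntry>> tasks = new ArrayList<>();
    for (List<FileEntry> nodeFiles : byNode.values()) {
      tasks.addAll(packNode(nodeFiles, target)); // bins never mix nodes
    }
    return tasks;
  }

  public static void main(String[] args) {
    List<FileEntry> files = new ArrayList<>();
    for (int i = 0; i < 4; i++) files.add(new FileEntry("n0-" + i, 0, 256L << 20));
    for (int i = 0; i < 2; i++) files.add(new FileEntry("n1-" + i, 1, 128L << 20));
    System.out.println("tasks after bin-packing: " + plan(files, 4).size());
  }
}
```

In the example, two node tasks become five: with a 320 MiB target, node 0's four 256 MiB files cannot share a task and land in four separate tasks, while node 1's two 128 MiB files share one. Because packing happens inside each node, every task still covers a single node. First-fit decreasing is just one simple heuristic; Apache Iceberg ships a comparable `BinPacking.ListPacker` utility that a real implementation might reuse instead.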

Are you willing to submit PR?

Subtasks

No response

Code of Conduct

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 1 month ago

This issue has been closed because it has not received any activity in the 14 days since being marked as 'stale'.