apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement]: More detail for table metrics #2400

Closed zhongqishang closed 1 month ago

zhongqishang commented 10 months ago

What would you like to be improved?

The current Base table metrics only show File Count, Total Size, and Average File Size. These statistics mix data files and delete files together.
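To illustrate the kind of breakdown being asked for, here is a minimal sketch that reports the same three statistics per file type instead of mixing them. The `FileEntry` class, the `summarize_by_content` function, and the `"data"` / `"eq_delete"` / `"pos_delete"` labels are hypothetical names chosen for this example; they are not Amoro's actual data model.

```python
from dataclasses import dataclass

# Hypothetical content labels used only in this sketch.
CONTENT_KINDS = ("data", "eq_delete", "pos_delete")


@dataclass
class FileEntry:
    content: str      # one of CONTENT_KINDS
    size_bytes: int   # file size in bytes


def summarize_by_content(files):
    """Return file count, total size and average file size per content kind."""
    summary = {
        kind: {"file_count": 0, "total_size": 0, "average_file_size": 0}
        for kind in CONTENT_KINDS
    }
    for entry in files:
        stats = summary[entry.content]
        stats["file_count"] += 1
        stats["total_size"] += entry.size_bytes
    for stats in summary.values():
        # Kinds with no files keep an average of 0 instead of dividing by zero.
        if stats["file_count"]:
            stats["average_file_size"] = stats["total_size"] // stats["file_count"]
    return summary
```

With this shape, a table that has no delete files simply reports zeros for the two delete kinds.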

How should we improve?

wangtaohz commented 10 months ago

There are many details to consider.

  1. Separate parameter designs may be necessary for different table formats.
  2. When users write to an Iceberg table in an append-only scenario, there are no delete files, so it is better not to display delete-related metrics in that case.
  3. The Change Store and Base Store of Mixed Format also need separate designs.

BTW, we can already access this detailed metrics information on the snapshots page.

wangtaohz commented 10 months ago

> Add eq delete count ratio for data count
> Add pos delete count ratio for data count

@zhongqishang I'm a little curious about the real requirements for displaying these ratios :)

zhongqishang commented 10 months ago

> There are many details to consider.
>
>   1. separate parameter designs may be necessary for different Table Formats.

Each format requires a different design, or perhaps no display at all. The original idea came from the native Iceberg format.

>   2. when users are using the Iceberg Table in an append scenario, there are no delete files, so it's better not to display delete-related metrics at that time.

Displaying 0 when there are no delete files is fine.

>   3. Change Store and Base Store of Mixed Format also need separate designs
>
> BTW, we can access these detailed metrics information on the snapshots page.

Yes, all of this information can be found on that page, but it is not intuitive enough.

> Add eq delete count ratio for data count
> Add pos delete count ratio for data count
>
> @zhongqishang I'm a little curious about the real requirements for displaying these ratios :)

Query results have to be merged with delete files. In some abnormal situations, the number of deletes gives a very intuitive signal when analyzing queries against the table.

For example, when a large self-optimizing.major.trigger.duplicate-ratio is configured, or when eq delete files are not compacted in time.

@wangtaohz
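The ratios discussed above could be derived roughly as follows. This is only a sketch: the dictionary keys follow the property names used in Iceberg snapshot summaries (`total-records`, `total-equality-deletes`, `total-position-deletes`), while the `delete_ratios` function and the idea of dividing delete counts by the record count are assumptions made for illustration, not how Amoro computes or displays anything.

```python
def delete_ratios(summary):
    """Compute delete-to-data count ratios from a snapshot-summary-like dict.

    `summary` maps property names to string values, as snapshot summaries do.
    Missing properties are treated as 0.
    """
    records = int(summary.get("total-records", 0))
    eq_deletes = int(summary.get("total-equality-deletes", 0))
    pos_deletes = int(summary.get("total-position-deletes", 0))

    if records == 0:
        # No data records: report 0 rather than dividing by zero.
        return {"eq_delete_ratio": 0.0, "pos_delete_ratio": 0.0}

    return {
        "eq_delete_ratio": eq_deletes / records,
        "pos_delete_ratio": pos_deletes / records,
    }


if __name__ == "__main__":
    example = {
        "total-records": "1000",
        "total-equality-deletes": "250",
        "total-position-deletes": "50",
    }
    print(delete_ratios(example))
```

An append-only table, whose summary carries no delete counts, yields 0.0 for both ratios, which matches the "displaying 0 is fine" position above.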

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 1 month ago

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'