apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Bug]: OrphanFilesCleaning Action will never be executed because `TableProperties.ENABLE_ORPHAN_CLEAN` #3005

Closed: huyuanfeng2018 closed this issue 2 months ago

huyuanfeng2018 commented 3 months ago

What happened?

[Bug]: OrphanFilesCleaning Action never executes because TableProperties.ENABLE_ORPHAN_CLEAN is wrong

Affects Versions

master

What table formats are you seeing the problem on?

Iceberg

What engines are you seeing the problem on?

AMS

How to reproduce

Even when the AMS configuration sets clean-orphan-files.enabled=true, AMS never triggers the action that cleans up orphan files.

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

Code of Conduct

jiamin13579 commented 3 months ago

Hi, could you provide the detailed configuration and table information?

huyuanfeng2018 commented 3 months ago

TableProperties.ENABLE_ORPHAN_CLEAN used to be clean-orphan-files.enabled, which was correct, but now it is clean-orphan-file.enabled.

zhoujinsong commented 2 months ago

Hi,

I am confused about the description of this issue. AFAIK, the configuration ams.clean-orphan-files.enabled=true in the AMS configuration file controls whether the orphan-file-clean process is enabled for all tables in AMS. More precisely, it controls whether a thread pool is started to execute these tasks.

The table property clean-orphan-file.enabled (default false) controls whether the orphan-file-clean process is enabled for that particular table.

So the orphan-file-clean process will execute only when ams.clean-orphan-files.enabled=true and clean-orphan-file.enabled=true.
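
To make the two-level gate concrete, here is a minimal Java sketch of the decision described above. It is not Amoro's code: the class `OrphanCleanGate` and the method `shouldClean` are hypothetical names, and the only facts taken from this thread are the two keys and the default of false for the table property. It shows why turning on only the AMS flag, with the table property left unset, still cleans nothing.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the two-level gate; not Amoro's actual implementation.
 */
public class OrphanCleanGate {
    // Table-level property key, as quoted in this thread.
    static final String TABLE_KEY = "clean-orphan-file.enabled";
    // Default of the table-level property, per this thread.
    static final boolean TABLE_DEFAULT = false;

    /**
     * @param amsEnabled value of ams.clean-orphan-files.enabled (whether the executor thread pool exists)
     * @param tableProps properties of a single table
     */
    static boolean shouldClean(boolean amsEnabled, Map<String, String> tableProps) {
        if (!amsEnabled) {
            // No thread pool, so no table is ever processed.
            return false;
        }
        String value = tableProps.get(TABLE_KEY);
        // An unset property falls back to the default (false).
        return value == null ? TABLE_DEFAULT : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        // AMS flag on, table property unset: the situation reported in this issue.
        System.out.println(shouldClean(true, Map.of()));                   // false
        System.out.println(shouldClean(true, Map.of(TABLE_KEY, "true")));  // true
        System.out.println(shouldClean(false, Map.of(TABLE_KEY, "true"))); // false
    }
}
```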

huyuanfeng2018 commented 2 months ago

That makes sense, but this behavior does not match other actions such as clean-dangling-delete-files, whose key is clean-dangling-delete-files in both the AMS configuration and the table configuration. If this is the intended design, we need to modify the clean-dangling-delete behavior to make it consistent.

zhoujinsong commented 2 months ago

This is because the default value of clean-dangling-delete-files.enabled is true and the default value of clean-orphan-file.enabled is false.

You can set your own default value for the catalog in the catalog's table properties section.
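
The following Java sketch illustrates why the two actions behave differently out of the box, and how a catalog-level default changes that. It assumes, without confirmation from this thread, that a value in the catalog's table properties acts as a default that an explicit table property overrides, with the built-in default used last. `EffectiveProperty` and `resolve` are hypothetical names; only the two keys and their defaults come from the comments above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of resolving a table's effective boolean setting.
 * Assumed precedence: table property, then catalog default, then built-in default.
 */
public class EffectiveProperty {
    // Built-in defaults as stated in this thread.
    static final Map<String, String> BUILT_IN = Map.of(
            "clean-dangling-delete-files.enabled", "true",
            "clean-orphan-file.enabled", "false");

    static boolean resolve(String key, Map<String, String> catalogDefaults, Map<String, String> tableProps) {
        String value = tableProps.get(key);
        if (value == null) {
            value = catalogDefaults.get(key);
        }
        if (value == null) {
            value = BUILT_IN.get(key);
        }
        // Boolean.parseBoolean(null) returns false for unknown keys.
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        Map<String, String> table = Map.of();
        System.out.println(resolve("clean-dangling-delete-files.enabled", catalog, table)); // true
        System.out.println(resolve("clean-orphan-file.enabled", catalog, table));           // false
        // Setting a catalog-level default turns orphan cleaning on for tables that do not override it.
        catalog.put("clean-orphan-file.enabled", "true");
        System.out.println(resolve("clean-orphan-file.enabled", catalog, table));           // true
    }
}
```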

huyuanfeng2018 commented 2 months ago

Oh, my understanding was wrong. These two configurations are different.