apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Flink: Maintenance - Add support for more kinds of scheduling #11246

Open netvl opened 2 days ago

netvl commented 2 days ago

Feature Request / Improvement

The current implementation of maintenance operation scheduling provides several options:

It would be great to have more powerful scheduling options, for example a cron-like scheduler where I can specify that a particular maintenance operation should run at the 15th minute of every hour, or something similar.
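To make the cron-like request concrete, here is a minimal sketch of the "15th minute of every hour" rule using only `java.time`. The class and method names (`MinuteOfHourSchedule`, `nextFireTime`, `delayUntilNext`) are hypothetical and are not part of Iceberg or Flink; the sketch only shows how the delay until the next fire time could be computed.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not an Iceberg or Flink API.
final class MinuteOfHourSchedule {

    /** Returns the first time strictly after {@code now} whose minute-of-hour equals {@code minute}. */
    static LocalDateTime nextFireTime(LocalDateTime now, int minute) {
        if (minute < 0 || minute > 59) {
            throw new IllegalArgumentException("minute must be in [0, 59]: " + minute);
        }
        // Candidate in the current hour, e.g. 10:15:00 for any time within 10:xx.
        LocalDateTime candidate = now.truncatedTo(ChronoUnit.HOURS).plusMinutes(minute);
        // If that moment is now or already passed, the next match is one hour later.
        return candidate.isAfter(now) ? candidate : candidate.plusHours(1);
    }

    /** Returns how long to wait from {@code now} until the next fire time. */
    static Duration delayUntilNext(LocalDateTime now, int minute) {
        return Duration.between(now, nextFireTime(now, minute));
    }
}
```

For example, at 10:20:00 the next fire time for minute 15 is 11:15:00, a delay of 55 minutes.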

Note that external scheduling does not really work for us, because we don't have a simple way to schedule batch Flink applications (and wouldn't want to, as it would require a whole separate application, which is precisely what we want to avoid).

In general, it would be great to have a hook into the scheduling system to provide a custom source of trigger events. This would also help with testing the maintenance-based features of our application; for example, with a custom scheduler, I could use an external trigger (e.g. an HTTP call or a blocking queue notification) to fully control when maintenance operations are executed.
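As a rough illustration of the kind of hook I mean, here is a sketch of a trigger source backed by a blocking queue. The `TriggerSource` interface and `QueueTriggerSource` class are hypothetical names invented for this example; nothing like them is assumed to exist in Iceberg today.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical hook: anything that can wait for the next trigger event.
interface TriggerSource {
    /** Waits up to the given timeout; returns true if a trigger event arrived. */
    boolean awaitTrigger(long timeout, TimeUnit unit);
}

// Hypothetical implementation driven by explicit calls, e.g. from a test
// or from an HTTP handler.
final class QueueTriggerSource implements TriggerSource {
    private final BlockingQueue<Long> signals = new LinkedBlockingQueue<>();

    /** Requests one maintenance run; the payload is just the request timestamp. */
    void fire() {
        signals.add(System.currentTimeMillis());
    }

    @Override
    public boolean awaitTrigger(long timeout, TimeUnit unit) {
        try {
            // poll returns null if no signal arrives before the timeout elapses.
            return signals.poll(timeout, unit) != null;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and report that no trigger was received.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In a test, calling `fire()` exactly when a maintenance run is expected would replace waiting on wall-clock time, and each call to `fire()` is consumed by exactly one `awaitTrigger` call.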

Query engine

Flink

Willingness to contribute

netvl commented 2 days ago

cc @pvary

pvary commented 2 days ago

Cc: @stevenzwu, @rodmeneses

@netvl: The scheduling currently doesn't guarantee the exact time of the execution. If there is a concurrent maintenance run then it will wait until the concurrent run has finished.

Would this restriction be acceptable for your use case?
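To illustrate the restriction, here is a simplified sketch of how a trigger that arrives during a run gets deferred. This is not the actual Iceberg implementation; the `SerializedTaskRunner` class is hypothetical, and it assumes that triggers arriving while a run is in progress are collapsed into a single deferred run.

```java
// Hypothetical model of "wait until the concurrent run has finished".
final class SerializedTaskRunner {
    private boolean running;   // true while a maintenance run is in progress
    private boolean pending;   // true if a trigger arrived during that run
    private int startedRuns;   // number of runs started so far

    /** Handles a trigger; returns true if a run started immediately. */
    synchronized boolean onTrigger() {
        if (running) {
            // Assumption: remember the trigger and start it after the current run.
            pending = true;
            return false;
        }
        running = true;
        startedRuns++;
        return true;
    }

    /** Marks the current run as finished; returns true if a deferred run started. */
    synchronized boolean onRunFinished() {
        if (!running) {
            throw new IllegalStateException("no run in progress");
        }
        running = false;
        if (pending) {
            pending = false;
            running = true;
            startedRuns++;
            return true;
        }
        return false;
    }

    synchronized int startedRuns() {
        return startedRuns;
    }
}
```

With a cron-like schedule, a trigger at the 15th minute would therefore start later than requested whenever the previous run is still in progress.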