[SUPPORT] Why HUDI ConsistentBucketClusteringExecutionStrategy not supported by flink engine?

pursuit-wangpz commented 1 month ago

Upon reviewing the source code, it is evident that the ConsistentBucketClusteringExecutionStrategy is only implemented for the Spark engine.

danny0405 commented 1 month ago

Because it's hard for Flink to support both compaction and clustering execution in the same pipeline, Flink currently supports only clustering plan generation for consistent hashing; a separate clustering job is needed for execution.

pursuit-wangpz commented 1 month ago

> Because it's hard for Flink to support both compaction and clustering execution in the same pipeline, Flink currently supports only clustering plan generation for consistent hashing; a separate clustering job is needed for execution.

However, it seems that org.apache.hudi.sink.clustering.HoodieFlinkClusteringJob does not support ConsistentBucketClusteringExecutionStrategy; that strategy can only be specified with the Spark engine, using org.apache.hudi.client.clustering.run.strategy.SparkConsistentBucketClusteringExecutionStrategy. This implies that HUDI requires two engines to complete consistent bucket clustering: the Flink engine to generate the plan, and the Spark engine to execute it.
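
For illustration, the Spark-side settings implied above could be collected roughly as follows. This is only a sketch: the helper name is hypothetical, the execution strategy class is the one named above, and the other option keys and values (`hoodie.index.type`, `hoodie.index.bucket.engine`, `hoodie.clustering.execution.strategy.class`) are assumed from Hudi's configuration reference and should be checked against the Hudi version in use.

```python
# Hypothetical helper: gathers the Hudi options a Spark job would need in order
# to execute a consistent-hashing clustering plan. Option keys are assumptions
# taken from Hudi's configuration reference; verify them for your Hudi version.
def consistent_bucket_clustering_options() -> dict:
    return {
        # Bucket index with the consistent hashing engine (assumed keys/values).
        "hoodie.index.type": "BUCKET",
        "hoodie.index.bucket.engine": "CONSISTENT_HASHING",
        # Execution strategy class named in this thread; Spark-only.
        "hoodie.clustering.execution.strategy.class": (
            "org.apache.hudi.client.clustering.run.strategy."
            "SparkConsistentBucketClusteringExecutionStrategy"
        ),
    }


if __name__ == "__main__":
    # Print the options in key=value form, e.g. for a properties file.
    for key, value in consistent_bucket_clustering_options().items():
        print(f"{key}={value}")
```

A dict like this could be passed as write options to a Spark job; no equivalent Flink execution strategy class is known from this thread.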

danny0405 commented 1 month ago

I think so. @beyond1920, can you chime in with more insights?

apache/hudi issue #11636