apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] The Kafka-CDC whole-database synchronization feature does not support specifying multi-level partition fields for each table #3633

Open lipeng186 opened 4 days ago

lipeng186 commented 4 days ago

Search before asking

Motivation

Because Kafka-CDC whole-database synchronization cannot specify multi-level partition fields for each table, it is not suitable for tables with large data volumes. A whole-database synchronization covers many tables, and the source tables may be modified, so we do not want to create the target tables in advance. We would like Kafka-CDC to support specifying the multi-level partition fields, the bucket key fields, and the number of buckets for each table in the parameters, e.g.:

--table_conf partitions.tableName1.fields=col1,col2,col3
--table_conf partitions.tableName2.fields=col1,col2,col3
--table_conf bucketKey.tableName1.fields=col1,col2,col3
--table_conf bucketKey.tableName2.fields=col1,col2,col3
--table_conf bucketNum.tableName1=4
--table_conf bucketNum.tableName2=6
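
To make the request more concrete, here is a minimal sketch of how such per-table options could be grouped by table name. It is only an illustration of the proposed key layout, not Paimon code: `TableLayout` and `parse_table_conf` are hypothetical names, and the `partitions.<table>.fields`, `bucketKey.<table>.fields` and `bucketNum.<table>` keys are the ones suggested in this issue, not options that exist today. It also assumes table names contain no dots.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableLayout:
    """Hypothetical per-table layout built from the proposed options."""
    partition_fields: List[str] = field(default_factory=list)
    bucket_key_fields: List[str] = field(default_factory=list)
    bucket_num: Optional[int] = None


def parse_table_conf(pairs: List[str]) -> Dict[str, TableLayout]:
    """Group the proposed per-table `key=value` pairs by table name."""
    layouts: Dict[str, TableLayout] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        parts = key.split(".")
        prefix = parts[0]
        is_fields_key = len(parts) == 3 and parts[2] == "fields"
        if prefix in ("partitions", "bucketKey") and is_fields_key:
            # partitions.<table>.fields / bucketKey.<table>.fields
            layout = layouts.setdefault(parts[1], TableLayout())
            columns = [c.strip() for c in value.split(",") if c.strip()]
            if prefix == "partitions":
                layout.partition_fields = columns
            else:
                layout.bucket_key_fields = columns
        elif prefix == "bucketNum" and len(parts) == 2:
            # bucketNum.<table>=<int>
            layouts.setdefault(parts[1], TableLayout()).bucket_num = int(value)
        # Any other key is ignored here and left to the regular
        # table-option handling.
    return layouts


if __name__ == "__main__":
    example = [
        "partitions.tableName1.fields=col1,col2,col3",
        "bucketKey.tableName1.fields=col1,col2,col3",
        "bucketNum.tableName1=4",
        "bucketNum.tableName2=6",
    ]
    for table, layout in parse_table_conf(example).items():
        print(table, layout)
```

With a mapping like this, the synchronization job could look up the layout by table name when it auto-creates each target table, and fall back to the current defaults for tables that have no entry.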

Solution

No response

Anything else?

No response

Are you willing to submit a PR?