apache / doris-flink-connector

Flink Connector for Apache Doris
https://doris.apache.org/
Apache License 2.0

[fix][cdc] support merging multiple databases with the same schema into a single Doris table during synchronization #385

Open vinlee19 opened 1 month ago

vinlee19 commented 1 month ago

Proposed changes

If tables in different databases share the same name, their schemas are merged and their records are synchronized into a single Doris table. Otherwise, each table's records are synchronized to its own Doris table, which is named 'databaseName_tableName' to avoid name conflicts.

For example, suppose MySQL has a table customer_1 in db1 and another identical table in db2. If you do not set the --database option, db1 and db2 will both be created in Doris. If you set --database test_db, as in the following shell command, all MySQL tables are written to a single Doris database, and the records from customer_1 and customer_2 in MySQL are synchronized into one Doris table, customer_1.

If you want tables with the same schema but in different databases to be synchronized to the same Doris database under different table names, set the option --merge-same-schema false. The Doris tables will then be named test_db_customer_1 and test_db_customer_2, and the records will be loaded separately.

bin/flink run \
    -Dexecution.checkpointing.interval=10s \
    -Dparallelism.default=1 \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    lib/flink-doris-connector-1.16-1.6.1-SNAPSHOT.jar \
    mysql-sync-database \
    --database test_db \
    --mysql-conf hostname=127.0.0.1 \
    --mysql-conf port=3306 \
    --mysql-conf username=root \
    --mysql-conf password=123456 \
    --mysql-conf database-name="db.*" \
    --including-tables "tbl1|test.*" \
    --sink-conf fenodes=127.0.0.1:8030 \
    --sink-conf username=root \
    --sink-conf password=123456 \
    --sink-conf jdbc-url=jdbc:mysql://127.0.0.1:9030 \
    --sink-conf sink.label-prefix=label \
    --table-conf replication_num=1 
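
The naming rule described above can be sketched as a small shell function. This is only an illustration under stated assumptions: doris_table_name is a hypothetical helper, not part of the connector, and its arguments are made up for this sketch. Which database name serves as the prefix when merging is disabled (the sink database, as in the test_db_customer_1 example, or the source database) is left to the caller here.

```shell
# Hypothetical helper, not part of the connector: prints the Doris table name
# for one source table under the naming rule described in this PR.
#   $1 = merge flag ("true" or "false"), mirroring --merge-same-schema
#   $2 = database name used as the prefix when merging is disabled (assumption)
#   $3 = source table name
doris_table_name() {
    if [ "$1" = "true" ]; then
        # Same-named tables from all source databases share one Doris table.
        printf '%s\n' "$3"
    else
        # Each table gets its own Doris table: databaseName_tableName.
        printf '%s_%s\n' "$2" "$3"
    fi
}

doris_table_name true  test_db customer_1   # prints customer_1
doris_table_name false test_db customer_1   # prints test_db_customer_1
doris_table_name false test_db customer_2   # prints test_db_customer_2
```

Passing the prefix as an argument keeps the sketch neutral about which database name is used, since the description above only states the 'databaseName_tableName' pattern.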

Issue Number: close #xxx

Problem Summary:

Describe the overview of changes.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

v-wx-v commented 1 month ago

With the option --merge-same-schema false, wouldn't it be better to name the Doris tables db1_customer_1 and db2_customer_1?

vinlee19 commented 3 weeks ago

When the option --database test_db is set, as in the shell command in the description, all MySQL tables are written to a single Doris database, and the records from customer_1 and customer_2 in MySQL are synchronized into one Doris table, customer_1. It works exactly like this.