HSLife1991 opened this issue 5 months ago.
The MySQL CDC source's default format generates two records when an upstream row is updated: a delete record and an insert record. This may be why your data is duplicated.
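The behavior described above can be modeled in a few lines. This Python sketch uses a hypothetical `Change` type and a hypothetical `to_default_records` helper (neither is a SeaTunnel API). It only encodes the statement in this comment that the default format turns one update into a delete plus an insert.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of one row change captured from the MySQL binlog.
@dataclass(frozen=True)
class Change:
    op: str                  # "insert", "update" or "delete"
    before: Optional[dict]   # row image before the change (None for inserts)
    after: Optional[dict]    # row image after the change (None for deletes)


def to_default_records(change: Change) -> list[tuple[str, dict]]:
    """Split one change into (kind, row) records, following the description
    of the default format: an update becomes a DELETE followed by an INSERT."""
    if change.op == "insert":
        return [("INSERT", change.after)]
    if change.op == "delete":
        return [("DELETE", change.before)]
    if change.op == "update":
        return [("DELETE", change.before), ("INSERT", change.after)]
    raise ValueError(f"unknown op: {change.op}")


update = Change("update", before={"id": 1, "name": "old"}, after={"id": 1, "name": "new"})
records = to_default_records(update)
for kind, row in records:
    print(kind, row)
```

One logical update therefore reaches the sink as two records, and a sink that does not interpret the record kind treats them as two independent rows.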
If you change the format to `compatible_debezium_json`, it will generate only one update record.
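For comparison, a Debezium-style event carries both row images in a single record. The sketch below assumes the standard Debezium envelope (`before`, `after`, `op`) and leaves out fields such as `source` and `ts_ms`. The exact payload SeaTunnel emits with `compatible_debezium_json` is not shown in this thread, so treat this as an illustration; `describe` is a hypothetical helper.

```python
import json

# Hypothetical single update event in a Debezium-style envelope.
# In Debezium, "op" is "c" (create), "u" (update), "d" (delete) or "r" (snapshot read).
event_json = json.dumps({
    "before": {"id": 1, "name": "old"},
    "after": {"id": 1, "name": "new"},
    "op": "u",
})


def describe(event_text: str) -> str:
    """Summarize a Debezium-style change event as a single line."""
    event = json.loads(event_text)
    op = event["op"]
    if op == "u":
        return f"UPDATE {event['before']} -> {event['after']}"
    if op in ("c", "r"):
        return f"INSERT {event['after']}"
    if op == "d":
        return f"DELETE {event['before']}"
    raise ValueError(f"unknown op: {op}")


print(describe(event_json))
```

Because the old and new images travel together, a downstream consumer can apply the change as one update instead of reconstructing it from two separate records.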
You can change the sink to `Console` to check the result.
In your case, you are using Hive as the destination, and Hive does not support update or delete operations. CDC will also generate lots of small files, so this may not be a good approach.
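One plausible way the duplicates in this issue arise is sketched below: the same record stream is applied to an append-only table (roughly what a sink without update/delete support can do) and to a table keyed by primary key. Both sinks, the `id` key, and the row values are hypothetical simplifications, not models of the actual Hive sink implementation.

```python
# Hypothetical record stream for one row: the initial sync, then an update
# that arrives as DELETE + INSERT (the default format described earlier).
records = [
    ("INSERT", {"id": 1, "name": "old"}),   # initial sync
    ("DELETE", {"id": 1, "name": "old"}),   # update, part 1
    ("INSERT", {"id": 1, "name": "new"}),   # update, part 2
]


def append_only(stream):
    """Sink that can only append rows: INSERTs are written, but a DELETE
    has no way to remove the earlier version of the row."""
    table = []
    for kind, row in stream:
        if kind == "INSERT":
            table.append(row)
        # DELETE is not applied: the old row stays in the table.
    return table


def keyed_upsert(stream, key="id"):
    """Sink that supports updates and deletes by primary key."""
    table = {}
    for kind, row in stream:
        if kind == "INSERT":
            table[row[key]] = row
        elif kind == "DELETE":
            table.pop(row[key], None)
    return list(table.values())


print(append_only(records))
print(keyed_upsert(records))
```

The append-only table ends up holding both the old and the new version of row `id = 1`, while the keyed table holds only the new one. This is why a destination without update/delete support is a poor fit for a CDC stream.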
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.
Syncing MySQL data to Hive with the MySQL CDC connector:
1. The initial sync copied all the data, and the Hive table was correct.
2. Some data in the source MySQL table was then changed, or some records were removed.
3. After the existing MySQL data was changed, the destination Hive table contains duplicate records.
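To confirm step 3, one option is to count rows per primary key in the destination table. This minimal Python sketch assumes the rows have already been fetched into dicts and that `id` is the MySQL primary key; both the sample rows and the `duplicate_keys` helper are hypothetical. In Hive itself, the equivalent check is a `GROUP BY` on the key with `HAVING COUNT(*) > 1`.

```python
from collections import Counter


def duplicate_keys(rows, key="id"):
    """Return the primary-key values that appear more than once, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical rows read back from the destination table.
hive_rows = [
    {"id": 1, "name": "old"},
    {"id": 1, "name": "new"},
    {"id": 2, "name": "unchanged"},
]
print(duplicate_keys(hive_rows))  # [1]
```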