apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

support aliyun-json #4559

Open JackeyLee007 opened 1 day ago

JackeyLee007 commented 1 day ago

[flink]

Purpose

Linked issue: close #4529

This PR adds support for the JSON format produced by Aliyun Data Integration (DI for short). The data that DI collects from MySQL or Oracle into Kafka is not in a standard format: it is neither canal-json nor debezium-json. I would like to call it aliyun-json.

We want to sink the data from Kafka to Paimon directly, using the kafka_sync_database and kafka_sync_table actions. To do that, aliyun-json must be supported and parsed into CDC records.
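
As a rough illustration of what "parsed into CDC records" means here, the sketch below turns one change event into row-kind-tagged records. The envelope fields (`op`, `before`, `after`) and the op values (`INSERT`, `UPDATE`, `DELETE`) are hypothetical placeholders, not the actual aliyun-json schema, and the function `to_cdc_records` is not part of Paimon; only the row-kind short strings (`+I`, `-U`, `+U`, `-D`) follow the Flink/Paimon convention.

```python
import json
from typing import Any, Dict, List, Tuple

# A CDC record here is simply (row kind, row data).
CdcRecord = Tuple[str, Dict[str, Any]]


def to_cdc_records(message: str) -> List[CdcRecord]:
    """Convert one change event into CDC records.

    ASSUMPTION: the event is a JSON object with hypothetical fields
    "op", "before" and "after". The real aliyun-json layout may differ.
    """
    event = json.loads(message)
    op = event["op"]
    before = event.get("before") or {}
    after = event.get("after") or {}

    if op == "INSERT":
        # A new row only has an "after" image.
        return [("+I", after)]
    if op == "UPDATE":
        # An update is emitted as a retraction followed by the new image.
        return [("-U", before), ("+U", after)]
    if op == "DELETE":
        # A deleted row only has a "before" image.
        return [("-D", before)]
    raise ValueError(f"unsupported op: {op!r}")
```

Emitting an update as a `-U`/`+U` pair is one common choice for changelog sinks; a parser could equally emit only the `+U` image when the table has a primary key.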

Tests

Supplied with the commit.

API and Format

No.

Documentation

To process data collected by DI with paimon-flink-action, you only need to specify the JSON format as follows:

... paimon-flink-action-<version>.jar
kafka_sync_database 
...
--kafka_conf value.format=aliyun-json 
...