apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Add AWS DMS CDC format support #4432

Moonlight-CL closed 2 weeks ago

Moonlight-CL commented 2 weeks ago

Search before asking

Motivation

AWS Database Migration Service (DMS) is a service for homogeneous and heterogeneous database migration. It can migrate existing data and replicate ongoing changes, which helps when building data lakes and performing real-time processing on change data from your data stores.

With support for the DMS CDC data format, developers can build a streaming data lake by streaming CDC data to Kafka and ingesting it into tables in the Apache Paimon format.

The DMS CDC data format is similar to Maxwell's. Details about the format and about using Kafka as a DMS target can be found at this link. The fields of the DMS JSON format are:

RecordType: The record type can be either data or control. Data records represent the actual rows in the source. Control records are for important events in the stream, for example a restart of the task.

Operation: For data records, the operation can be load, insert, update, or delete. For control records, the operation can be create-table, rename-table, drop-table, change-columns, add-column, drop-column, rename-column, or column-type-change.

SchemaName: The source schema for the record. This field can be empty for a control record.

TableName: The source table for the record. This field can be empty for a control record.

Timestamp: The timestamp for when the JSON message was constructed. The field is formatted with the ISO 8601 format.
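
To make the field rules above concrete, here is a minimal Python sketch that checks whether a record type and operation are a valid pair and parses the ISO 8601 timestamp. The function names `classify` and `parse_timestamp` are hypothetical, and the key names `record-type` and `operation` are taken from the JSON example in this issue; this is not Paimon code.

```python
from datetime import datetime

# Operations allowed for each record type, as listed in the field descriptions.
DATA_OPERATIONS = {"load", "insert", "update", "delete"}
CONTROL_OPERATIONS = {
    "create-table", "rename-table", "drop-table", "change-columns",
    "add-column", "drop-column", "rename-column", "column-type-change",
}


def classify(metadata: dict) -> str:
    """Return "data" or "control" if record type and operation match up."""
    record_type = metadata.get("record-type")
    operation = metadata.get("operation")
    if record_type == "data" and operation in DATA_OPERATIONS:
        return "data"
    if record_type == "control" and operation in CONTROL_OPERATIONS:
        return "control"
    raise ValueError(
        f"unexpected record-type/operation pair: {record_type!r}/{operation!r}"
    )


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2019-10-31T22:53:59.721201Z."""
    # Replace the trailing "Z" so older Python versions can parse the value.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

A consumer could call `classify` first and route control records separately from data records before touching the row payload.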

The following JSON message example shows a record of type data with all additional metadata.

{ 
   "data":{ 
      "id":100000161,
      "fname":"val61s",
      "lname":"val61s",
      "REGION":"val61s"
   },
   "metadata":{ 
      "timestamp":"2019-10-31T22:53:59.721201Z",
      "record-type":"data",
      "operation":"insert",
      "partition-key-type":"primary-key",
      "partition-key-value":"sbtest.sbtest_x.100000161",
      "schema-name":"sbtest",
      "table-name":"sbtest_x",
      "transaction-id":9324410911751,
      "transaction-record-id":1,
      "prev-transaction-id":9324410910341,
      "prev-transaction-record-id":10,
      "commit-timestamp":"2019-10-31T22:53:55.000000Z",
      "stream-position":"mysql-bin-changelog.002171:36912271:0:36912333:9324410911751:mysql-bin-changelog.002171:36912209"
   }
}
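
As a rough illustration of what a format implementation would need to do with such a message, the Python sketch below turns a DMS data record into a change tuple. The helper `to_change` is hypothetical, and the mapping of operations to changelog labels (`+I`, `+U`, `-D`) is an assumption about how the result could look, not the behavior of an existing Paimon parser.

```python
import json

# Assumed mapping from DMS data operations to changelog row-kind labels.
ROW_KIND_BY_OPERATION = {
    "load": "+I",    # initial full load rows are treated as inserts
    "insert": "+I",
    "update": "+U",
    "delete": "-D",
}


def to_change(raw: str):
    """Return (row_kind, qualified_table, row) for a data record, else None."""
    message = json.loads(raw)
    metadata = message["metadata"]
    # Control records carry no row data, so this sketch skips them.
    if metadata.get("record-type") != "data":
        return None
    row_kind = ROW_KIND_BY_OPERATION[metadata["operation"]]
    table = f'{metadata["schema-name"]}.{metadata["table-name"]}'
    return row_kind, table, message["data"]
```

Applied to the example message above, the function would yield an insert for table `sbtest.sbtest_x` together with the four column values from `data`.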

Solution

Provide an AWS DMS CDC format implementation in Apache Paimon.

Anything else?

No response

Are you willing to submit a PR?