apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0
5.34k stars 2.42k forks

[SUPPORT] flink cdc 3.x pipeline hudi sink #11953

Open melin opened 1 week ago

melin commented 1 week ago

Requesting a Hudi sink for Flink CDC 3.0 pipelines.

danny0405 commented 1 week ago

Hi, does Flink CDC 3.0 require the sink to support specific APIs for SQL or DataStream?

melin commented 1 week ago

> Hi, does Flink CDC 3.0 require the sink to support specific APIs for SQL or DataStream?

API: https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/developer-guide/understand-flink-cdc-api/

Paimon sink: https://github.com/apache/flink-cdc/tree/master/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-paimon/src/main/java/org/apache/flink/cdc/connectors/paimon/sink
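
For readers who do not want to open the links: as far as I understand the linked guide, a pipeline sink has two duties, writing data change events and applying schema change events to the target table (the guide names `DataSink`, `MetadataApplier`, `DataChangeEvent` and `SchemaChangeEvent` for this). Below is a minimal, self-contained sketch of that idea. Every class in it (`Event`, `AddColumn`, `DataChange`, `ToyTable`, `ToyPipelineSink`) is a hypothetical stand-in written for illustration; none of it is Flink CDC or Hudi code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Flink CDC event types. The real classes live in
// org.apache.flink.cdc.common.event and carry much more information.
interface Event {}

// A schema change; only "add column" is modeled here.
final class AddColumn implements Event {
    final String table, column, type;
    AddColumn(String table, String column, String type) {
        this.table = table; this.column = column; this.type = type;
    }
}

// A row-level change for one table (insert only, to keep the sketch short).
final class DataChange implements Event {
    final String table;
    final Map<String, Object> row;
    DataChange(String table, Map<String, Object> row) {
        this.table = table; this.row = row;
    }
}

// Toy in-memory table standing in for a lake table. This is not Hudi code.
final class ToyTable {
    final Map<String, String> schema = new LinkedHashMap<>();
    final List<Map<String, Object>> rows = new ArrayList<>();
}

// One sink instance serves many tables, which is what multi-table sync needs.
final class ToyPipelineSink {
    final Map<String, ToyTable> tables = new LinkedHashMap<>();

    void accept(Event event) {
        if (event instanceof AddColumn) {
            // Schema change: evolve the target table before more data arrives.
            AddColumn add = (AddColumn) event;
            table(add.table).schema.put(add.column, add.type);
        } else if (event instanceof DataChange) {
            // Data change: reject rows that use columns the table does not have yet.
            DataChange change = (DataChange) event;
            ToyTable target = table(change.table);
            for (String column : change.row.keySet()) {
                if (!target.schema.containsKey(column)) {
                    throw new IllegalStateException("Unknown column: " + column);
                }
            }
            target.rows.add(change.row);
        }
    }

    private ToyTable table(String name) {
        return tables.computeIfAbsent(name, n -> new ToyTable());
    }
}

final class PipelineSinkSketch {
    public static void main(String[] args) {
        ToyPipelineSink sink = new ToyPipelineSink();
        sink.accept(new AddColumn("db.orders", "id", "BIGINT"));
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1L);
        sink.accept(new DataChange("db.orders", row));
        System.out.println(sink.tables.get("db.orders").schema);
    }
}
```

The point of the sketch is only that schema changes arrive in the same stream as data and the sink has to apply them itself. A real Hudi sink would also need to handle updates, deletes, checkpointing and commits, none of which is shown here.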

danny0405 commented 1 week ago

It looks like a bunch of Flink CDC-specific APIs. We will put this on the roadmap if we have the bandwidth.

melin commented 1 week ago

> It looks like a bunch of Flink CDC-specific APIs. We will put this on the roadmap if we have the bandwidth.

Yes. Writing CDC data to a data lake is a very important scenario, and there is no good open source tool for multi-table or whole-database synchronization. With a Hudi sink, a Flink CDC pipeline could easily write CDC data to Hudi and support schema evolution (Hudi version 0.x Flink catalog does not support schema evolution).

danny0405 commented 1 week ago

> Hudi version 0.x Flink catalog does not support schema evolution

For schema evolution, do you mean the ALTER TABLE command?

melin commented 1 week ago

> > Hudi version 0.x Flink catalog does not support schema evolution
>
> For schema evolution, do you mean the ALTER TABLE command?

[image]