apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] When writing from MySQL CDC to Hudi with Flink, does Hudi already support schema evolution? #9178

Open Tomccat3 opened 1 year ago

Tomccat3 commented 1 year ago

Describe the problem you faced
I'm using Flink to ingest data from MySQL into Hudi, but I'm running into a problem: the MySQL table schema will change over time, and I want to implement automatic schema evolution. Does Hudi already support this, or is it in development?



Tomccat3 commented 1 year ago

@danny0405 Hello Danny, can you help me with this problem?

ad1happy2go commented 1 year ago

@Tomccat3 Hudi should support it. Are you facing issues? What kind of schema changes are you making?

danny0405 commented 1 year ago

@Tomccat3, schema evolution in Hudi is supported, but not automatically; you still need to alter the table schema manually and possibly restart the writing job.

voonhous commented 1 year ago

Commenting for visibility as this might be a feature-request for implicit comprehensive schema evolution for Hudi-on-Flink.

Tomccat3 commented 1 year ago

@Tomccat3, schema evolution in Hudi is supported, but not automatically; you still need to alter the table schema manually and possibly restart the writing job.

Thanks for your reply. I don't want to restart the writing job; can schema evolution be made automatic in a Flink DataStream job?

Tomccat3 commented 1 year ago

@Tomccat3 Hudi should support it. Are you facing issues? What kind of schema changes are you making?

For example:

  1. I have a Flink DataStream job that consumes the MySQL binlog and then writes to Hudi.
  2. When the upstream MySQL table schema changes, I need to alter the Hudi table schema manually and restart the Flink job. I hope this can be made automatic.

danny0405 commented 1 year ago

be made automatic in flink datastream job

No, open source Flink does not support passing the schema around to the sink; that is the tricky part for Hudi in tracking schema changes dynamically.

Tomccat3 commented 1 year ago

No, open source Flink does not support passing the schema around to the sink,

OK, what if we implement a RowData that carries its schema?

danny0405 commented 1 year ago

Yeah, possible. You can pass the schema around together with the RowData, and before each commit overwrite the table schema with the latest one. This is only feasible with the Flink DataStream API.
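
A minimal sketch of that pattern in plain Java. The wrapper class, its fields, and the comparison helper are all hypothetical illustrations of "carry the schema with the record", not part of Hudi's or Flink's API; in a real job the payload would be a Flink RowData.

```java
import java.util.Objects;

// Illustrative wrapper that carries an Avro-style schema string alongside each
// row, so a downstream sink can detect a schema change before committing.
public class SchemaAwareRecord {
    private final String schemaJson; // schema of the source row at read time
    private final Object row;        // in a real Flink job, a RowData

    public SchemaAwareRecord(String schemaJson, Object row) {
        this.schemaJson = Objects.requireNonNull(schemaJson);
        this.row = row;
    }

    public String schemaJson() { return schemaJson; }
    public Object row() { return row; }

    /** True when this record's schema differs from the last committed one. */
    public boolean schemaChanged(String committedSchemaJson) {
        return !schemaJson.equals(committedSchemaJson);
    }

    public static void main(String[] args) {
        String v1 = "{\"fields\":[{\"name\":\"id\"}]}";
        String v2 = "{\"fields\":[{\"name\":\"id\"},{\"name\":\"email\"}]}";
        SchemaAwareRecord r = new SchemaAwareRecord(v2, "row-payload");
        // The sink would compare against the schema of the last commit and,
        // if it changed, overwrite the table schema before committing.
        System.out.println(r.schemaChanged(v1)); // prints "true"
    }
}
```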

Tomccat3 commented 1 year ago

OK, I will try.

FranMorilloAWS commented 1 year ago

Any example of how to do this? How can you build the HoodiePipeline builder dynamically from RowData?

anandp504 commented 8 months ago

@Tomccat3 Can you please provide an example of passing the schema along with the RowData? Currently, it seems to be tightly coupled with the HoodiePipelineBuilder with a static schema.