apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] MySQL CDC to Paimon: Length validation issue for char/varchar types with special characters #1447

Closed (newsuperchao closed this issue 1 year ago)

newsuperchao commented 1 year ago

Motivation

I am using Flink CDC to capture changes from a MySQL database and write them into a Paimon table. I'm running into a problem with length validation for char and varchar types.

When a field contains special or garbled characters, Paimon's validation of that field fails because the length it sees does not match the declared length. The system interprets the mismatch as a schema change and keeps waiting for an updated schema to be passed in, which blocks the job indefinitely.
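To illustrate how such a mismatch could arise, here is a minimal Java sketch. It rests on an assumption: that the two sides measure length differently (for example characters versus bytes). I have not confirmed that this is what Paimon does, and the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Paimon or Flink CDC.
public class LengthMismatchDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. user-visible "characters" in the simple case.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when the value is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "café" followed by one emoji (written with escapes to stay encoding-safe).
        String value = "caf\u00e9\uD83D\uDE00";

        System.out.println("UTF-16 units: " + utf16Units(value)); // 6
        System.out.println("code points:  " + codePoints(value)); // 5
        System.out.println("UTF-8 bytes:  " + utf8Bytes(value));  // 9
    }
}
```

Under that assumption, a value like this fits a VARCHAR(5) column when counted in code points, but would be rejected by a check that counts UTF-16 units or bytes.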

This can cause significant problems when dealing with real-world data that often includes special or unexpected characters.

I'm wondering if there's a way to either disable this validation, or make it optional specifically for these field types (char, varchar). This would improve the flexibility and robustness of the CDC integration with Paimon.

Thanks for considering this issue.

Solution

The system should either ignore length mismatches for char and varchar types, or provide an option to disable this validation.
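As a rough sketch of what an optional check might look like (purely illustrative: `LengthCheckSketch`, `Mode`, and `accept` are hypothetical names, not an existing Paimon option or API):

```java
// Hypothetical sketch of a length check that can be switched off.
public class LengthCheckSketch {

    // STRICT enforces the declared length; LENIENT skips the check entirely.
    enum Mode { STRICT, LENIENT }

    // Returns true if the value should be accepted for a column declared
    // with the given length. Length is counted in Unicode code points here,
    // which is an assumption made for this example.
    static boolean accept(String value, int declaredLength, Mode mode) {
        if (mode == Mode.LENIENT) {
            return true;
        }
        return value.codePointCount(0, value.length()) <= declaredLength;
    }

    public static void main(String[] args) {
        System.out.println(accept("abcdef", 5, Mode.STRICT));  // false
        System.out.println(accept("abcdef", 5, Mode.LENIENT)); // true
    }
}
```

With something like the lenient mode, an over-length value would be written as-is instead of being treated as a schema change.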

Anything else?

No response

JingsongLi commented 1 year ago

We can provide a wide schema mode, for example one that converts all char and varchar columns to String regardless of their declared length.
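A minimal sketch of that idea, assuming the source column types arrive as plain type-name strings such as `VARCHAR(10)`. The class `WideSchemaSketch` and the method `widen` are hypothetical and only illustrate the mapping; this is not how Paimon implements it.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration of a "wide schema" type mapping.
public class WideSchemaSketch {

    // Matches CHAR or VARCHAR, with or without a length such as (10).
    private static final Pattern CHAR_LIKE =
            Pattern.compile("^(VAR)?CHAR\\s*(\\(\\s*\\d+\\s*\\))?$");

    // Maps every CHAR/VARCHAR type to STRING and returns all other
    // types unchanged (normalized to upper case).
    static String widen(String sourceType) {
        String normalized = sourceType.trim().toUpperCase(Locale.ROOT);
        return CHAR_LIKE.matcher(normalized).matches() ? "STRING" : normalized;
    }

    public static void main(String[] args) {
        System.out.println(widen("varchar(10)"));   // STRING
        System.out.println(widen("CHAR(5)"));       // STRING
        System.out.println(widen("decimal(10,2)")); // DECIMAL(10,2)
    }
}
```

Because the target column no longer carries a length, a longer-than-declared value would not look like a schema change.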